* Re: [PATCH 39/51] drm/i915/guc: Connect reset modparam updates to GuC policy flags
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:04     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:04 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

On Fri, Jul 16, 2021 at 01:17:12PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Changing the reset module parameter has no effect on a running GuC.
> The corresponding entry in the ADS must be updated and then the GuC
> informed via a Host2GuC message.
> 
> The new debugfs interface to module parameters allows this to happen.
> However, connecting the parameter data address back to anything useful
> is messy. One option would be to pass a new private data structure
> address through instead of just the parameter pointer, but that would
> mean having a new (and different) data structure and write function for
> each parameter. Instead, this method keeps everything generic by doing
> a string lookup on the directory entry name.
> 
> Signed-off-by: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
>  drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
>  2 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 2ad5fcd4e1b7..c6d0b762d82c 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
>  		policy_offset
>  	};
>  
> -	return intel_guc_send(guc, action, ARRAY_SIZE(action));
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>  }
>  
>  int intel_guc_global_policies_update(struct intel_guc *guc)
> diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
> index 4e2b077692cb..8ecd8b42f048 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs_params.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
> @@ -6,9 +6,20 @@
>  #include <linux/kernel.h>
>  
>  #include "i915_debugfs_params.h"
> +#include "gt/intel_gt.h"
> +#include "gt/uc/intel_guc.h"
>  #include "i915_drv.h"
>  #include "i915_params.h"
>  
> +#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
> +
> +#define GET_I915(i915, name, ptr)	\
> +	do {	\
> +		struct i915_params *params;	\
> +		params = container_of(((void *) (ptr)), typeof(*params), name);	\
> +		(i915) = container_of(params, typeof(*(i915)), params);	\
> +	} while (0)
> +
>  /* int param */
>  static int i915_param_int_show(struct seq_file *m, void *data)
>  {
> @@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
>  	return single_open(file, i915_param_int_show, inode->i_private);
>  }
>  
> +static int notify_guc(struct drm_i915_private *i915)
> +{
> +	int ret = 0;
> +
> +	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
> +
> +	return ret;
> +}
> +
>  static ssize_t i915_param_int_write(struct file *file,
>  				    const char __user *ubuf, size_t len,
>  				    loff_t *offp)
> @@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
>  				     const char __user *ubuf, size_t len,
>  				     loff_t *offp)
>  {
> +	struct drm_i915_private *i915;
>  	struct seq_file *m = file->private_data;
>  	unsigned int *value = m->private;
> +	unsigned int old = *value;
>  	int ret;
>  
>  	ret = kstrtouint_from_user(ubuf, len, 0, value);
> @@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
>  			*value = b;
>  	}
>  
> +	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
> +		GET_I915(i915, reset, value);
> +
> +		ret = notify_guc(i915);
> +		if (ret)
> +			*value = old;
> +	}
> +
>  	return ret ?: len;
>  }
>  
> -- 
> 2.28.0
> 
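
For reference, the GET_I915() trick above is two container_of() hops: from
the address of the 'reset' parameter back to the enclosing i915_params, and
from there to the drm_i915_private that embeds it. A minimal user-space
sketch of the same pointer recovery, using simplified stand-in structs in
place of the real definitions from i915_params.h and i915_drv.h:

#include <stddef.h>

/* Simplified container_of(): subtract the member offset from its address */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct i915_params {
	unsigned int reset;	/* backing store for the modparam */
};

struct drm_i915_private {
	struct i915_params params;
};

/* Recover the device from a pointer to its 'reset' parameter */
static struct drm_i915_private *i915_from_reset_param(unsigned int *value)
{
	struct i915_params *params =
		container_of(value, struct i915_params, reset);

	return container_of(params, struct drm_i915_private, params);
}

int main(void)
{
	struct drm_i915_private i915 = { .params = { .reset = 2 } };

	/* The round trip lands back on the enclosing struct */
	return i915_from_reset_param(&i915.params.reset) == &i915 ? 0 : 1;
}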

* Re: [Intel-gfx] [PATCH 44/51] drm/i915/selftest: Better error reporting from hangcheck selftest
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:13     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:13 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Fri, Jul 16, 2021 at 01:17:17PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There are many ways in which the hangcheck selftest can fail, but very
> few of them actually print an error message to say what happened. So,
> fill in the missing messages.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 89 ++++++++++++++++----
>  1 file changed, 72 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 7aea10aa1fb4..0ed87cc4d063 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -378,6 +378,7 @@ static int igt_reset_nop(void *arg)
>  			ce = intel_context_create(engine);
>  			if (IS_ERR(ce)) {
>  				err = PTR_ERR(ce);
> +				pr_err("[%s] Create context failed: %d!\n", engine->name, err);
>  				break;
>  			}
>  
> @@ -387,6 +388,7 @@ static int igt_reset_nop(void *arg)
>  				rq = intel_context_create_request(ce);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
> +					pr_err("[%s] Create request failed: %d!\n", engine->name, err);
>  					break;
>  				}
>  
> @@ -401,24 +403,31 @@ static int igt_reset_nop(void *arg)
>  		igt_global_reset_unlock(gt);
>  
>  		if (intel_gt_is_wedged(gt)) {
> +			pr_err("[%s] GT is wedged!\n", engine->name);
>  			err = -EIO;
>  			break;
>  		}
>  
>  		if (i915_reset_count(global) != reset_count + ++count) {
> -			pr_err("Full GPU reset not recorded!\n");
> +			pr_err("[%s] Reset not recorded: %d vs %d + %d!\n",
> +			       engine->name, i915_reset_count(global), reset_count, count);
>  			err = -EINVAL;
>  			break;
>  		}
>  
>  		err = igt_flush_test(gt->i915);
> -		if (err)
> +		if (err) {
> +			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
>  			break;
> +		}
>  	} while (time_before(jiffies, end_time));
>  	pr_info("%s: %d resets\n", __func__, count);
>  
> -	if (igt_flush_test(gt->i915))
> +	if (igt_flush_test(gt->i915)) {
> +		pr_err("Post flush failed!\n");
>  		err = -EIO;
> +	}
> +
>  	return err;
>  }
>  
> @@ -441,8 +450,10 @@ static int igt_reset_nop_engine(void *arg)
>  		int err;
>  
>  		ce = intel_context_create(engine);
> -		if (IS_ERR(ce))
> +		if (IS_ERR(ce)) {
> +			pr_err("[%s] Create context failed: %ld!\n", engine->name, PTR_ERR(ce));
>  			return PTR_ERR(ce);
> +		}
>  
>  		reset_count = i915_reset_count(global);
>  		reset_engine_count = i915_reset_engine_count(global, engine);
> @@ -550,8 +561,10 @@ static int igt_reset_fail_engine(void *arg)
>  		int err;
>  
>  		ce = intel_context_create(engine);
> -		if (IS_ERR(ce))
> +		if (IS_ERR(ce)) {
> +			pr_err("[%s] Create context failed: %ld!\n", engine->name, PTR_ERR(ce));
>  			return PTR_ERR(ce);
> +		}
>  
>  		st_engine_heartbeat_disable(engine);
>  		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> @@ -711,6 +724,7 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
> +					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  					break;
>  				}
>  
> @@ -765,12 +779,16 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  			break;
>  
>  		err = igt_flush_test(gt->i915);
> -		if (err)
> +		if (err) {
> +			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
>  			break;
> +		}
>  	}
>  
> -	if (intel_gt_is_wedged(gt))
> +	if (intel_gt_is_wedged(gt)) {
> +		pr_err("GT is wedged!\n");
>  		err = -EIO;
> +	}
>  
>  	if (active)
>  		hang_fini(&h);
> @@ -837,6 +855,7 @@ static int active_engine(void *data)
>  		ce[count] = intel_context_create(engine);
>  		if (IS_ERR(ce[count])) {
>  			err = PTR_ERR(ce[count]);
> +			pr_err("[%s] Create context #%ld failed: %d!\n", engine->name, count, err);
>  			while (--count)
>  				intel_context_put(ce[count]);
>  			return err;
> @@ -852,6 +871,7 @@ static int active_engine(void *data)
>  		new = intel_context_create_request(ce[idx]);
>  		if (IS_ERR(new)) {
>  			err = PTR_ERR(new);
> +			pr_err("[%s] Create request #%d failed: %d!\n", engine->name, idx, err);
>  			break;
>  		}
>  
> @@ -867,8 +887,10 @@ static int active_engine(void *data)
>  		}
>  
>  		err = active_request_put(old);
> -		if (err)
> +		if (err) {
> +			pr_err("[%s] Request put failed: %d!\n", engine->name, err);
>  			break;
> +		}
>  
>  		cond_resched();
>  	}
> @@ -876,6 +898,9 @@ static int active_engine(void *data)
>  	for (count = 0; count < ARRAY_SIZE(rq); count++) {
>  		int err__ = active_request_put(rq[count]);
>  
> +		if (err__)
> +			pr_err("[%s] Request put #%ld failed: %d!\n", engine->name, count, err__);
> +
>  		/* Keep the first error */
>  		if (!err)
>  			err = err__;
> @@ -949,6 +974,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  					  "igt/%s", other->name);
>  			if (IS_ERR(tsk)) {
>  				err = PTR_ERR(tsk);
> +				pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
>  				goto unwind;
>  			}
>  
> @@ -967,6 +993,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
> +					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  					break;
>  				}
>  
> @@ -999,10 +1026,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			if (rq) {
>  				if (rq->fence.error != -EIO) {
>  					pr_err("i915_reset_engine(%s:%s):"
> -					       " failed to reset request %llx:%lld\n",
> +					       " failed to reset request %lld:%lld [0x%04X]\n",
>  					       engine->name, test_name,
>  					       rq->fence.context,
> -					       rq->fence.seqno);
> +					       rq->fence.seqno, rq->context->guc_id);
>  					i915_request_put(rq);
>  
>  					GEM_TRACE_DUMP();
> @@ -1101,8 +1128,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			break;
>  
>  		err = igt_flush_test(gt->i915);
> -		if (err)
> +		if (err) {
> +			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
>  			break;
> +		}
>  	}
>  
>  	if (intel_gt_is_wedged(gt))
> @@ -1180,12 +1209,15 @@ static int igt_reset_wait(void *arg)
>  	igt_global_reset_lock(gt);
>  
>  	err = hang_init(&h, gt);
> -	if (err)
> +	if (err) {
> +		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
>  		goto unlock;
> +	}
>  
>  	rq = hang_create_request(&h, engine);
>  	if (IS_ERR(rq)) {
>  		err = PTR_ERR(rq);
> +		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  		goto fini;
>  	}
>  
> @@ -1310,12 +1342,15 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  	/* Check that we can recover an unbind stuck on a hanging request */
>  
>  	err = hang_init(&h, gt);
> -	if (err)
> +	if (err) {
> +		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
>  		return err;
> +	}
>  
>  	obj = i915_gem_object_create_internal(gt->i915, SZ_1M);
>  	if (IS_ERR(obj)) {
>  		err = PTR_ERR(obj);
> +		pr_err("[%s] Create object failed: %d!\n", engine->name, err);
>  		goto fini;
>  	}
>  
> @@ -1330,12 +1365,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  	arg.vma = i915_vma_instance(obj, vm, NULL);
>  	if (IS_ERR(arg.vma)) {
>  		err = PTR_ERR(arg.vma);
> +		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
>  		goto out_obj;
>  	}
>  
>  	rq = hang_create_request(&h, engine);
>  	if (IS_ERR(rq)) {
>  		err = PTR_ERR(rq);
> +		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  		goto out_obj;
>  	}
>  
> @@ -1347,6 +1384,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  	err = i915_vma_pin(arg.vma, 0, 0, pin_flags);
>  	if (err) {
>  		i915_request_add(rq);
> +		pr_err("[%s] VMA pin failed: %d!\n", engine->name, err);
>  		goto out_obj;
>  	}
>  
> @@ -1363,8 +1401,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  	i915_vma_lock(arg.vma);
>  	err = i915_request_await_object(rq, arg.vma->obj,
>  					flags & EXEC_OBJECT_WRITE);
> -	if (err == 0)
> +	if (err == 0) {
>  		err = i915_vma_move_to_active(arg.vma, rq, flags);
> +		if (err)
> +			pr_err("[%s] Move to active failed: %d!\n", engine->name, err);
> +	} else {
> +		pr_err("[%s] Request await failed: %d!\n", engine->name, err);
> +	}
> +
>  	i915_vma_unlock(arg.vma);
>  
>  	if (flags & EXEC_OBJECT_NEEDS_FENCE)
> @@ -1392,6 +1436,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  	tsk = kthread_run(fn, &arg, "igt/evict_vma");
>  	if (IS_ERR(tsk)) {
>  		err = PTR_ERR(tsk);
> +		pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
>  		tsk = NULL;
>  		goto out_reset;
>  	}
> @@ -1518,6 +1563,7 @@ static int igt_reset_queue(void *arg)
>  		prev = hang_create_request(&h, engine);
>  		if (IS_ERR(prev)) {
>  			err = PTR_ERR(prev);
> +			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
>  			goto fini;
>  		}
>  
> @@ -1532,6 +1578,7 @@ static int igt_reset_queue(void *arg)
>  			rq = hang_create_request(&h, engine);
>  			if (IS_ERR(rq)) {
>  				err = PTR_ERR(rq);
> +				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  				goto fini;
>  			}
>  
> @@ -1619,8 +1666,10 @@ static int igt_reset_queue(void *arg)
>  		i915_request_put(prev);
>  
>  		err = igt_flush_test(gt->i915);
> -		if (err)
> +		if (err) {
> +			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
>  			break;
> +		}
>  	}
>  
>  fini:
> @@ -1653,12 +1702,15 @@ static int igt_handle_error(void *arg)
>  		return 0;
>  
>  	err = hang_init(&h, gt);
> -	if (err)
> +	if (err) {
> +		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
>  		return err;
> +	}
>  
>  	rq = hang_create_request(&h, engine);
>  	if (IS_ERR(rq)) {
>  		err = PTR_ERR(rq);
> +		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  		goto err_fini;
>  	}
>  
> @@ -1743,12 +1795,15 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
>  		return err;
>  
>  	err = hang_init(&h, engine->gt);
> -	if (err)
> +	if (err) {
> +		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
>  		return err;
> +	}
>  
>  	rq = hang_create_request(&h, engine);
>  	if (IS_ERR(rq)) {
>  		err = PTR_ERR(rq);
> +		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
>  		goto out;
>  	}
>  
> -- 
> 2.28.0
> 
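
A note on the error paths being instrumented here: creation helpers such as
intel_context_create() and hang_create_request() return either a valid
pointer or a negative errno encoded in the pointer itself, so the error code
must be extracted with PTR_ERR() on the IS_ERR() branch rather than read
from a not-yet-assigned 'err' variable. A self-contained user-space sketch
of the idiom, simplified from the kernel's <linux/err.h>:

#include <stdio.h>

#define MAX_ERRNO	4095

/* Encode a negative errno in the top page of the pointer range */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

int main(void)
{
	void *ce = ERR_PTR(-12);	/* e.g. -ENOMEM from a failed create */

	if (IS_ERR(ce))
		printf("create failed: %ld\n", PTR_ERR(ce));

	return 0;
}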

* [PATCH 00/51] GuC submission support
@ 2021-07-16 20:16 ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

As discussed in [1] and [2], we are enabling GuC submission support in
the i915. This is a subset of the patches in step 5 described in [1];
basically, it is the absolute minimum needed to enable CI with GuC
submission on gen11+ platforms.

This series itself will likely be broken down into smaller patch sets
for merging: CTB changes, basic submission, virtual engines, and
resets.

A follow-up series will address the patches still missing from [1].

Locally tested on a TGL machine; basic tests appear to pass.

v2: Address all review comments in [3], include several more patches to
make CI [4] happy.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

[1] https://patchwork.freedesktop.org/series/89844/
[2] https://patchwork.freedesktop.org/series/91417/
[3] https://patchwork.freedesktop.org/series/91840/
[4] https://patchwork.freedesktop.org/series/91885/

Daniele Ceraolo Spurio (1):
  drm/i915/guc: Unblock GuC submission on Gen11+

John Harrison (12):
  drm/i915: Track 'serial' counts for virtual engines
  drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  drm/i915/guc: Don't complain about reset races
  drm/i915/guc: Enable GuC engine reset
  drm/i915/guc: Fix for error capture after full GPU reset with GuC
  drm/i915/guc: Hook GuC scheduling policies up
  drm/i915/guc: Connect reset modparam updates to GuC policy flags
  drm/i915/guc: Include scheduling policies in the debugfs state dump
  drm/i915/guc: Add golden context to GuC ADS
  drm/i915/selftest: Better error reporting from hangcheck selftest
  drm/i915/selftest: Fix hangcheck self test for GuC submission
  drm/i915/selftest: Bump selftest timeouts for hangcheck

Matthew Brost (36):
  drm/i915/guc: Add new GuC interface defines and structures
  drm/i915/guc: Remove GuC stage descriptor, add LRC descriptor
  drm/i915/guc: Add LRC descriptor context lookup array
  drm/i915/guc: Implement GuC submission tasklet
  drm/i915/guc: Add bypass tasklet submission path to GuC
  drm/i915/guc: Implement GuC context operations for new interface
  drm/i915/guc: Insert fence on context when deregistering
  drm/i915/guc: Defer context unpin until scheduling is disabled
  drm/i915/guc: Disable engine barriers with GuC during unpin
  drm/i915/guc: Extend deregistration fence to schedule disable
  drm/i915: Disable preempt busywait when using GuC scheduling
  drm/i915/guc: Ensure request ordering via completion fences
  drm/i915/guc: Disable semaphores when using GuC scheduling
  drm/i915/guc: Ensure G2H response has space in buffer
  drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  drm/i915/guc: Update GuC debugfs to support new GuC
  drm/i915/guc: Add several request trace points
  drm/i915: Add intel_context tracing
  drm/i915/guc: GuC virtual engines
  drm/i915: Hold reference to intel_context over life of i915_request
  drm/i915/guc: Disable bonding extension with GuC submission
  drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  drm/i915: Add i915_sched_engine destroy vfunc
  drm/i915: Move active request tracking to a vfunc
  drm/i915/guc: Reset implementation for new GuC interface
  drm/i915: Reset GPU immediately if submission is disabled
  drm/i915/guc: Add disable interrupts to guc sanitize
  drm/i915/guc: Suspend/resume implementation for new interface
  drm/i915/guc: Handle context reset notification
  drm/i915/guc: Handle engine reset failure notification
  drm/i915/guc: Enable the timer expired interrupt for GuC
  drm/i915/guc: Capture error state on context reset
  drm/i915/guc: Implement banned contexts for GuC submission
  drm/i915/guc: Support request cancellation
  drm/i915/selftest: Increase some timeouts in live_requests
  drm/i915/guc: Implement GuC priority management

Rahul Kumar Singh (2):
  drm/i915/selftest: Fix workarounds selftest for GuC submission
  drm/i915/selftest: Fix MOCS selftest for GuC submission

 drivers/gpu/drm/i915/Makefile                 |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   21 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   44 +-
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   16 +-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
 drivers/gpu/drm/i915/gt/intel_context.c       |   50 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |   50 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   63 +-
 drivers/gpu/drm/i915/gt/intel_engine.h        |   54 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   13 +-
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |    4 +
 .../drm/i915/gt/intel_execlists_submission.c  |   95 +-
 .../drm/i915/gt/intel_execlists_submission.h  |    4 -
 drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
 drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   21 +-
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    7 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
 drivers/gpu/drm/i915/gt/intel_reset.c         |   50 +-
 .../gpu/drm/i915/gt/intel_ring_submission.c   |   48 +
 drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
 .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
 drivers/gpu/drm/i915/gt/mock_engine.c         |   40 +-
 drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
 .../drm/i915/gt/selftest_engine_heartbeat.c   |   22 +
 .../drm/i915/gt/selftest_engine_heartbeat.h   |    2 +
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |   12 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  314 +-
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   50 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |  132 +-
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  101 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  461 ++-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  132 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   16 +-
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2776 +++++++++++++++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   18 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  101 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
 drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
 drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
 drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
 drivers/gpu/drm/i915/i915_reg.h               |    2 +
 drivers/gpu/drm/i915/i915_request.c           |  174 +-
 drivers/gpu/drm/i915/i915_request.h           |   29 +
 drivers/gpu/drm/i915/i915_scheduler.c         |   16 +-
 drivers/gpu/drm/i915/i915_scheduler.h         |   10 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |   22 +
 drivers/gpu/drm/i915/i915_trace.h             |  219 +-
 drivers/gpu/drm/i915/selftests/i915_request.c |    4 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |    2 +-
 .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
 .../i915/selftests/intel_scheduler_helpers.c  |   89 +
 .../i915/selftests/intel_scheduler_helpers.h  |   35 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
 include/uapi/drm/i915_drm.h                   |    9 +
 68 files changed, 5119 insertions(+), 862 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

-- 
2.28.0


* [PATCH 01/51] drm/i915/guc: Add new GuC interface defines and structures
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add new GuC interface defines and structures while maintaining old ones
in parallel.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  | 14 +++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 41 +++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 2d6198e63ebe..57e18babdf4b 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -124,10 +124,24 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_FORCE_LOG_BUFFER_FLUSH = 0x302,
 	INTEL_GUC_ACTION_ENTER_S_STATE = 0x501,
 	INTEL_GUC_ACTION_EXIT_S_STATE = 0x502,
+	INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE = 0x506,
+	INTEL_GUC_ACTION_SCHED_CONTEXT = 0x1000,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET = 0x1001,
+	INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE = 0x1002,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_SET = 0x1003,
+	INTEL_GUC_ACTION_SCHED_ENGINE_MODE_DONE = 0x1004,
+	INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY = 0x1005,
+	INTEL_GUC_ACTION_SET_CONTEXT_EXECUTION_QUANTUM = 0x1006,
+	INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT = 0x1007,
+	INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION = 0x1008,
+	INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION = 0x1009,
 	INTEL_GUC_ACTION_SLPC_REQUEST = 0x3003,
 	INTEL_GUC_ACTION_AUTHENTICATE_HUC = 0x4000,
+	INTEL_GUC_ACTION_REGISTER_CONTEXT = 0x4502,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT = 0x4503,
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
+	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 617ec601648d..28245a217a39 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,9 @@
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
 
+#define GUC_CONTEXT_DISABLE		0
+#define GUC_CONTEXT_ENABLE		1
+
 #define GUC_CLIENT_PRIORITY_KMD_HIGH	0
 #define GUC_CLIENT_PRIORITY_HIGH	1
 #define GUC_CLIENT_PRIORITY_KMD_NORMAL	2
@@ -26,6 +29,9 @@
 #define GUC_MAX_STAGE_DESCRIPTORS	1024
 #define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
 
+#define GUC_MAX_LRC_DESCRIPTORS		65535
+#define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
+
 #define GUC_RENDER_ENGINE		0
 #define GUC_VIDEO_ENGINE		1
 #define GUC_BLITTER_ENGINE		2
@@ -237,6 +243,41 @@ struct guc_stage_desc {
 	u64 desc_private;
 } __packed;
 
+#define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
+
+#define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
+#define CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US 500000
+
+/* Preempt to idle on quantum expiry */
+#define CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE	BIT(0)
+
+/*
+ * GuC Context registration descriptor.
+ * FIXME: This is only required to exist during context registration.
+ * The current 1:1 between guc_lrc_desc and LRCs for the lifetime of the LRC
+ * is not required.
+ */
+struct guc_lrc_desc {
+	u32 hw_context_desc;
+	u32 slpm_perf_mode_hint;	/* SPLC v1 only */
+	u32 slpm_freq_hint;
+	u32 engine_submit_mask;		/* In logical space */
+	u8 engine_class;
+	u8 reserved0[3];
+	u32 priority;
+	u32 process_desc;
+	u32 wq_addr;
+	u32 wq_size;
+	u32 context_flags;		/* CONTEXT_REGISTRATION_* */
+	/* Time for one workload to execute. (in micro seconds) */
+	u32 execution_quantum;
+	/* Time to wait for a preemption request to complete before issuing a
+	 * reset. (in micro seconds). */
+	u32 preemption_timeout;
+	u32 policy_flags;		/* CONTEXT_POLICY_* */
+	u32 reserved1[19];
+} __packed;
+
 #define GUC_POWER_UNSPECIFIED	0
 #define GUC_POWER_D0		1
 #define GUC_POWER_D1		2
-- 
2.28.0
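
To make the calling convention concrete: a Host2GuC request built from these
action codes is a flat u32 array, action code first and parameters after, as
seen in guc_action_policies_update() earlier in the thread. Below is a
self-contained sketch with a stub transport standing in for intel_guc_send()
(the real path goes through the CT buffer); the offset value is purely
illustrative:

#include <stdio.h>
#include <stdint.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* One of the action codes added by this patch */
#define INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE	0x506

/* Stub standing in for intel_guc_send(); the driver sends via CTB */
static int guc_send(const uint32_t *action, uint32_t len)
{
	printf("H2G action 0x%04x, %u dwords\n", action[0], len);
	return 0;
}

int main(void)
{
	uint32_t policy_offset = 0x1000;	/* hypothetical ADS offset */
	uint32_t action[] = {
		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
		policy_offset,
	};

	return guc_send(action, ARRAY_SIZE(action));
}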


* [PATCH 02/51] drm/i915/guc: Remove GuC stage descriptor, add LRC descriptor
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Remove the old GuC stage descriptor and add the LRC descriptor, which will
be used by the new GuC interface implemented in this patch series.
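
As a quick reference, the new pool is a single page-aligned allocation
holding a flat array of descriptors indexed by LRC ID. A minimal sketch of
the sizing and lookup, mirroring the hunks below:

	u32 size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
			      GUC_MAX_LRC_DESCRIPTORS);
	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;

	/* index must be below GUC_MAX_LRC_DESCRIPTORS */
	struct guc_lrc_desc *desc = &base[index];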

v2:
 (John Harrison)
  - s/lrc/LRC/g

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   | 65 -----------------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 72 ++++++-------------
 3 files changed, 25 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 72e4653222e2..2625d2d5959f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -43,8 +43,8 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 
-	struct i915_vma *stage_desc_pool;
-	void *stage_desc_pool_vaddr;
+	struct i915_vma *lrc_desc_pool;
+	void *lrc_desc_pool_vaddr;
 
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 28245a217a39..4e4edc368b77 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -26,9 +26,6 @@
 #define GUC_CLIENT_PRIORITY_NORMAL	3
 #define GUC_CLIENT_PRIORITY_NUM		4
 
-#define GUC_MAX_STAGE_DESCRIPTORS	1024
-#define	GUC_INVALID_STAGE_ID		GUC_MAX_STAGE_DESCRIPTORS
-
 #define GUC_MAX_LRC_DESCRIPTORS		65535
 #define	GUC_INVALID_LRC_ID		GUC_MAX_LRC_DESCRIPTORS
 
@@ -181,68 +178,6 @@ struct guc_process_desc {
 	u32 reserved[30];
 } __packed;
 
-/* engine id and context id is packed into guc_execlist_context.context_id*/
-#define GUC_ELC_CTXID_OFFSET		0
-#define GUC_ELC_ENGINE_OFFSET		29
-
-/* The execlist context including software and HW information */
-struct guc_execlist_context {
-	u32 context_desc;
-	u32 context_id;
-	u32 ring_status;
-	u32 ring_lrca;
-	u32 ring_begin;
-	u32 ring_end;
-	u32 ring_next_free_location;
-	u32 ring_current_tail_pointer_value;
-	u8 engine_state_submit_value;
-	u8 engine_state_wait_value;
-	u16 pagefault_count;
-	u16 engine_submit_queue_count;
-} __packed;
-
-/*
- * This structure describes a stage set arranged for a particular communication
- * between uKernel (GuC) and Driver (KMD). Technically, this is known as a
- * "GuC Context descriptor" in the specs, but we use the term "stage descriptor"
- * to avoid confusion with all the other things already named "context" in the
- * driver. A static pool of these descriptors are stored inside a GEM object
- * (stage_desc_pool) which is held for the entire lifetime of our interaction
- * with the GuC, being allocated before the GuC is loaded with its firmware.
- */
-struct guc_stage_desc {
-	u32 sched_common_area;
-	u32 stage_id;
-	u32 pas_id;
-	u8 engines_used;
-	u64 db_trigger_cpu;
-	u32 db_trigger_uk;
-	u64 db_trigger_phy;
-	u16 db_id;
-
-	struct guc_execlist_context lrc[GUC_MAX_ENGINES_NUM];
-
-	u8 attribute;
-
-	u32 priority;
-
-	u32 wq_sampled_tail_offset;
-	u32 wq_total_submit_enqueues;
-
-	u32 process_desc;
-	u32 wq_addr;
-	u32 wq_size;
-
-	u32 engine_presence;
-
-	u8 engine_suspended;
-
-	u8 reserved0[3];
-	u64 reserved1[1];
-
-	u64 desc_private;
-} __packed;
-
 #define CONTEXT_REGISTRATION_FLAG_KMD	BIT(0)
 
 #define CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US 1000000
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index e9c237b18692..a366890fb840 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,57 +65,35 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static struct guc_stage_desc *__get_stage_desc(struct intel_guc *guc, u32 id)
+/* Future patches will use this function */
+__attribute__ ((unused))
+static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
-	struct guc_stage_desc *base = guc->stage_desc_pool_vaddr;
+	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
 
-	return &base[id];
-}
-
-static int guc_stage_desc_pool_create(struct intel_guc *guc)
-{
-	u32 size = PAGE_ALIGN(sizeof(struct guc_stage_desc) *
-			      GUC_MAX_STAGE_DESCRIPTORS);
+	GEM_BUG_ON(index >= GUC_MAX_LRC_DESCRIPTORS);
 
-	return intel_guc_allocate_and_map_vma(guc, size, &guc->stage_desc_pool,
-					      &guc->stage_desc_pool_vaddr);
+	return &base[index];
 }
 
-static void guc_stage_desc_pool_destroy(struct intel_guc *guc)
-{
-	i915_vma_unpin_and_release(&guc->stage_desc_pool, I915_VMA_RELEASE_MAP);
-}
-
-/*
- * Initialise/clear the stage descriptor shared with the GuC firmware.
- *
- * This descriptor tells the GuC where (in GGTT space) to find the important
- * data structures related to work submission (process descriptor, write queue,
- * etc).
- */
-static void guc_stage_desc_init(struct intel_guc *guc)
+static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	/* we only use 1 stage desc, so hardcode it to 0 */
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
-
-	desc->attribute = GUC_STAGE_DESC_ATTR_ACTIVE |
-			  GUC_STAGE_DESC_ATTR_KERNEL;
+	u32 size;
+	int ret;
 
-	desc->stage_id = 0;
-	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	size = PAGE_ALIGN(sizeof(struct guc_lrc_desc) *
+			  GUC_MAX_LRC_DESCRIPTORS);
+	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->lrc_desc_pool,
+					     (void **)&guc->lrc_desc_pool_vaddr);
+	if (ret)
+		return ret;
 
-	desc->wq_size = GUC_WQ_SIZE;
+	return 0;
 }
 
-static void guc_stage_desc_fini(struct intel_guc *guc)
+static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
-	struct guc_stage_desc *desc;
-
-	desc = __get_stage_desc(guc, 0);
-	memset(desc, 0, sizeof(*desc));
+	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
@@ -410,26 +388,25 @@ int intel_guc_submission_init(struct intel_guc *guc)
 {
 	int ret;
 
-	if (guc->stage_desc_pool)
+	if (guc->lrc_desc_pool)
 		return 0;
 
-	ret = guc_stage_desc_pool_create(guc);
+	ret = guc_lrc_desc_pool_create(guc);
 	if (ret)
 		return ret;
 	/*
 	 * Keep static analysers happy, let them know that we allocated the
 	 * vma after testing that it didn't exist earlier.
 	 */
-	GEM_BUG_ON(!guc->stage_desc_pool);
+	GEM_BUG_ON(!guc->lrc_desc_pool);
 
 	return 0;
 }
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->stage_desc_pool) {
-		guc_stage_desc_pool_destroy(guc);
-	}
+	if (guc->lrc_desc_pool)
+		guc_lrc_desc_pool_destroy(guc);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -695,7 +672,6 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
-	guc_stage_desc_init(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -705,8 +681,6 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	GEM_BUG_ON(gt->awake); /* GT should be parked first */
 
 	/* Note: By the time we're here, GuC may have already been reset */
-
-	guc_stage_desc_fini(guc);
 }
 
 static bool __guc_submission_selected(struct intel_guc *guc)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 03/51] drm/i915/guc: Add LRC descriptor context lookup array
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add an LRC descriptor context lookup array which can resolve the
intel_context from the LRC descriptor index. In addition to the lookup, it
can determine whether an LRC descriptor context is currently registered
with the GuC by checking whether an entry for a descriptor index is
present. Future patches in the series will make use of this array.
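
For example, once contexts are registered, later patches could use the new
helpers roughly like this (a usage sketch mirroring the hunks below, not
new driver code):

	/* register: backed by xa_store_irq() */
	set_lrc_desc_registered(guc, guc_id, ce);

	/* a non-NULL lookup means the guc_id is registered */
	if (lrc_desc_registered(guc, guc_id))
		ce = __get_context(guc, guc_id);	/* xa_load() */

	/* unregister: clears the descriptor and the xarray entry */
	reset_lrc_desc(guc, guc_id);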

v2:
 (Michal)
  - "linux/xarray.h" -> <linux/xarray.h>
  - s/lrc/LRC
 (John H)
  - Fix commit message

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  5 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 32 +++++++++++++++++--
 2 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 2625d2d5959f..35783558d261 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -6,6 +6,8 @@
 #ifndef _INTEL_GUC_H_
 #define _INTEL_GUC_H_
 
+#include <linux/xarray.h>
+
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
 #include "intel_guc_fwif.h"
@@ -46,6 +48,9 @@ struct intel_guc {
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
 
+	/* guc_id to intel_context lookup */
+	struct xarray context_lookup;
+
 	/* Control params for fw initialization */
 	u32 params[GUC_CTL_MAX_DWORDS];
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a366890fb840..23a94a896a0b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -65,8 +65,6 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-/* Future patches will use this function */
-__attribute__ ((unused))
 static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 {
 	struct guc_lrc_desc *base = guc->lrc_desc_pool_vaddr;
@@ -76,6 +74,15 @@ static struct guc_lrc_desc *__get_lrc_desc(struct intel_guc *guc, u32 index)
 	return &base[index];
 }
 
+static inline struct intel_context *__get_context(struct intel_guc *guc, u32 id)
+{
+	struct intel_context *ce = xa_load(&guc->context_lookup, id);
+
+	GEM_BUG_ON(id >= GUC_MAX_LRC_DESCRIPTORS);
+
+	return ce;
+}
+
 static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 {
 	u32 size;
@@ -96,6 +103,25 @@ static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
+{
+	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+
+	memset(desc, 0, sizeof(*desc));
+	xa_erase_irq(&guc->context_lookup, id);
+}
+
+static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
+{
+	return __get_context(guc, id);
+}
+
+static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
+					   struct intel_context *ce)
+{
+	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+}
+
 static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	/* Leaving stub as this function will be used in future patches */
@@ -400,6 +426,8 @@ int intel_guc_submission_init(struct intel_guc *guc)
 	 */
 	GEM_BUG_ON(!guc->lrc_desc_pool);
 
+	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
+
 	return 0;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Implement the GuC submission tasklet for the new interface. The new GuC
interface uses H2G messages to submit contexts to the GuC. Since H2G uses a
single channel, a single tasklet is used for the submission path.
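
The H2G submissions themselves are small dword arrays; a sketch of the two
layouts used by the new tasklet, taken from the guc_add_request() hunk
below:

	u32 action[3];

	/* First submission on a context: enable scheduling on it */
	action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
	action[1] = ce->guc_id;
	action[2] = GUC_CONTEXT_ENABLE;

	/* Context already enabled: just notify the GuC of new work */
	action[0] = INTEL_GUC_ACTION_SCHED_CONTEXT;
	action[1] = ce->guc_id;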

Also, the per-engine interrupt handler has been updated to disable
rescheduling of the physical engine tasklet when using GuC scheduling, as
the physical engine tasklet is no longer used.

In this patch the guc_id field is added to intel_context but is not yet
assigned. Patches later in the series will assign this value.

v2:
 (John Harrison)
  - Clean up some comments

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
 3 files changed, 127 insertions(+), 117 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 90026c177105..6d99631d19b9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -137,6 +137,15 @@ struct intel_context {
 	struct intel_sseu sseu;
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
+
+	/* GuC scheduling state flags that do not require a lock. */
+	atomic_t guc_sched_state_no_lock;
+
+	/*
+	 * GuC LRC descriptor ID - not assigned in this patch; future patches
+	 * in the series will assign it.
+	 */
+	u16 guc_id;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 35783558d261..8c7b92f699f1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -30,6 +30,10 @@ struct intel_guc {
 	struct intel_guc_log log;
 	struct intel_guc_ct ct;
 
+	/* Global engine used to submit requests to GuC */
+	struct i915_sched_engine *sched_engine;
+	struct i915_request *stalled_request;
+
 	/* intel_guc_recv interrupt related state */
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 23a94a896a0b..ca0717166a27 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,31 @@
 
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which do
+ * not require a lock as all state transitions are mutually exclusive, i.e. it
+ * is not possible for the context pinning code and submission, for the same
+ * context, to be executing simultaneously. We still need an atomic as it is
+ * possible for some of the bits to change at the same time though.
+ */
+#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+static inline bool context_enabled(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_ENABLED);
+}
+
+static inline void set_context_enabled(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_enabled(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
-static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
+static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	/* Leaving stub as this function will be used in future patches */
-}
+	int err;
+	struct intel_context *ce = rq->context;
+	u32 action[3];
+	int len = 0;
+	bool enabled = context_enabled(ce);
 
-/*
- * When we're doing submissions using regular execlists backend, writing to
- * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
- * pinned in mappable aperture portion of GGTT are visible to command streamer.
- * Writes done by GuC on our behalf are not guaranteeing such ordering,
- * therefore, to ensure the flush, we're issuing a POSTING READ.
- */
-static void flush_ggtt_writes(struct i915_vma *vma)
-{
-	if (i915_vma_is_map_and_fenceable(vma))
-		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
-					     GUC_STATUS);
-}
+	if (!enabled) {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
+		action[len++] = ce->guc_id;
+		action[len++] = GUC_CONTEXT_ENABLE;
+	} else {
+		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
+		action[len++] = ce->guc_id;
+	}
 
-static void guc_submit(struct intel_engine_cs *engine,
-		       struct i915_request **out,
-		       struct i915_request **end)
-{
-	struct intel_guc *guc = &engine->gt->uc.guc;
+	err = intel_guc_send_nb(guc, action, len);
 
-	do {
-		struct i915_request *rq = *out++;
+	if (!enabled && !err)
+		set_context_enabled(ce);
 
-		flush_ggtt_writes(rq->ring->vma);
-		guc_add_request(guc, rq);
-	} while (out != end);
+	return err;
 }
 
 static inline int rq_prio(const struct i915_request *rq)
@@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
-static struct i915_request *schedule_in(struct i915_request *rq, int idx)
+static int guc_dequeue_one_context(struct intel_guc *guc)
 {
-	trace_i915_request_in(rq, idx);
-
-	/*
-	 * Currently we are not tracking the rq->context being inflight
-	 * (ce->inflight = rq->engine). It is only used by the execlists
-	 * backend at the moment, a similar counting strategy would be
-	 * required if we generalise the inflight tracking.
-	 */
-
-	__intel_gt_pm_get(rq->engine->gt);
-	return i915_request_get(rq);
-}
-
-static void schedule_out(struct i915_request *rq)
-{
-	trace_i915_request_out(rq);
-
-	intel_gt_pm_put_async(rq->engine->gt);
-	i915_request_put(rq);
-}
-
-static void __guc_dequeue(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
-	struct i915_request **first = execlists->inflight;
-	struct i915_request ** const last_port = first + execlists->port_mask;
-	struct i915_request *last = first[0];
-	struct i915_request **port;
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	struct i915_request *last = NULL;
 	bool submit = false;
 	struct rb_node *rb;
+	int ret;
 
 	lockdep_assert_held(&sched_engine->lock);
 
-	if (last) {
-		if (*++first)
-			return;
-
-		last = NULL;
+	if (guc->stalled_request) {
+		submit = true;
+		last = guc->stalled_request;
+		goto resubmit;
 	}
 
-	/*
-	 * We write directly into the execlists->inflight queue and don't use
-	 * the execlists->pending queue, as we don't have a distinct switch
-	 * event.
-	 */
-	port = first;
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
 
 		priolist_for_each_request_consume(rq, rn, p) {
-			if (last && rq->context != last->context) {
-				if (port == last_port)
-					goto done;
-
-				*port = schedule_in(last,
-						    port - execlists->inflight);
-				port++;
-			}
+			if (last && rq->context != last->context)
+				goto done;
 
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			submit = true;
+
+			trace_i915_request_in(rq, 0);
 			last = rq;
+			submit = true;
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
 		i915_priolist_free(p);
 	}
 done:
-	sched_engine->queue_priority_hint =
-		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
-		*port = schedule_in(last, port - execlists->inflight);
-		*++port = NULL;
-		guc_submit(engine, first, port);
+		last->context->lrc_reg_state[CTX_RING_TAIL] =
+			intel_ring_set_tail(last->ring, last->tail);
+resubmit:
+		/*
+		 * We only check for -EBUSY here even though it is possible for
+		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
+		 * died and a full GT reset needs to be done. The hangcheck will
+		 * eventually detect that the GuC has died and trigger this
+		 * reset so no need to handle -EDEADLK here.
+		 */
+		ret = guc_add_request(guc, last);
+		if (ret == -EBUSY) {
+			tasklet_schedule(&sched_engine->tasklet);
+			guc->stalled_request = last;
+			return false;
+		}
 	}
-	execlists->active = execlists->inflight;
+
+	guc->stalled_request = NULL;
+	return submit;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
 {
 	struct i915_sched_engine *sched_engine =
 		from_tasklet(sched_engine, t, tasklet);
-	struct intel_engine_cs * const engine = sched_engine->private_data;
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request **port, *rq;
 	unsigned long flags;
+	bool loop;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-
-	for (port = execlists->inflight; (rq = *port); port++) {
-		if (!i915_request_completed(rq))
-			break;
-
-		schedule_out(rq);
-	}
-	if (port != execlists->inflight) {
-		int idx = port - execlists->inflight;
-		int rem = ARRAY_SIZE(execlists->inflight) - idx;
-		memmove(execlists->inflight, port, rem * sizeof(*port));
-	}
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	__guc_dequeue(engine);
+	do {
+		loop = guc_dequeue_one_context(sched_engine->private_data);
+	} while (loop);
 
-	i915_sched_engine_reset_on_empty(engine->sched_engine);
+	i915_sched_engine_reset_on_empty(sched_engine);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 {
-	if (iir & GT_RENDER_USER_INTERRUPT) {
+	if (iir & GT_RENDER_USER_INTERRUPT)
 		intel_engine_signal_breadcrumbs(engine);
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
-	}
 }
 
 static void guc_reset_prepare(struct intel_engine_cs *engine)
@@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	unsigned long flags;
 
+	/* Can be called during boot if GuC fails to load */
+	if (!engine->gt)
+		return;
+
 	ENGINE_TRACE(engine, "\n");
 
 	/*
@@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 void intel_guc_submission_fini(struct intel_guc *guc)
 {
-	if (guc->lrc_desc_pool)
-		guc_lrc_desc_pool_destroy(guc);
+	if (!guc->lrc_desc_pool)
+		return;
+
+	guc_lrc_desc_pool_destroy(guc);
+	i915_sched_engine_put(guc->sched_engine);
 }
 
 static int guc_context_alloc(struct intel_context *ce)
@@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
 	return 0;
 }
 
-static inline void queue_request(struct intel_engine_cs *engine,
+static inline void queue_request(struct i915_sched_engine *sched_engine,
 				 struct i915_request *rq,
 				 int prio)
 {
 	GEM_BUG_ON(!list_empty(&rq->sched.link));
 	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine->sched_engine, prio));
+		      i915_sched_lookup_priolist(sched_engine, prio));
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
 static void guc_submit_request(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = rq->engine;
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(engine, rq, rq_prio(rq));
+	queue_request(sched_engine, rq, rq_prio(rq));
 
-	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
+	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 	GEM_BUG_ON(list_empty(&rq->sched.link));
 
-	tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	tasklet_hi_schedule(&sched_engine->tasklet);
 
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
 
-	tasklet_kill(&engine->sched_engine->tasklet);
-
 	intel_engine_cleanup_common(engine);
 	lrc_fini_wa_ctx(engine);
 }
@@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
+	struct intel_guc *guc = &engine->gt->uc.guc;
 
 	/*
 	 * The setup relies on several assumptions (e.g. irqs always enabled)
@@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 	 */
 	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
 
-	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
+	if (!guc->sched_engine) {
+		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
+		if (!guc->sched_engine)
+			return -ENOMEM;
+
+		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->private_data = guc;
+		tasklet_setup(&guc->sched_engine->tasklet,
+			      guc_submission_tasklet);
+	}
+	i915_sched_engine_put(engine->sched_engine);
+	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 05/51] drm/i915/guc: Add bypass tasklet submission path to GuC
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add a bypass tasklet submission path to GuC. The tasklet is only used if
the H2G channel has backpressure; otherwise requests are submitted to the
GuC directly.
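
Queuing whenever there is a stalled request or pending work keeps
submission ordering intact. A condensed sketch of the decision, mirroring
the guc_submit_request() hunk below:

	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
		queue_request(sched_engine, rq, rq_prio(rq));
	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
		tasklet_hi_schedule(&sched_engine->tasklet);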

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 37 +++++++++++++++----
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index ca0717166a27..53b4a5eb4a85 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -172,6 +172,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	return err;
 }
 
+static inline void guc_set_lrc_tail(struct i915_request *rq)
+{
+	rq->context->lrc_reg_state[CTX_RING_TAIL] =
+		intel_ring_set_tail(rq->ring, rq->tail);
+}
+
 static inline int rq_prio(const struct i915_request *rq)
 {
 	return rq->sched.attr.priority;
@@ -215,8 +221,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	}
 done:
 	if (submit) {
-		last->context->lrc_reg_state[CTX_RING_TAIL] =
-			intel_ring_set_tail(last->ring, last->tail);
+		guc_set_lrc_tail(last);
 resubmit:
 		/*
 		 * We only check for -EBUSY here even though it is possible for
@@ -496,20 +501,36 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
 static void guc_submit_request(struct i915_request *rq)
 {
 	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
 	unsigned long flags;
 
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	queue_request(sched_engine, rq, rq_prio(rq));
-
-	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
-	GEM_BUG_ON(list_empty(&rq->sched.link));
-
-	tasklet_hi_schedule(&sched_engine->tasklet);
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);
 
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Implement GuC context operations, which include the GuC-specific
operations alloc, pin, unpin, and destroy.

v2:
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched in busy loop
    (sketched below)
 (Michal)
  - Remove C++ style comment
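
The v2 change above is the standard backpressure pattern: retry on
-EBUSY, sleeping interruptibly with an exponentially growing period
when the caller is allowed to sleep, and spinning otherwise. Using
msleep_interruptible() means a fatal signal can abort the wait instead
of burning CPU. A minimal sketch of the pattern (the real helper is
intel_guc_send_busy_loop() in the diff below; needs linux/delay.h,
which the patch adds):

	/* Sketch only -- see intel_guc_send_busy_loop() below. */
	static int send_with_backoff(struct intel_guc *guc, const u32 *action,
				     u32 len)
	{
		unsigned int sleep_period_ms = 1;
		bool not_atomic = !in_atomic() && !irqs_disabled();
		int err;

		/* No sleeping with spin locks held, just busy loop */
		might_sleep_if(not_atomic);

		for (;;) {
			err = intel_guc_send_nb(guc, action, len);
			if (err != -EBUSY)
				return err;
			if (not_atomic) {
				/* Interruptible: a signal aborts the wait */
				if (msleep_interruptible(sleep_period_ms))
					return -EINTR;
				sleep_period_ms <<= 1;	/* exponential backoff */
			} else {
				cpu_relax();	/* no sleeping in atomic context */
			}
		}
	}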

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 drivers/gpu/drm/i915/i915_request.c           |   1 +
 8 files changed, 685 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..32fd6647154b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	mutex_init(&ce->pin_mutex);
 
+	spin_lock_init(&ce->guc_state.lock);
+
+	ce->guc_id = GUC_INVALID_LRC_ID;
+	INIT_LIST_HEAD(&ce->guc_id_link);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6d99631d19b9..606c480aec26 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -96,6 +96,7 @@ struct intel_context {
 #define CONTEXT_BANNED			6
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
+#define CONTEXT_LRCA_DIRTY		9
 
 	struct {
 		u64 timeout_us;
@@ -138,14 +139,29 @@ struct intel_context {
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u8 sched_state;
+	} guc_state;
+
 	/* GuC scheduling state flags that do not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC LRC descriptor ID */
+	u16 guc_id;
+
+	/* GuC LRC descriptor reference count */
+	atomic_t guc_id_ref;
+
 	/*
-	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
-	 * in the series will.
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
-	u16 guc_id;
+	struct list_head guc_id_link;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
 #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
 
 #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
 /* in Gen12 ID 0x7FF is reserved to indicate idle */
 #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8c7b92f699f1..30773cd699f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -7,6 +7,7 @@
 #define _INTEL_GUC_H_
 
 #include <linux/xarray.h>
+#include <linux/delay.h>
 
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
@@ -44,6 +45,14 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	/*
+	 * contexts_lock protects the pool of free guc ids and a linked list of
+	 * guc ids available to be stolen
+	 */
+	spinlock_t contexts_lock;
+	struct ida guc_ids;
+	struct list_head guc_id_list;
+
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
@@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 				 response_buf, response_buf_size, 0);
 }
 
+static inline int intel_guc_send_busy_loop(struct intel_guc *guc,
+					   const u32 *action,
+					   u32 len,
+					   bool loop)
+{
+	int err;
+	unsigned int sleep_period_ms = 1;
+	bool not_atomic = !in_atomic() && !irqs_disabled();
+
+	/* No sleeping with spin locks, just busy loop */
+	might_sleep_if(loop && not_atomic);
+
+retry:
+	err = intel_guc_send_nb(guc, action, len);
+	if (unlikely(err == -EBUSY && loop)) {
+		if (likely(not_atomic)) {
+			if (msleep_interruptible(sleep_period_ms))
+				return -EINTR;
+			sleep_period_ms = sleep_period_ms << 1;
+		} else {
+			cpu_relax();
+		}
+		goto retry;
+	}
+
+	return err;
+}
+
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
 {
 	intel_guc_ct_event_handler(&guc->ct);
@@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg, u32 len);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 83ec60ea3f89..28ff82c5be45 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		ret = intel_guc_deregister_done_process_msg(guc, payload,
+							    len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 53b4a5eb4a85..a47b3813b4d0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -13,7 +13,9 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_lrc.h"
+#include "gt/intel_lrc_reg.h"
 #include "gt/intel_mocs.h"
 #include "gt/intel_ring.h"
 
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which
+ * require a lock, aside from the special case where the functions are called
+ * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
+ * path to be executing on the context.
+ */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
+#define SCHED_STATE_DESTROYED				BIT(1)
+static inline void init_sched_state(struct intel_context *ce)
+{
+	/* Should only be called from guc_lrc_desc_pin() */
+	atomic_set(&ce->guc_sched_state_no_lock, 0);
+	ce->guc_state.sched_state = 0;
+}
+
+static inline bool
+context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state &
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline void
+set_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	/* Should only be called from guc_lrc_desc_pin() */
+	ce->guc_state.sched_state |=
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+
+static inline void
+clr_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state &
+		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline bool
+context_destroyed(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+
+static inline void
+set_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+
+static inline bool context_guc_id_invalid(struct intel_context *ce)
+{
+	return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+
+static inline void set_context_guc_id_invalid(struct intel_context *ce)
+{
+	ce->guc_id = GUC_INVALID_LRC_ID;
+}
+
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
+{
+	return &ce->engine->gt->uc.guc;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	int len = 0;
 	bool enabled = context_enabled(ce);
 
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
+	spin_lock_init(&guc->contexts_lock);
+	INIT_LIST_HEAD(&guc->guc_id_list);
+	ida_init(&guc->guc_ids);
+
 	return 0;
 }
 
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	i915_sched_engine_put(guc->sched_engine);
 }
 
-static int guc_context_alloc(struct intel_context *ce)
+static inline void queue_request(struct i915_sched_engine *sched_engine,
+				 struct i915_request *rq,
+				 int prio)
 {
-	return lrc_alloc(ce, ce->engine);
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(sched_engine, prio));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
+static void guc_submit_request(struct i915_request *rq)
+{
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
+	unsigned long flags;
+
+	/* Will be called from irq-context when using foreign fences. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);
+
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+#define GUC_ID_START	64	/* First 64 guc_ids reserved */
+static int new_guc_id(struct intel_guc *guc)
+{
+	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
+			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (!context_guc_id_invalid(ce)) {
+		ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		reset_lrc_desc(guc, ce->guc_id);
+		set_context_guc_id_invalid(ce);
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+}
+
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	__release_guc_id(guc, ce);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int steal_guc_id(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	int guc_id;
+
+	if (!list_empty(&guc->guc_id_list)) {
+		ce = list_first_entry(&guc->guc_id_list,
+				      struct intel_context,
+				      guc_id_link);
+
+		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+		GEM_BUG_ON(context_guc_id_invalid(ce));
+
+		list_del_init(&ce->guc_id_link);
+		guc_id = ce->guc_id;
+		set_context_guc_id_invalid(ce);
+		return guc_id;
+	} else {
+		return -EAGAIN;
+	}
+}
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out)
+{
+	int ret;
+
+	ret = new_guc_id(guc);
+	if (unlikely(ret < 0)) {
+		ret = steal_guc_id(guc);
+		if (ret < 0)
+			return ret;
+	}
+
+	*out = ret;
+	return 0;
+}
+
+#define PIN_GUC_ID_TRIES	4
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	int ret = 0;
+	unsigned long flags, tries = PIN_GUC_ID_TRIES;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+
+try_again:
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+
+	if (context_guc_id_invalid(ce)) {
+		ret = assign_guc_id(guc, &ce->guc_id);
+		if (ret)
+			goto out_unlock;
+		ret = 1;	/* Indicates newly assigned guc_id */
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	atomic_inc(&ce->guc_id_ref);
+
+out_unlock:
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * -EAGAIN indicates no guc_ids are available, let's retire any
+	 * outstanding requests to see if that frees up a guc_id. If the first
+	 * retire didn't help, insert a sleep with the timeslice duration before
+	 * attempting to retire more requests. Double the sleep period each
+	 * subsequent pass before finally giving up. The sleep period has a
+	 * maximum of 100ms and a minimum of 1ms.
+	 */
+	if (ret == -EAGAIN && --tries) {
+		if (PIN_GUC_ID_TRIES - tries > 1) {
+			unsigned int timeslice_shifted =
+				ce->engine->props.timeslice_duration_ms <<
+				(PIN_GUC_ID_TRIES - tries - 2);
+			unsigned int max = min_t(unsigned int, 100,
+						 timeslice_shifted);
+
+			msleep(max_t(unsigned int, max, 1));
+		}
+		intel_gt_retire_requests(guc_to_gt(guc));
+		goto try_again;
+	}
+
+	return ret;
+}
+
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
+	    !atomic_read(&ce->guc_id_ref))
+		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int __guc_action_register_context(struct intel_guc *guc,
+					 u32 guc_id,
+					 u32 offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_REGISTER_CONTEXT,
+		guc_id,
+		offset,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int register_context(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
+		ce->guc_id * sizeof(struct guc_lrc_desc);
+
+	return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+
+static int __guc_action_deregister_context(struct intel_guc *guc,
+					   u32 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
+		guc_id,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int deregister_context(struct intel_context *ce, u32 guc_id)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	return __guc_action_deregister_context(guc, guc_id);
+}
+
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
+{
+	switch (class) {
+	case RENDER_CLASS:
+		return mask >> RCS0;
+	case VIDEO_ENHANCEMENT_CLASS:
+		return mask >> VECS0;
+	case VIDEO_DECODE_CLASS:
+		return mask >> VCS0;
+	case COPY_ENGINE_CLASS:
+		return mask >> BCS0;
+	default:
+		GEM_BUG_ON("Invalid Class");
+		return 0;
+	}
+}
+
+static void guc_context_policy_init(struct intel_engine_cs *engine,
+				    struct guc_lrc_desc *desc)
+{
+	desc->policy_flags = 0;
+
+	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
+	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce)
+{
+	struct intel_runtime_pm *runtime_pm =
+		&ce->engine->gt->i915->runtime_pm;
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	u32 desc_idx = ce->guc_id;
+	struct guc_lrc_desc *desc;
+	bool context_registered;
+	intel_wakeref_t wakeref;
+	int ret = 0;
+
+	GEM_BUG_ON(!engine->mask);
+
+	/*
+	 * Ensure the LRC + CT vmas are in the same region, as the write
+	 * barrier is done based on the CT vma region.
+	 */
+	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
+		   i915_gem_object_is_lmem(ce->ring->vma->obj));
+
+	context_registered = lrc_desc_registered(guc, desc_idx);
+
+	reset_lrc_desc(guc, desc_idx);
+	set_lrc_desc_registered(guc, desc_idx, ce);
+
+	desc = __get_lrc_desc(guc, desc_idx);
+	desc->engine_class = engine_class_to_guc_class(engine->class);
+	desc->engine_submit_mask = adjust_engine_mask(engine->class,
+						      engine->mask);
+	desc->hw_context_desc = ce->lrc.lrca;
+	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+	guc_context_policy_init(engine, desc);
+	init_sched_state(ce);
+
+	/*
+	 * The context_lookup xarray is used to determine if the hardware
+	 * context is currently registered. There are two cases in which it
+	 * could be registered: either the guc_id has been stolen from
+	 * another context or the lrc descriptor address of this context has
+	 * changed. In either case the context needs to be deregistered with the
+	 * GuC before registering this context.
+	 */
+	if (context_registered) {
+		set_context_wait_for_deregister_to_register(ce);
+		intel_context_get(ce);
+
+		/*
+		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whose guc_id was stolen.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = deregister_context(ce, ce->guc_id);
+	} else {
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = register_context(ce);
+	}
+
+	return ret;
 }
 
 static int guc_context_pre_pin(struct intel_context *ce,
@@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
+	if (i915_ggtt_offset(ce->state) !=
+	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
+		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+
 	return lrc_pin(ce, ce->engine, vaddr);
 }
 
+static void guc_context_unpin(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
+	lrc_unpin(ce);
+}
+
+static void guc_context_post_unpin(struct intel_context *ce)
+{
+	lrc_post_unpin(ce);
+}
+
+static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	unsigned long flags;
+
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
+	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	deregister_context(ce, ce->guc_id);
+}
+
+static void guc_context_destroy(struct kref *kref)
+{
+	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	intel_wakeref_t wakeref;
+	unsigned long flags;
+
+	/*
+	 * If the guc_id is invalid this context has been stolen and we can free
+	 * it immediately. It can also be freed immediately if the context is not
+	 * registered with the GuC.
+	 */
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		release_guc_id(guc, ce);
+		lrc_destroy(kref);
+		return;
+	}
+
+	/*
+	 * We have to acquire the context spinlock and check guc_id again; if it
+	 * is valid it hasn't been stolen and needs to be deregistered. We
+	 * delete this context from the list of unpinned guc_ids available to
+	 * steal, to seal a race with guc_lrc_desc_pin(). When the G2H CTB
+	 * returns indicating this context has been deregistered the guc_id is
+	 * returned to the pool of available guc_ids.
+	 */
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (context_guc_id_invalid(ce)) {
+		__release_guc_id(guc, ce);
+		spin_unlock_irqrestore(&guc->contexts_lock, flags);
+		lrc_destroy(kref);
+		return;
+	}
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * We defer GuC context deregistration until the context is destroyed
+	 * in order to save on CTBs. With this optimization ideally we only need
+	 * 1 CTB to register the context during the first pin and 1 CTB to
+	 * deregister the context when the context is destroyed. Without this
+	 * optimization, a CTB would be needed every pin & unpin.
+	 *
+	 * XXX: Need to acquire the runtime wakeref as this can be triggered
+	 * from context_free_worker when no runtime wakeref is held.
+	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
+	 * in H2G CTB to deregister the context. A future patch may defer this
+	 * H2G CTB if the runtime wakeref is zero.
+	 */
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		guc_lrc_desc_unpin(ce);
+}
+
+static int guc_context_alloc(struct intel_context *ce)
+{
+	return lrc_alloc(ce, ce->engine);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
-	.unpin = lrc_unpin,
-	.post_unpin = lrc_post_unpin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
 
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
 	.reset = lrc_reset,
-	.destroy = lrc_destroy,
+	.destroy = guc_context_destroy,
 };
 
-static int guc_request_alloc(struct i915_request *request)
+static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
+	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+
+static int guc_request_alloc(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	struct intel_guc *guc = ce_to_guc(ce);
 	int ret;
 
-	GEM_BUG_ON(!intel_context_is_pinned(request->context));
+	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
 	 * have to repeat work.
 	 */
-	request->reserved_space += GUC_REQUEST_SIZE;
+	rq->reserved_space += GUC_REQUEST_SIZE;
 
 	/*
 	 * Note that after this point, we have committed to using
@@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
 	 */
 
 	/* Unconditionally invalidate GPU caches and TLBs. */
-	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
+	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
 		return ret;
 
-	request->reserved_space -= GUC_REQUEST_SIZE;
-	return 0;
-}
+	rq->reserved_space -= GUC_REQUEST_SIZE;
 
-static inline void queue_request(struct i915_sched_engine *sched_engine,
-				 struct i915_request *rq,
-				 int prio)
-{
-	GEM_BUG_ON(!list_empty(&rq->sched.link));
-	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(sched_engine, prio));
-	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
-				     struct i915_request *rq)
-{
-	int ret;
-
-	__i915_request_submit(rq);
-
-	trace_i915_request_in(rq, 0);
-
-	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
-
-	return ret;
-}
-
-static void guc_submit_request(struct i915_request *rq)
-{
-	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
-	unsigned long flags;
+	/*
+	 * Call pin_guc_id here rather than in the pinning step as with
+	 * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the
+	 * guc_ids and creating horrible race conditions. This is especially bad
+	 * when guc_ids are being stolen due to over subscription. By the time
+	 * this function is reached, it is guaranteed that the guc_id will be
+	 * persistent until the generated request is retired, thus sealing these
+	 * race conditions. It is still safe to fail here if guc_ids are
+	 * exhausted and return -EAGAIN to the user indicating that they can try
+	 * again in the future.
+	 *
+	 * There is no need for a lock here as the timeline mutex ensures at
+	 * most one context can be executing this code path at once. The
+	 * guc_id_ref is incremented once for every request in flight and
+	 * decremented on each retire. When it is zero, a lock around the
+	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+	 */
+	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
+		return 0;
 
-	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&sched_engine->lock, flags);
+	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
+	if (unlikely(ret < 0))
+		return ret;
+	if (context_needs_register(ce, !!ret)) {
+		ret = guc_lrc_desc_pin(ce);
+		if (unlikely(ret)) {	/* unwind */
+			atomic_dec(&ce->guc_id_ref);
+			unpin_guc_id(guc, ce);
+			return ret;
+		}
+	}
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
-		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		tasklet_hi_schedule(&sched_engine->tasklet);
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	return 0;
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
+					  struct intel_context *ce)
+{
+	if (context_guc_id_invalid(ce))
+		pin_guc_id(guc, ce);
+	guc_lrc_desc_pin(ce);
+}
+
+static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* make sure all descriptors are clean... */
+	xa_destroy(&guc->context_lookup);
+
+	/*
+	 * Some contexts might have been pinned before we enabled GuC
+	 * submission, so we need to add them to the GuC bookkeeping.
+	 * Also, after a GuC reset we want to make sure that the information
+	 * shared with GuC is properly reset. The kernel lrcs are not attached
+	 * to the gem_context, so they need to be added separately.
+	 *
+	 * Note: we purposely do not check the error return of
+	 * guc_lrc_desc_pin, because that function can only fail in two cases.
+	 * One, if there aren't enough free IDs, but we're guaranteed to have
+	 * enough here (we're either only pinning a handful of lrc on first boot
+	 * or we're re-pinning lrcs that were already pinned before the reset).
+	 * Two, if the GuC has died and CTBs can't make forward progress.
+	 * Presumably, the GuC should be alive as this function is called on
+	 * driver load or after a reset. Even if it is dead, another full GPU
+	 * reset will be triggered and this function would be called again.
+	 */
+
+	for_each_engine(engine, gt, id)
+		if (engine->kernel_context)
+			guc_kernel_context_pin(guc, engine->kernel_context);
+}
+
 static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
@@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
+	guc_init_lrc_mapping(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->submission_selected = __guc_submission_selected(guc);
 }
+
+static inline struct intel_context *
+g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
+{
+	struct intel_context *ce;
+
+	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	ce = __get_context(guc, desc_idx);
+	if (unlikely(!ce)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Context is NULL, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	return ce;
+}
+
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg,
+					  u32 len)
+{
+	struct intel_context *ce;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (context_wait_for_deregister_to_register(ce)) {
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+
+		/*
+		 * The previous owner of this guc_id has been deregistered, so
+		 * it is now safe to register this context.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			register_context(ce);
+		clr_context_wait_for_deregister_to_register(ce);
+		intel_context_put(ce);
+	} else if (context_destroyed(ce)) {
+		/* Context has been destroyed */
+		release_guc_id(guc, ce);
+		lrc_destroy(&ce->ref);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 943fe485c662..204c95c39353 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4142,6 +4142,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
 #define GEN8_CTX_VALID (1 << 0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
 #define GEN8_CTX_FORCE_RESTORE (1 << 2)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 86b4c9f2613d..b48c4905d3fc 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 */
 	if (!list_empty(&rq->sched.link))
 		remove_from_engine(rq);
+	atomic_dec(&rq->context->guc_id_ref);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
-- 
2.28.0
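
Putting the pieces of the patch together, the guc_id flow a request
sees is: take a reference if an id is already pinned, otherwise pin one
(possibly stealing an unpinned id), (re)register the context with the
GuC if needed, and drop the reference on retire. A condensed sketch
(names from the patch above; locking and the unwind path are elided):

	/* Condensed sketch of the guc_id flow in guc_request_alloc(). */
	static int request_guc_id_sketch(struct intel_guc *guc,
					 struct intel_context *ce)
	{
		int ret;

		/* Fast path: an id is already pinned, just take a reference */
		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		/* Slow path: pin, possibly stealing; 1 means a new id */
		ret = pin_guc_id(guc, ce);
		if (ret < 0)
			return ret;	/* e.g. -EAGAIN when ids are exhausted */

		/* New or dirty ids must be (re)registered with the GuC */
		if (context_needs_register(ce, !!ret))
			return guc_lrc_desc_pin(ce);

		return 0;
	}

	/*
	 * On retire, i915_request_retire() drops the reference with
	 * atomic_dec(&ce->guc_id_ref); once it reaches zero and the context
	 * is unpinned, unpin_guc_id() makes the id stealable again.
	 */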


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.

v2:
 (Daniel Vetter)
  - Use msleep_interruptible rather than cond_resched in busy loop
 (Michal)
  - Remove C++ style comment

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
 drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 drivers/gpu/drm/i915/i915_request.c           |   1 +
 8 files changed, 685 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bd63813c8a80..32fd6647154b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	mutex_init(&ce->pin_mutex);
 
+	spin_lock_init(&ce->guc_state.lock);
+
+	ce->guc_id = GUC_INVALID_LRC_ID;
+	INIT_LIST_HEAD(&ce->guc_id_link);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 6d99631d19b9..606c480aec26 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -96,6 +96,7 @@ struct intel_context {
 #define CONTEXT_BANNED			6
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
 #define CONTEXT_NOPREEMPT		8
+#define CONTEXT_LRCA_DIRTY		9
 
 	struct {
 		u64 timeout_us;
@@ -138,14 +139,29 @@ struct intel_context {
 
 	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
 
+	struct {
+		/** lock: protects everything in guc_state */
+		spinlock_t lock;
+		/**
+		 * sched_state: scheduling state of this context using GuC
+		 * submission
+		 */
+		u8 sched_state;
+	} guc_state;
+
 	/* GuC scheduling state flags that do not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
+	/* GuC LRC descriptor ID */
+	u16 guc_id;
+
+	/* GuC LRC descriptor reference count */
+	atomic_t guc_id_ref;
+
 	/*
-	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
-	 * in the series will.
+	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
-	u16 guc_id;
+	struct list_head guc_id_link;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
index 41e5350a7a05..49d4857ad9b7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
@@ -87,7 +87,6 @@
 #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
 
 #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
-#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
 #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
 /* in Gen12 ID 0x7FF is reserved to indicate idle */
 #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 8c7b92f699f1..30773cd699f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -7,6 +7,7 @@
 #define _INTEL_GUC_H_
 
 #include <linux/xarray.h>
+#include <linux/delay.h>
 
 #include "intel_uncore.h"
 #include "intel_guc_fw.h"
@@ -44,6 +45,14 @@ struct intel_guc {
 		void (*disable)(struct intel_guc *guc);
 	} interrupts;
 
+	/*
+	 * contexts_lock protects the pool of free guc ids and a linked list of
+	 * guc ids available to be stolen
+	 */
+	spinlock_t contexts_lock;
+	struct ida guc_ids;
+	struct list_head guc_id_list;
+
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
@@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 				 response_buf, response_buf_size, 0);
 }
 
+static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
+					   const u32 *action,
+					   u32 len,
+					   bool loop)
+{
+	int err;
+	unsigned int sleep_period_ms = 1;
+	bool not_atomic = !in_atomic() && !irqs_disabled();
+
+	/* No sleeping with spin locks, just busy loop */
+	might_sleep_if(loop && not_atomic);
+
+retry:
+	err = intel_guc_send_nb(guc, action, len);
+	if (unlikely(err == -EBUSY && loop)) {
+		if (likely(not_atomic)) {
+			if (msleep_interruptible(sleep_period_ms))
+				return -EINTR;
+			sleep_period_ms = sleep_period_ms << 1;
+		} else {
+			cpu_relax();
+		}
+		goto retry;
+	}
+
+	return err;
+}
+
 static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
 {
 	intel_guc_ct_event_handler(&guc->ct);
@@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg, u32 len);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 83ec60ea3f89..28ff82c5be45 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_DEFAULT:
 		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		ret = intel_guc_deregister_done_process_msg(guc, payload,
+							    len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 53b4a5eb4a85..a47b3813b4d0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -13,7 +13,9 @@
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
+#include "gt/intel_gt_requests.h"
 #include "gt/intel_lrc.h"
+#include "gt/intel_lrc_reg.h"
 #include "gt/intel_mocs.h"
 #include "gt/intel_ring.h"
 
@@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+/*
+ * Below is a set of functions which control the GuC scheduling state which
+ * require a lock, aside from the special case where the functions are called
+ * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
+ * path to be executing on the context.
+ */
+#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
+#define SCHED_STATE_DESTROYED				BIT(1)
+static inline void init_sched_state(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	atomic_set(&ce->guc_sched_state_no_lock, 0);
+	ce->guc_state.sched_state = 0;
+}
+
+static inline bool
+context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state &
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline void
+set_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	/* Only should be called from guc_lrc_desc_pin() */
+	ce->guc_state.sched_state |=
+		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
+}
+
+static inline void
+clr_context_wait_for_deregister_to_register(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state &
+		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
+}
+
+static inline bool
+context_destroyed(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
+}
+
+static inline void
+set_context_destroyed(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
+}
+
+static inline bool context_guc_id_invalid(struct intel_context *ce)
+{
+	return (ce->guc_id == GUC_INVALID_LRC_ID);
+}
+
+static inline void set_context_guc_id_invalid(struct intel_context *ce)
+{
+	ce->guc_id = GUC_INVALID_LRC_ID;
+}
+
+static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
+{
+	return &ce->engine->gt->uc.guc;
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	int len = 0;
 	bool enabled = context_enabled(ce);
 
+	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
+	GEM_BUG_ON(context_guc_id_invalid(ce));
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
 
 	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
 
+	spin_lock_init(&guc->contexts_lock);
+	INIT_LIST_HEAD(&guc->guc_id_list);
+	ida_init(&guc->guc_ids);
+
 	return 0;
 }
 
@@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
 	i915_sched_engine_put(guc->sched_engine);
 }
 
-static int guc_context_alloc(struct intel_context *ce)
+static inline void queue_request(struct i915_sched_engine *sched_engine,
+				 struct i915_request *rq,
+				 int prio)
 {
-	return lrc_alloc(ce, ce->engine);
+	GEM_BUG_ON(!list_empty(&rq->sched.link));
+	list_add_tail(&rq->sched.link,
+		      i915_sched_lookup_priolist(sched_engine, prio));
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+}
+
+static int guc_bypass_tasklet_submit(struct intel_guc *guc,
+				     struct i915_request *rq)
+{
+	int ret;
+
+	__i915_request_submit(rq);
+
+	trace_i915_request_in(rq, 0);
+
+	guc_set_lrc_tail(rq);
+	ret = guc_add_request(guc, rq);
+	if (ret == -EBUSY)
+		guc->stalled_request = rq;
+
+	return ret;
+}
+
+static void guc_submit_request(struct i915_request *rq)
+{
+	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
+	struct intel_guc *guc = &rq->engine->gt->uc.guc;
+	unsigned long flags;
+
+	/* Will be called from irq-context when using foreign fences. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+
+	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+		queue_request(sched_engine, rq, rq_prio(rq));
+	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
+		tasklet_hi_schedule(&sched_engine->tasklet);
+
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+#define GUC_ID_START	64	/* First 64 guc_ids reserved */
+static int new_guc_id(struct intel_guc *guc)
+{
+	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
+			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
+			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
+}
+
+static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	if (!context_guc_id_invalid(ce)) {
+		ida_simple_remove(&guc->guc_ids, ce->guc_id);
+		reset_lrc_desc(guc, ce->guc_id);
+		set_context_guc_id_invalid(ce);
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+}
+
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	__release_guc_id(guc, ce);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int steal_guc_id(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	int guc_id;
+
+	if (!list_empty(&guc->guc_id_list)) {
+		ce = list_first_entry(&guc->guc_id_list,
+				      struct intel_context,
+				      guc_id_link);
+
+		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+		GEM_BUG_ON(context_guc_id_invalid(ce));
+
+		list_del_init(&ce->guc_id_link);
+		guc_id = ce->guc_id;
+		set_context_guc_id_invalid(ce);
+		return guc_id;
+	} else {
+		return -EAGAIN;
+	}
+}
+
+static int assign_guc_id(struct intel_guc *guc, u16 *out)
+{
+	int ret;
+
+	ret = new_guc_id(guc);
+	if (unlikely(ret < 0)) {
+		ret = steal_guc_id(guc);
+		if (ret < 0)
+			return ret;
+	}
+
+	*out = ret;
+	return 0;
+}
+
+#define PIN_GUC_ID_TRIES	4
+static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	int ret = 0;
+	unsigned long flags, tries = PIN_GUC_ID_TRIES;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
+
+try_again:
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+
+	if (context_guc_id_invalid(ce)) {
+		ret = assign_guc_id(guc, &ce->guc_id);
+		if (ret)
+			goto out_unlock;
+		ret = 1;	/* Indidcates newly assigned guc_id */
+	}
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	atomic_inc(&ce->guc_id_ref);
+
+out_unlock:
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * -EAGAIN indicates no guc_ids are available, let's retire any
+	 * outstanding requests to see if that frees up a guc_id. If the first
+	 * retire didn't help, insert a sleep with the timeslice duration before
+	 * attempting to retire more requests. Double the sleep period each
+	 * subsequent pass before finally giving up. The sleep period has max of
+	 * 100ms and minimum of 1ms.
+	 */
+	if (ret == -EAGAIN && --tries) {
+		if (PIN_GUC_ID_TRIES - tries > 1) {
+			unsigned int timeslice_shifted =
+				ce->engine->props.timeslice_duration_ms <<
+				(PIN_GUC_ID_TRIES - tries - 2);
+			unsigned int max = min_t(unsigned int, 100,
+						 timeslice_shifted);
+
+			msleep(max_t(unsigned int, max, 1));
+		}
+		intel_gt_retire_requests(guc_to_gt(guc));
+		goto try_again;
+	}
+
+	return ret;
+}
+
+static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
+
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
+	    !atomic_read(&ce->guc_id_ref))
+		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+}
+
+static int __guc_action_register_context(struct intel_guc *guc,
+					 u32 guc_id,
+					 u32 offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_REGISTER_CONTEXT,
+		guc_id,
+		offset,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int register_context(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
+		ce->guc_id * sizeof(struct guc_lrc_desc);
+
+	return __guc_action_register_context(guc, ce->guc_id, offset);
+}
+
+static int __guc_action_deregister_context(struct intel_guc *guc,
+					   u32 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
+		guc_id,
+	};
+
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static int deregister_context(struct intel_context *ce, u32 guc_id)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	return __guc_action_deregister_context(guc, guc_id);
+}
+
+static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
+{
+	switch (class) {
+	case RENDER_CLASS:
+		return mask >> RCS0;
+	case VIDEO_ENHANCEMENT_CLASS:
+		return mask >> VECS0;
+	case VIDEO_DECODE_CLASS:
+		return mask >> VCS0;
+	case COPY_ENGINE_CLASS:
+		return mask >> BCS0;
+	default:
+		GEM_BUG_ON("Invalid Class");
+		return 0;
+	}
+}
+
+static void guc_context_policy_init(struct intel_engine_cs *engine,
+				    struct guc_lrc_desc *desc)
+{
+	desc->policy_flags = 0;
+
+	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
+	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+}
+
+static int guc_lrc_desc_pin(struct intel_context *ce)
+{
+	struct intel_runtime_pm *runtime_pm =
+		&ce->engine->gt->i915->runtime_pm;
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	u32 desc_idx = ce->guc_id;
+	struct guc_lrc_desc *desc;
+	bool context_registered;
+	intel_wakeref_t wakeref;
+	int ret = 0;
+
+	GEM_BUG_ON(!engine->mask);
+
+	/*
+	 * Ensure LRC + CT vmas are is same region as write barrier is done
+	 * based on CT vma region.
+	 */
+	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
+		   i915_gem_object_is_lmem(ce->ring->vma->obj));
+
+	context_registered = lrc_desc_registered(guc, desc_idx);
+
+	reset_lrc_desc(guc, desc_idx);
+	set_lrc_desc_registered(guc, desc_idx, ce);
+
+	desc = __get_lrc_desc(guc, desc_idx);
+	desc->engine_class = engine_class_to_guc_class(engine->class);
+	desc->engine_submit_mask = adjust_engine_mask(engine->class,
+						      engine->mask);
+	desc->hw_context_desc = ce->lrc.lrca;
+	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
+	guc_context_policy_init(engine, desc);
+	init_sched_state(ce);
+
+	/*
+	 * The context_lookup xarray is used to determine if the hardware
+	 * context is currently registered. There are two cases in which it
+	 * could be regisgered either the guc_id has been stole from from
+	 * another context or the lrc descriptor address of this context has
+	 * changed. In either case the context needs to be deregistered with the
+	 * GuC before registering this context.
+	 */
+	if (context_registered) {
+		set_context_wait_for_deregister_to_register(ce);
+		intel_context_get(ce);
+
+		/*
+		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whos guc_id was stole.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = deregister_context(ce, ce->guc_id);
+	} else {
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			ret = register_context(ce);
+	}
+
+	return ret;
 }
 
 static int guc_context_pre_pin(struct intel_context *ce,
@@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
 
 static int guc_context_pin(struct intel_context *ce, void *vaddr)
 {
+	if (i915_ggtt_offset(ce->state) !=
+	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
+		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
+
 	return lrc_pin(ce, ce->engine, vaddr);
 }
 
+static void guc_context_unpin(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+
+	unpin_guc_id(guc, ce);
+	lrc_unpin(ce);
+}
+
+static void guc_context_post_unpin(struct intel_context *ce)
+{
+	lrc_post_unpin(ce);
+}
+
+static inline void guc_lrc_desc_unpin(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	unsigned long flags;
+
+	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
+	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	deregister_context(ce, ce->guc_id);
+}
+
+static void guc_context_destroy(struct kref *kref)
+{
+	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	intel_wakeref_t wakeref;
+	unsigned long flags;
+
+	/*
+	 * If the guc_id is invalid this context has been stolen and we can free
+	 * it immediately. Also can be freed immediately if the context is not
+	 * registered with the GuC.
+	 */
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		release_guc_id(guc, ce);
+		lrc_destroy(kref);
+		return;
+	}
+
+	/*
+	 * We have to acquire the context spinlock and check guc_id again, if it
+	 * is valid it hasn't been stolen and needs to be deregistered. We
+	 * delete this context from the list of unpinned guc_ids available to
+	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
+	 * returns indicating this context has been deregistered the guc_id is
+	 * returned to the pool of available guc_ids.
+	 */
+	spin_lock_irqsave(&guc->contexts_lock, flags);
+	if (context_guc_id_invalid(ce)) {
+		__release_guc_id(guc, ce);
+		spin_unlock_irqrestore(&guc->contexts_lock, flags);
+		lrc_destroy(kref);
+		return;
+	}
+
+	if (!list_empty(&ce->guc_id_link))
+		list_del_init(&ce->guc_id_link);
+	spin_unlock_irqrestore(&guc->contexts_lock, flags);
+
+	/*
+	 * We defer GuC context deregistration until the context is destroyed
+	 * in order to save on CTBs. With this optimization ideally we only need
+	 * 1 CTB to register the context during the first pin and 1 CTB to
+	 * deregister the context when the context is destroyed. Without this
+	 * optimization, a CTB would be needed every pin & unpin.
+	 *
+	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
+	 * from context_free_worker when not runtime wakeref is held.
+	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
+	 * in H2G CTB to deregister the context. A future patch may defer this
+	 * H2G CTB if the runtime wakeref is zero.
+	 */
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		guc_lrc_desc_unpin(ce);
+}
+
+static int guc_context_alloc(struct intel_context *ce)
+{
+	return lrc_alloc(ce, ce->engine);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
 	.pre_pin = guc_context_pre_pin,
 	.pin = guc_context_pin,
-	.unpin = lrc_unpin,
-	.post_unpin = lrc_post_unpin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
 
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
 	.reset = lrc_reset,
-	.destroy = lrc_destroy,
+	.destroy = guc_context_destroy,
 };
 
-static int guc_request_alloc(struct i915_request *request)
+static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
+	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+}
+
+static int guc_request_alloc(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+	struct intel_guc *guc = ce_to_guc(ce);
 	int ret;
 
-	GEM_BUG_ON(!intel_context_is_pinned(request->context));
+	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
 	 * we start building the request - in which case we will just
 	 * have to repeat work.
 	 */
-	request->reserved_space += GUC_REQUEST_SIZE;
+	rq->reserved_space += GUC_REQUEST_SIZE;
 
 	/*
 	 * Note that after this point, we have committed to using
@@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
 	 */
 
 	/* Unconditionally invalidate GPU caches and TLBs. */
-	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
+	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
 	if (ret)
 		return ret;
 
-	request->reserved_space -= GUC_REQUEST_SIZE;
-	return 0;
-}
+	rq->reserved_space -= GUC_REQUEST_SIZE;
 
-static inline void queue_request(struct i915_sched_engine *sched_engine,
-				 struct i915_request *rq,
-				 int prio)
-{
-	GEM_BUG_ON(!list_empty(&rq->sched.link));
-	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(sched_engine, prio));
-	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
-static int guc_bypass_tasklet_submit(struct intel_guc *guc,
-				     struct i915_request *rq)
-{
-	int ret;
-
-	__i915_request_submit(rq);
-
-	trace_i915_request_in(rq, 0);
-
-	guc_set_lrc_tail(rq);
-	ret = guc_add_request(guc, rq);
-	if (ret == -EBUSY)
-		guc->stalled_request = rq;
-
-	return ret;
-}
-
-static void guc_submit_request(struct i915_request *rq)
-{
-	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
-	struct intel_guc *guc = &rq->engine->gt->uc.guc;
-	unsigned long flags;
+	/*
+	 * Call pin_guc_id here rather than in the pinning step as with
+	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
+	 * guc_ids and creating horrible race conditions. This is especially bad
+	 * when guc_ids are being stolen due to over subscription. By the time
+	 * this function is reached, it is guaranteed that the guc_id will be
+	 * persistent until the generated request is retired. Thus, sealing these
+	 * race conditions. It is still safe to fail here if guc_ids are
+	 * exhausted and return -EAGAIN to the user indicating that they can try
+	 * again in the future.
+	 *
+	 * There is no need for a lock here as the timeline mutex ensures at
+	 * most one context can be executing this code path at once. The
+	 * guc_id_ref is incremented once for every request in flight and
+	 * decremented on each retire. When it is zero, a lock around the
+	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
+	 */
+	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
+		return 0;
 
-	/* Will be called from irq-context when using foreign fences. */
-	spin_lock_irqsave(&sched_engine->lock, flags);
+	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
+	if (unlikely(ret < 0))
+		return ret;;
+	if (context_needs_register(ce, !!ret)) {
+		ret = guc_lrc_desc_pin(ce);
+		if (unlikely(ret)) {	/* unwind */
+			atomic_dec(&ce->guc_id_ref);
+			unpin_guc_id(guc, ce);
+			return ret;
+		}
+	}
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
-		queue_request(sched_engine, rq, rq_prio(rq));
-	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
-		tasklet_hi_schedule(&sched_engine->tasklet);
+	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	return 0;
 }
 
 static void sanitize_hwsp(struct intel_engine_cs *engine)
@@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
 	engine->submit_request = guc_submit_request;
 }
 
+static inline void guc_kernel_context_pin(struct intel_guc *guc,
+					  struct intel_context *ce)
+{
+	if (context_guc_id_invalid(ce))
+		pin_guc_id(guc, ce);
+	guc_lrc_desc_pin(ce);
+}
+
+static inline void guc_init_lrc_mapping(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	/* make sure all descriptors are clean... */
+	xa_destroy(&guc->context_lookup);
+
+	/*
+	 * Some contexts might have been pinned before we enabled GuC
+	 * submission, so we need to add them to the GuC bookkeeping.
+	 * Also, after a GuC reset we want to make sure that the information
+	 * shared with the GuC is properly reset. The kernel lrcs are not attached
+	 * to the gem_context, so they need to be added separately.
+	 *
+	 * Note: we purposely do not check the error return of
+	 * guc_lrc_desc_pin, because that function can only fail in two cases.
+	 * One, if there aren't enough free IDs, but we're guaranteed to have
+	 * enough here (we're either only pinning a handful of lrcs on first boot
+	 * or we're re-pinning lrcs that were already pinned before the reset).
+	 * Two, if the GuC has died and CTBs can't make forward progress.
+	 * Presumably, the GuC should be alive as this function is called on
+	 * driver load or after a reset. Even if it is dead, another full GPU
+	 * reset will be triggered and this function would be called again.
+	 */
+
+	for_each_engine(engine, gt, id)
+		if (engine->kernel_context)
+			guc_kernel_context_pin(guc, engine->kernel_context);
+}
+
 static void guc_release(struct intel_engine_cs *engine)
 {
 	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
@@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 void intel_guc_submission_enable(struct intel_guc *guc)
 {
+	guc_init_lrc_mapping(guc);
 }
 
 void intel_guc_submission_disable(struct intel_guc *guc)
@@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
 {
 	guc->submission_selected = __guc_submission_selected(guc);
 }
+
+static inline struct intel_context *
+g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
+{
+	struct intel_context *ce;
+
+	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	ce = __get_context(guc, desc_idx);
+	if (unlikely(!ce)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Context is NULL, desc_idx %u", desc_idx);
+		return NULL;
+	}
+
+	return ce;
+}
+
+int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
+					  const u32 *msg,
+					  u32 len)
+{
+	struct intel_context *ce;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (context_wait_for_deregister_to_register(ce)) {
+		struct intel_runtime_pm *runtime_pm =
+			&ce->engine->gt->i915->runtime_pm;
+		intel_wakeref_t wakeref;
+
+		/*
+		 * Previous owner of this guc_id has been deregistered, now it
+		 * is safe to register this context.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref)
+			register_context(ce);
+		clr_context_wait_for_deregister_to_register(ce);
+		intel_context_put(ce);
+	} else if (context_destroyed(ce)) {
+		/* Context has been destroyed */
+		release_guc_id(guc, ce);
+		lrc_destroy(&ce->ref);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 943fe485c662..204c95c39353 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4142,6 +4142,7 @@ enum {
 	FAULT_AND_CONTINUE /* Unsupported */
 };
 
+#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
 #define GEN8_CTX_VALID (1 << 0)
 #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
 #define GEN8_CTX_FORCE_RESTORE (1 << 2)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 86b4c9f2613d..b48c4905d3fc 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 */
 	if (!list_empty(&rq->sched.link))
 		remove_from_engine(rq);
+	atomic_dec(&rq->context->guc_id_ref);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
-- 
2.28.0
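
As a side note, the guc_id referencing scheme described in the
guc_request_alloc() comment above reduces to a lock-free fast path plus a
locked slow path. A minimal sketch, using the driver's pin_guc_id() and a
hypothetical wrapper name:

	/* Illustrative only -- get_guc_id_ref() is not a real driver function. */
	static int get_guc_id_ref(struct intel_guc *guc, struct intel_context *ce)
	{
		/*
		 * Fast path: atomic_add_unless() increments guc_id_ref only
		 * if it is currently non-zero. A non-zero count means the
		 * guc_id cannot be stolen, so no lock is required.
		 */
		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		/*
		 * Slow path: the count is zero, so the guc_id may be stolen
		 * or still unassigned; pin_guc_id() takes the guc_id lock to
		 * close the race with unpin_guc_id(). It returns 1 when a
		 * new guc_id is assigned, or a negative errno on failure.
		 */
		return pin_guc_id(guc, ce);
	}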


* [PATCH 07/51] drm/i915/guc: Insert fence on context when deregistering
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Sometimes during context pinning a context with the same guc_id is
registered with the GuC. In this case a deregister must be done before
the context can be registered. A fence is inserted on all requests while
the deregister is in flight. Once the G2H is received indicating the
deregistration is complete the context is registered and the fence is
released.
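
In outline, the two halves of the fence pair up as follows; this is a
condensed, illustrative sketch of the code in the diff below, not a
verbatim copy:

	/* Request side (guc_request_alloc): park the request. */
	spin_lock_irqsave(&ce->guc_state.lock, flags);
	if (context_wait_for_deregister_to_register(ce)) {
		i915_sw_fence_await(&rq->submit);	/* hold back submission */
		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
	}
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);

	/* G2H side (guc_signal_context_fence): release every parked request. */
	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
		i915_sw_fence_complete(&rq->submit);
	INIT_LIST_HEAD(&ce->guc_state.fences);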

v2:
 (John H)
  - Fix commit message

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  5 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 51 ++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h           |  8 +++
 4 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 32fd6647154b..ad7197c5910f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -385,6 +385,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	mutex_init(&ce->pin_mutex);
 
 	spin_lock_init(&ce->guc_state.lock);
+	INIT_LIST_HEAD(&ce->guc_state.fences);
 
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 606c480aec26..e0e3a937f709 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -147,6 +147,11 @@ struct intel_context {
 		 * submission
 		 */
 		u8 sched_state;
+		/*
+		 * fences: maintains a list of requests that have a submit
+		 * fence related to GuC submission
+		 */
+		struct list_head fences;
 	} guc_state;
 
 	/* GuC scheduling state flags that do not require a lock. */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a47b3813b4d0..a438aecfe93f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -926,6 +926,30 @@ static const struct intel_context_ops guc_context_ops = {
 	.destroy = guc_context_destroy,
 };
 
+static void __guc_signal_context_fence(struct intel_context *ce)
+{
+	struct i915_request *rq;
+
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
+		i915_sw_fence_complete(&rq->submit);
+
+	INIT_LIST_HEAD(&ce->guc_state.fences);
+}
+
+static void guc_signal_context_fence(struct intel_context *ce)
+{
+	unsigned long flags;
+
+	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	clr_context_wait_for_deregister_to_register(ce);
+	__guc_signal_context_fence(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+}
+
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
 	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
@@ -936,6 +960,7 @@ static int guc_request_alloc(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
 	struct intel_guc *guc = ce_to_guc(ce);
+	unsigned long flags;
 	int ret;
 
 	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
@@ -980,7 +1005,7 @@ static int guc_request_alloc(struct i915_request *rq)
 	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
 	 */
 	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
-		return 0;
+		goto out;
 
 	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
 	if (unlikely(ret < 0))
@@ -996,6 +1021,28 @@ static int guc_request_alloc(struct i915_request *rq)
 
 	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
+out:
+	/*
+	 * We block all requests on this context if a G2H is pending for a
+	 * context deregistration as the GuC will fail a context registration
+	 * while this G2H is pending. Once a G2H returns, the fence is released
+	 * that is blocking these requests (see guc_signal_context_fence).
+	 *
+	 * We can safely check the below field outside of the lock as it isn't
+	 * possible for this field to transition from being clear to set, but
+	 * the converse is possible, hence the need for the check within the lock.
+	 */
+	if (likely(!context_wait_for_deregister_to_register(ce)))
+		return 0;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (context_wait_for_deregister_to_register(ce)) {
+		i915_sw_fence_await(&rq->submit);
+
+		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
+	}
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 	return 0;
 }
 
@@ -1297,7 +1344,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
 			register_context(ce);
-		clr_context_wait_for_deregister_to_register(ce);
+		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 5deb65ec5fa5..717e5b292046 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -285,6 +285,14 @@ struct i915_request {
 		struct hrtimer timer;
 	} watchdog;
 
+	/*
+	 * Requests may need to be stalled when using GuC submission waiting for
+	 * certain GuC operations to complete. If that is the case, stalled
+	 * requests are added to a per context list of stalled requests. The
+	 * below list_head is the link in that list.
+	 */
+	struct list_head guc_fence_link;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
-- 
2.28.0



* [PATCH 08/51] drm/i915/guc: Defer context unpin until scheduling is disabled
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a callback is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.

Future patches may add a heuristic / delay to schedule the disable
callback to avoid thrashing on schedule enable / disable.
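
The pin count hand-off at the heart of this change can be read as a small
state machine; the following is a sketch of intel_context_unpin() from the
diff below, with the transitions spelled out in comments:

	/*
	 * pin_count transitions (GuC mode):
	 *   unpin, count > 1:  plain decrement, nothing else to do
	 *   unpin, count == 1: raise 1 -> 2 and hand the last reference to
	 *                      the async sched_disable operation
	 *   G2H response:      intel_context_sched_disable_unpin() subtracts
	 *                      2, so 2 -> 0 performs the final unpin, while a
	 *                      concurrent re-pin (2 -> 3) leaves the context
	 *                      pinned afterwards (3 - 2 = 1)
	 */
	while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
		/* count is exactly 1: try to transfer ownership */
		if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
			ce->ops->sched_disable(ce);
			break;
		}
	}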

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   4 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  27 +++-
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   3 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 147 +++++++++++++++++-
 6 files changed, 181 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index ad7197c5910f..3d5b4116617f 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -306,9 +306,9 @@ int __intel_context_do_pin(struct intel_context *ce)
 	return err;
 }
 
-void intel_context_unpin(struct intel_context *ce)
+void __intel_context_do_unpin(struct intel_context *ce, int sub)
 {
-	if (!atomic_dec_and_test(&ce->pin_count))
+	if (!atomic_sub_and_test(sub, &ce->pin_count))
 		return;
 
 	CE_TRACE(ce, "unpin\n");
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index b10cbe8fee99..974ef85320c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -113,7 +113,32 @@ static inline void __intel_context_pin(struct intel_context *ce)
 	atomic_inc(&ce->pin_count);
 }
 
-void intel_context_unpin(struct intel_context *ce);
+void __intel_context_do_unpin(struct intel_context *ce, int sub);
+
+static inline void intel_context_sched_disable_unpin(struct intel_context *ce)
+{
+	__intel_context_do_unpin(ce, 2);
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	if (!ce->ops->sched_disable) {
+		__intel_context_do_unpin(ce, 1);
+	} else {
+		/*
+		 * Move ownership of this pin to the scheduling disable which is
+		 * an async operation. When that operation completes the above
+		 * intel_context_sched_disable_unpin is called potentially
+		 * unpinning the context.
+		 */
+		while (!atomic_add_unless(&ce->pin_count, -1, 1)) {
+			if (atomic_cmpxchg(&ce->pin_count, 1, 2) == 1) {
+				ce->ops->sched_disable(ce);
+				break;
+			}
+		}
+	}
+}
 
 void intel_context_enter_engine(struct intel_context *ce);
 void intel_context_exit_engine(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index e0e3a937f709..4a5518d295c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -43,6 +43,8 @@ struct intel_context_ops {
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
+	void (*sched_disable)(struct intel_context *ce);
+
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
 };
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 30773cd699f5..03b7222b04a2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -241,6 +241,8 @@ int intel_guc_reset_engine(struct intel_guc *guc,
 
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg, u32 len);
 
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 28ff82c5be45..019b25ff1888 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -932,6 +932,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 		ret = intel_guc_deregister_done_process_msg(guc, payload,
 							    len);
 		break;
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+		ret = intel_guc_sched_done_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a438aecfe93f..5519c988a6ca 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -70,6 +70,7 @@
  * possible for some of the bits to changing at the same time though.
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
+#define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -87,6 +88,24 @@ static inline void clr_context_enabled(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_pending_enable(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_PENDING_ENABLE);
+}
+
+static inline void set_context_pending_enable(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_pending_enable(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_PENDING_ENABLE,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -95,6 +114,7 @@ static inline void clr_context_enabled(struct intel_context *ce)
  */
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
+#define SCHED_STATE_PENDING_DISABLE			BIT(2)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -139,6 +159,24 @@ set_context_destroyed(struct intel_context *ce)
 	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
 }
 
+static inline bool context_pending_disable(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_PENDING_DISABLE);
+}
+
+static inline void set_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_PENDING_DISABLE;
+}
+
+static inline void clr_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state =
+		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -231,6 +269,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
 		action[len++] = GUC_CONTEXT_ENABLE;
+		set_context_pending_enable(ce);
+		intel_context_get(ce);
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
@@ -238,8 +278,12 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len);
 
-	if (!enabled && !err)
+	if (!enabled && !err) {
 		set_context_enabled(ce);
+	} else if (!enabled) {
+		clr_context_pending_enable(ce);
+		intel_context_put(ce);
+	}
 
 	return err;
 }
@@ -833,6 +877,62 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_disable(struct intel_guc *guc,
+					struct intel_context *ce,
+					u16 guc_id)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		guc_id,	/* ce->guc_id not stable */
+		GUC_CONTEXT_DISABLE
+	};
+
+	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
+
+	intel_context_get(ce);
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+}
+
+static u16 prep_context_pending_disable(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	set_context_pending_disable(ce);
+	clr_context_enabled(ce);
+
+	return ce->guc_id;
+}
+
+static void guc_context_sched_disable(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	unsigned long flags;
+	u16 guc_id;
+	intel_wakeref_t wakeref;
+
+	if (context_guc_id_invalid(ce) ||
+	    !lrc_desc_registered(guc, ce->guc_id)) {
+		clr_context_enabled(ce);
+		goto unpin;
+	}
+
+	if (!context_enabled(ce))
+		goto unpin;
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return;
+unpin:
+	intel_context_sched_disable_unpin(ce);
+}
+
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine = ce->engine;
@@ -841,6 +941,7 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
+	GEM_BUG_ON(context_enabled(ce));
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	set_context_destroyed(ce);
@@ -922,6 +1023,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
+	.sched_disable = guc_context_sched_disable,
+
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
 };
@@ -1354,3 +1457,45 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+int intel_guc_sched_done_process_msg(struct intel_guc *guc,
+				     const u32 *msg,
+				     u32 len)
+{
+	struct intel_context *ce;
+	unsigned long flags;
+	u32 desc_idx = msg[0];
+
+	if (unlikely(len < 2)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	if (unlikely(context_destroyed(ce) ||
+		     (!context_pending_enable(ce) &&
+		     !context_pending_disable(ce)))) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Bad context sched_state 0x%x, 0x%x, desc_idx %u",
+			atomic_read(&ce->guc_sched_state_no_lock),
+			ce->guc_state.sched_state, desc_idx);
+		return -EPROTO;
+	}
+
+	if (context_pending_enable(ce)) {
+		clr_context_pending_enable(ce);
+	} else if (context_pending_disable(ce)) {
+		intel_context_sched_disable_unpin(ce);
+
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		clr_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+
+	intel_context_put(ce);
+
+	return 0;
+}
-- 
2.28.0



* [PATCH 09/51] drm/i915/guc: Disable engine barriers with GuC during unpin
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Disable engine barriers for unpinning with GuC. This feature isn't
needed with the GuC as it disables context scheduling before unpinning,
which guarantees the HW will not reference the context. Hence, it is
not necessary to defer unpinning until a kernel context request
completes on each engine in the context engine mask.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c    |  2 +-
 drivers/gpu/drm/i915/gt/selftest_context.c | 10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 3d5b4116617f..91349d071e0e 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -80,7 +80,7 @@ static int intel_context_active_acquire(struct intel_context *ce)
 
 	__i915_active_acquire(&ce->active);
 
-	if (intel_context_is_barrier(ce))
+	if (intel_context_is_barrier(ce) || intel_engine_uses_guc(ce->engine))
 		return 0;
 
 	/* Preallocate tracking nodes */
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 26685b927169..fa7b99a671dd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -209,7 +209,13 @@ static int __live_active_context(struct intel_engine_cs *engine)
 	 * This test makes sure that the context is kept alive until a
 	 * subsequent idle-barrier (emitted when the engine wakeref hits 0
 	 * with no more outstanding requests).
+	 *
+	 * In GuC submission mode we don't use idle barriers and we instead
+	 * get a message from the GuC to signal that it is safe to unpin the
+	 * context from memory.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
@@ -357,7 +363,11 @@ static int __live_remote_context(struct intel_engine_cs *engine)
 	 * on the context image remotely (intel_context_prepare_remote_request),
 	 * which inserts foreign fences into intel_context.active, does not
 	 * clobber the idle-barrier.
+	 *
+	 * In GuC submission mode we don't use idle barriers.
 	 */
+	if (intel_engine_uses_guc(engine))
+		return 0;
 
 	if (intel_engine_pm_is_awake(engine)) {
 		pr_err("%s is awake before starting %s!\n",
-- 
2.28.0



* [PATCH 10/51] drm/i915/guc: Extend deregistration fence to schedule disable
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Extend the deregistration context fence to also fence when a GuC context
has a schedule disable pending.
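
The key addition is a re-pin check performed under ce->guc_state.lock
before committing to the disable. In sketch form (see the diff below):

	/*
	 * atomic_add_unless(v, -2, 2) subtracts 2 unless the value is
	 * exactly 2, returning true when the subtraction happened. A
	 * pin_count of 2 means only the pending sched_disable owns the
	 * context; anything higher means another thread re-pinned it, so
	 * undo the hand-off and skip the disable entirely.
	 */
	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
		return;
	}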

v2:
 (John H)
  - Update comment why we check the pin count within spin lock

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 40 +++++++++++++++----
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 5519c988a6ca..9dc1a256e185 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -922,7 +922,22 @@ static void guc_context_sched_disable(struct intel_context *ce)
 		goto unpin;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
+
+	/*
+	 * We have to check if the context has been pinned again as another pin
+	 * operation is allowed to pass this function. Checking the pin count,
+	 * within ce->guc_state.lock, synchronizes this function with
+	 * guc_request_alloc ensuring a request doesn't slip through the
+	 * 'context_pending_disable' fence. Checking within the spin lock (can't
+	 * sleep) ensures another process doesn't pin this context and generate
+	 * a request before we set the 'context_pending_disable' flag here.
+	 */
+	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return;
+	}
 	guc_id = prep_context_pending_disable(ce);
+
 	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 	with_intel_runtime_pm(runtime_pm, wakeref)
@@ -1127,19 +1142,22 @@ static int guc_request_alloc(struct i915_request *rq)
 out:
 	/*
 	 * We block all requests on this context if a G2H is pending for a
-	 * context deregistration as the GuC will fail a context registration
-	 * while this G2H is pending. Once a G2H returns, the fence is released
-	 * that is blocking these requests (see guc_signal_context_fence).
+	 * schedule disable or context deregistration as the GuC will fail a
+	 * schedule enable or context registration if either G2H is pending,
+	 * respectively. Once a G2H returns, the fence that is blocking these
+	 * requests is released (see guc_signal_context_fence).
 	 *
-	 * We can safely check the below field outside of the lock as it isn't
-	 * possible for this field to transition from being clear to set, but
+	 * We can safely check the below fields outside of the lock as it isn't
+	 * possible for these fields to transition from being clear to set, but
 	 * the converse is possible, hence the need for the check within the lock.
 	 */
-	if (likely(!context_wait_for_deregister_to_register(ce)))
+	if (likely(!context_wait_for_deregister_to_register(ce) &&
+		   !context_pending_disable(ce)))
 		return 0;
 
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	if (context_wait_for_deregister_to_register(ce)) {
+	if (context_wait_for_deregister_to_register(ce) ||
+	    context_pending_disable(ce)) {
 		i915_sw_fence_await(&rq->submit);
 
 		list_add_tail(&rq->guc_fence_link, &ce->guc_state.fences);
@@ -1488,10 +1506,18 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
+		/*
+		 * Unpin must be done before __guc_signal_context_fence,
+		 * otherwise a race exists between the requests getting
+		 * submitted + retired before this unpin completes resulting in
+		 * the pin_count going to zero and the context still being
+		 * enabled.
+		 */
 		intel_context_sched_disable_unpin(ce);
 
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
 		clr_context_pending_disable(ce);
+		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
-- 
2.28.0



* [PATCH 11/51] drm/i915: Disable preempt busywait when using GuC scheduling
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Disable preempt busywait when using GuC scheduling. This isn't needed as
the GuC controls preemption when scheduling.

v2:
 (John H):
  - Fix commit message

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
index 87b06572fd2e..f7aae502ec3d 100644
--- a/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/gen8_engine_cs.c
@@ -506,7 +506,8 @@ gen8_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
@@ -598,7 +599,8 @@ gen12_emit_fini_breadcrumb_tail(struct i915_request *rq, u32 *cs)
 	*cs++ = MI_USER_INTERRUPT;
 
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
-	if (intel_engine_has_semaphores(rq->engine))
+	if (intel_engine_has_semaphores(rq->engine) &&
+	    !intel_uc_uses_guc_submission(&rq->engine->gt->uc))
 		cs = gen12_emit_preempt_busywait(rq, cs);
 
 	rq->tail = intel_ring_offset(rq, cs);
-- 
2.28.0



* [PATCH 12/51] drm/i915/guc: Ensure request ordering via completion fences
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

If two requests are on the same ring, they are explicitly ordered by the
HW, so a submission fence is sufficient to ensure ordering when using
the new GuC submission interface. Conversely, if two requests share a
timeline and are on the same physical engine but run in different
contexts, the HW does not ensure ordering on the new GuC submission
interface, so a completion fence needs to be used instead.
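
As an illustration, here is a minimal sketch of the ordering rule above;
the helper name is hypothetical and the checks mirror the hunks below:

  /* Hypothetical helper: true when a completion fence is required */
  static bool needs_completion_fence(const struct i915_request *prev,
				     const struct i915_request *rq)
  {
	/* GuC: only requests in the same context share a ring */
	if (intel_engine_uses_guc(rq->engine))
		return prev->context != rq->context;

	/* Execlists: one physical engine implies in-order submission */
	return !is_power_of_2(READ_ONCE(prev->engine)->mask |
			      rq->engine->mask);
  }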

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  1 -
 drivers/gpu/drm/i915/i915_request.c               | 12 ++++++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9dc1a256e185..4443cc6f5320 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -933,7 +933,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * a request before we set the 'context_pending_disable' flag here.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
-		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b48c4905d3fc..2b2b63cba06c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -432,6 +432,7 @@ void i915_request_retire_upto(struct i915_request *rq)
 
 	do {
 		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
+		GEM_BUG_ON(!i915_request_completed(tmp));
 	} while (i915_request_retire(tmp) && tmp != rq);
 }
 
@@ -1380,6 +1381,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 	return err;
 }
 
+static int
+i915_request_await_request(struct i915_request *to, struct i915_request *from);
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence)
@@ -1465,7 +1469,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 			return ret;
 	}
 
-	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
+	if (!intel_engine_uses_guc(to->engine) &&
+	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
 		ret = await_request_submit(to, from);
 	else
 		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
@@ -1626,6 +1631,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 	prev = to_request(__i915_active_fence_set(&timeline->last_request,
 						  &rq->fence));
 	if (prev && !__i915_request_is_complete(prev)) {
+		bool uses_guc = intel_engine_uses_guc(rq->engine);
+
 		/*
 		 * The requests are supposed to be kept in order. However,
 		 * we need to be wary in case the timeline->last_request
@@ -1636,7 +1643,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 			   i915_seqno_passed(prev->fence.seqno,
 					     rq->fence.seqno));
 
-		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
+		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
+		    (uses_guc && prev->context == rq->context))
 			i915_sw_fence_await_sw_fence(&rq->submit,
 						     &prev->submit,
 						     &rq->submitq);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 13/51] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Semaphores are an optimization and not required for basic GuC
submission to work properly. Disable them until there is time to
implement and tune semaphores for performance. Also, the long term
direction is to delete semaphores from the i915 entirely, which is
another reason not to enable them for GuC submission.

This patch fixes an existing bug where I915_ENGINE_HAS_SEMAPHORES was
not honored correctly.

v2: Reword commit message
v3:
 (John H)
  - Add text to the commit message indicating this also fixes an existing bug

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 7d6f52d8a801..64659802d4df 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -799,7 +799,8 @@ static int intel_context_set_gem(struct intel_context *ce,
 	}
 
 	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
-	    intel_engine_has_timeslices(ce->engine))
+	    intel_engine_has_timeslices(ce->engine) &&
+	    intel_engine_has_semaphores(ce->engine))
 		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
 
 	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
@@ -1778,7 +1779,8 @@ static void __apply_priority(struct intel_context *ce, void *arg)
 	if (!intel_engine_has_timeslices(ce->engine))
 		return;
 
-	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
+	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
+	    intel_engine_has_semaphores(ce->engine))
 		intel_context_set_use_semaphores(ce);
 	else
 		intel_context_clear_use_semaphores(ce);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 14/51] drm/i915/guc: Ensure G2H response has space in buffer
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Ensure the G2H response has space in the buffer before sending an H2G
CTB, as the GuC can't handle any backpressure on the G2H interface.

v2:
 (Matthew)
  - s/INTEL_GUC_SEND/INTEL_GUC_CT_SEND
v3:
 (Matthew)
  - Add G2H credit accounting to blocking path, add g2h_release_space
    helper
 (John H)
  - CTB_G2H_BUFFER_SIZE / 4 == G2H_ROOM_BUFFER_SIZE
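
As a rough sketch (error handling and locking details elided), the
credit flow this introduces boils down to the following, using the
helpers added in the patch below:

  /* Send side (under ct->ctbs.send.lock): debit G2H space up front */
  if (!h2g_has_room(ct, len + GUC_CTB_HDR_LEN) ||
      !g2h_has_room(ct, g2h_len_dw))
	ret = -EBUSY;			/* caller backs off and retries */
  else
	g2h_reserve_space(ct, g2h_len_dw);	/* atomic_sub on recv space */

  /* Receive side: the matching G2H event credits the space back */
  g2h_release_space(ct, request->size);		/* atomic_add on recv space */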

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  8 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 91 +++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  9 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 13 ++-
 5 files changed, 99 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 03b7222b04a2..80b88bae5f24 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -96,10 +96,11 @@ inline int intel_guc_send(struct intel_guc *guc, const u32 *action, u32 len)
 }
 
 static
-inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len)
+inline int intel_guc_send_nb(struct intel_guc *guc, const u32 *action, u32 len,
+			     u32 g2h_len_dw)
 {
 	return intel_guc_ct_send(&guc->ct, action, len, NULL, 0,
-				 INTEL_GUC_CT_SEND_NB);
+				 MAKE_SEND_FLAGS(g2h_len_dw));
 }
 
 static inline int
@@ -113,6 +114,7 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
 static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 					   const u32 *action,
 					   u32 len,
+					   u32 g2h_len_dw,
 					   bool loop)
 {
 	int err;
@@ -123,7 +125,7 @@ static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
 	might_sleep_if(loop && not_atomic);
 
 retry:
-	err = intel_guc_send_nb(guc, action, len);
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (unlikely(err == -EBUSY && loop)) {
 		if (likely(not_atomic)) {
 			if (msleep_interruptible(sleep_period_ms))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 019b25ff1888..c33906ec478d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -73,6 +73,7 @@ static inline struct drm_device *ct_to_drm(struct intel_guc_ct *ct)
 #define CTB_DESC_SIZE		ALIGN(sizeof(struct guc_ct_buffer_desc), SZ_2K)
 #define CTB_H2G_BUFFER_SIZE	(SZ_4K)
 #define CTB_G2H_BUFFER_SIZE	(4 * CTB_H2G_BUFFER_SIZE)
+#define G2H_ROOM_BUFFER_SIZE	(CTB_G2H_BUFFER_SIZE / 4)
 
 struct ct_request {
 	struct list_head link;
@@ -129,23 +130,27 @@ static void guc_ct_buffer_desc_init(struct guc_ct_buffer_desc *desc)
 
 static void guc_ct_buffer_reset(struct intel_guc_ct_buffer *ctb)
 {
+	u32 space;
+
 	ctb->broken = false;
 	ctb->tail = 0;
 	ctb->head = 0;
-	ctb->space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size);
+	space = CIRC_SPACE(ctb->tail, ctb->head, ctb->size) - ctb->resv_space;
+	atomic_set(&ctb->space, space);
 
 	guc_ct_buffer_desc_init(ctb->desc);
 }
 
 static void guc_ct_buffer_init(struct intel_guc_ct_buffer *ctb,
 			       struct guc_ct_buffer_desc *desc,
-			       u32 *cmds, u32 size_in_bytes)
+			       u32 *cmds, u32 size_in_bytes, u32 resv_space)
 {
 	GEM_BUG_ON(size_in_bytes % 4);
 
 	ctb->desc = desc;
 	ctb->cmds = cmds;
 	ctb->size = size_in_bytes / 4;
+	ctb->resv_space = resv_space / 4;
 
 	guc_ct_buffer_reset(ctb);
 }
@@ -226,6 +231,7 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	struct guc_ct_buffer_desc *desc;
 	u32 blob_size;
 	u32 cmds_size;
+	u32 resv_space;
 	void *blob;
 	u32 *cmds;
 	int err;
@@ -250,19 +256,23 @@ int intel_guc_ct_init(struct intel_guc_ct *ct)
 	desc = blob;
 	cmds = blob + 2 * CTB_DESC_SIZE;
 	cmds_size = CTB_H2G_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "send",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = 0;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "send",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.send, desc, cmds, cmds_size, resv_space);
 
 	/* store pointers to desc and cmds for recv ctb */
 	desc = blob + CTB_DESC_SIZE;
 	cmds = blob + 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE;
 	cmds_size = CTB_G2H_BUFFER_SIZE;
-	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u\n", "recv",
-		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size);
+	resv_space = G2H_ROOM_BUFFER_SIZE;
+	CT_DEBUG(ct, "%s desc %#tx cmds %#tx size %u/%u\n", "recv",
+		 ptrdiff(desc, blob), ptrdiff(cmds, blob), cmds_size,
+		 resv_space);
 
-	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size);
+	guc_ct_buffer_init(&ct->ctbs.recv, desc, cmds, cmds_size, resv_space);
 
 	return 0;
 }
@@ -461,8 +471,8 @@ static int ct_write(struct intel_guc_ct *ct,
 
 	/* update local copies */
 	ctb->tail = tail;
-	GEM_BUG_ON(ctb->space < len + GUC_CTB_HDR_LEN);
-	ctb->space -= len + GUC_CTB_HDR_LEN;
+	GEM_BUG_ON(atomic_read(&ctb->space) < len + GUC_CTB_HDR_LEN);
+	atomic_sub(len + GUC_CTB_HDR_LEN, &ctb->space);
 
 	/* now update descriptor */
 	WRITE_ONCE(desc->tail, tail);
@@ -537,6 +547,32 @@ static inline bool ct_deadlocked(struct intel_guc_ct *ct)
 	return ret;
 }
 
+static inline bool g2h_has_room(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	struct intel_guc_ct_buffer *ctb = &ct->ctbs.recv;
+
+	/*
+	 * We leave a certain amount of space in the G2H CTB buffer for
+	 * unexpected G2H CTBs (e.g. logging, engine hang, etc...)
+	 */
+	return !g2h_len_dw || atomic_read(&ctb->space) >= g2h_len_dw;
+}
+
+static inline void g2h_reserve_space(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	lockdep_assert_held(&ct->ctbs.send.lock);
+
+	GEM_BUG_ON(!g2h_has_room(ct, g2h_len_dw));
+
+	if (g2h_len_dw)
+		atomic_sub(g2h_len_dw, &ct->ctbs.recv.space);
+}
+
+static inline void g2h_release_space(struct intel_guc_ct *ct, u32 g2h_len_dw)
+{
+	atomic_add(g2h_len_dw, &ct->ctbs.recv.space);
+}
+
 static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
@@ -544,7 +580,7 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 	u32 head;
 	u32 space;
 
-	if (ctb->space >= len_dw)
+	if (atomic_read(&ctb->space) >= len_dw)
 		return true;
 
 	head = READ_ONCE(desc->head);
@@ -557,16 +593,16 @@ static inline bool h2g_has_room(struct intel_guc_ct *ct, u32 len_dw)
 	}
 
 	space = CIRC_SPACE(ctb->tail, head, ctb->size);
-	ctb->space = space;
+	atomic_set(&ctb->space, space);
 
 	return space >= len_dw;
 }
 
-static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
+static int has_room_nb(struct intel_guc_ct *ct, u32 h2g_dw, u32 g2h_dw)
 {
 	lockdep_assert_held(&ct->ctbs.send.lock);
 
-	if (unlikely(!h2g_has_room(ct, len_dw))) {
+	if (unlikely(!h2g_has_room(ct, h2g_dw) || !g2h_has_room(ct, g2h_dw))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 
@@ -580,6 +616,9 @@ static int has_room_nb(struct intel_guc_ct *ct, u32 len_dw)
 	return 0;
 }
 
+#define G2H_LEN_DW(f) \
+	FIELD_GET(INTEL_GUC_CT_SEND_G2H_DW_MASK, f) ? \
+	FIELD_GET(INTEL_GUC_CT_SEND_G2H_DW_MASK, f) + GUC_CTB_HXG_MSG_MIN_LEN : 0
 static int ct_send_nb(struct intel_guc_ct *ct,
 		      const u32 *action,
 		      u32 len,
@@ -587,12 +626,13 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 {
 	struct intel_guc_ct_buffer *ctb = &ct->ctbs.send;
 	unsigned long spin_flags;
+	u32 g2h_len_dw = G2H_LEN_DW(flags);
 	u32 fence;
 	int ret;
 
 	spin_lock_irqsave(&ctb->lock, spin_flags);
 
-	ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN);
+	ret = has_room_nb(ct, len + GUC_CTB_HDR_LEN, g2h_len_dw);
 	if (unlikely(ret))
 		goto out;
 
@@ -601,6 +641,7 @@ static int ct_send_nb(struct intel_guc_ct *ct,
 	if (unlikely(ret))
 		goto out;
 
+	g2h_reserve_space(ct, g2h_len_dw);
 	intel_guc_notify(ct_to_guc(ct));
 
 out:
@@ -632,11 +673,13 @@ static int ct_send(struct intel_guc_ct *ct,
 	/*
 	 * We use a lazy spin wait loop here as we believe that if the CT
 	 * buffers are sized correctly the flow control condition should be
-	 * rare.
+	 * rare. Reserving the maximum size in the G2H credits as we don't know
+	 * how big the response is going to be.
 	 */
 retry:
 	spin_lock_irqsave(&ctb->lock, flags);
-	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN))) {
+	if (unlikely(!h2g_has_room(ct, len + GUC_CTB_HDR_LEN) ||
+		     !g2h_has_room(ct, GUC_CTB_HXG_MSG_MAX_LEN))) {
 		if (ct->stall_time == KTIME_MAX)
 			ct->stall_time = ktime_get();
 		spin_unlock_irqrestore(&ctb->lock, flags);
@@ -664,6 +707,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	spin_unlock(&ct->requests.lock);
 
 	err = ct_write(ct, action, len, fence, 0);
+	g2h_reserve_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
 
 	spin_unlock_irqrestore(&ctb->lock, flags);
 
@@ -673,6 +717,7 @@ static int ct_send(struct intel_guc_ct *ct,
 	intel_guc_notify(ct_to_guc(ct));
 
 	err = wait_for_ct_request_update(&request, status);
+	g2h_release_space(ct, GUC_CTB_HXG_MSG_MAX_LEN);
 	if (unlikely(err))
 		goto unlink;
 
@@ -992,10 +1037,22 @@ static void ct_incoming_request_worker_func(struct work_struct *w)
 static int ct_handle_event(struct intel_guc_ct *ct, struct ct_incoming_msg *request)
 {
 	const u32 *hxg = &request->msg[GUC_CTB_MSG_MIN_LEN];
+	u32 action = FIELD_GET(GUC_HXG_EVENT_MSG_0_ACTION, hxg[0]);
 	unsigned long flags;
 
 	GEM_BUG_ON(FIELD_GET(GUC_HXG_MSG_0_TYPE, hxg[0]) != GUC_HXG_TYPE_EVENT);
 
+	/*
+	 * Adjusting the space must be done in IRQ or deadlock can occur as the
+	 * CTB processing in the below workqueue can send CTBs which creates a
+	 * circular dependency if the space was returned there.
+	 */
+	switch (action) {
+	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
+	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
+		g2h_release_space(ct, request->size);
+	}
+
 	spin_lock_irqsave(&ct->requests.lock, flags);
 	list_add_tail(&request->link, &ct->requests.incoming);
 	spin_unlock_irqrestore(&ct->requests.lock, flags);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index edd1bba0445d..785dfc5c6efb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -33,6 +33,7 @@ struct intel_guc;
  * @desc: pointer to the buffer descriptor
  * @cmds: pointer to the commands buffer
  * @size: size of the commands buffer in dwords
+ * @resv_space: reserved space in buffer in dwords
  * @head: local shadow copy of head in dwords
  * @tail: local shadow copy of tail in dwords
  * @space: local shadow copy of space in dwords
@@ -43,9 +44,10 @@ struct intel_guc_ct_buffer {
 	struct guc_ct_buffer_desc *desc;
 	u32 *cmds;
 	u32 size;
+	u32 resv_space;
 	u32 tail;
 	u32 head;
-	u32 space;
+	atomic_t space;
 	bool broken;
 };
 
@@ -97,6 +99,11 @@ static inline bool intel_guc_ct_enabled(struct intel_guc_ct *ct)
 }
 
 #define INTEL_GUC_CT_SEND_NB		BIT(31)
+#define INTEL_GUC_CT_SEND_G2H_DW_SHIFT	0
+#define INTEL_GUC_CT_SEND_G2H_DW_MASK	(0xff << INTEL_GUC_CT_SEND_G2H_DW_SHIFT)
+#define MAKE_SEND_FLAGS(len) \
+	({GEM_BUG_ON(!FIELD_FIT(INTEL_GUC_CT_SEND_G2H_DW_MASK, len)); \
+	(FIELD_PREP(INTEL_GUC_CT_SEND_G2H_DW_MASK, len) | INTEL_GUC_CT_SEND_NB);})
 int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
index 4e4edc368b77..94bb1ca6f889 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h
@@ -17,6 +17,10 @@
 #include "abi/guc_communication_ctb_abi.h"
 #include "abi/guc_messages_abi.h"
 
+/* Payload length only i.e. don't include G2H header length */
+#define G2H_LEN_DW_SCHED_CONTEXT_MODE_SET	2
+#define G2H_LEN_DW_DEREGISTER_CONTEXT		1
+
 #define GUC_CONTEXT_DISABLE		0
 #define GUC_CONTEXT_ENABLE		1
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 4443cc6f5320..f7e34baa9506 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -260,6 +260,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
+	u32 g2h_len_dw = 0;
 	bool enabled = context_enabled(ce);
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
@@ -271,13 +272,13 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		action[len++] = GUC_CONTEXT_ENABLE;
 		set_context_pending_enable(ce);
 		intel_context_get(ce);
+		g2h_len_dw = G2H_LEN_DW_SCHED_CONTEXT_MODE_SET;
 	} else {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
 		action[len++] = ce->guc_id;
 	}
 
-	err = intel_guc_send_nb(guc, action, len);
-
+	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -730,7 +731,7 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -750,7 +751,8 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -891,7 +893,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

When running the GuC, the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the uC, and in turn the GuC, by waiting
for the number of outstanding submission G2H responses to go to zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negative timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)
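
For illustration, the idle flow added below boils down to roughly this
(names are from the patch; surrounding context elided):

  /* Every tracked H2G bumps the counter ... */
  atomic_inc(&guc->outstanding_submission_g2h);

  /* ... and the G2H handlers drop it, waking waiters at zero */
  if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
	wake_up_all(&guc->ct.wq);

  /* intel_gt_wait_for_idle() retires requests, then sleeps on
   * guc->ct.wq until the counter hits zero or the remaining
   * timeout expires */
  intel_uc_wait_for_idle(&gt->uc, remaining_timeout);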

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 13 files changed, 129 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a90f796e85c0..6fffd4d377c2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 		goto insert;
 
 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
 
 void intel_gt_driver_late_release(struct intel_gt *gt);
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..edb881d75630 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index fcc30a6e4fe9..83ff5280c06e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -10,10 +10,11 @@ struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 80b88bae5f24..3cc566565224 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -39,6 +39,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
 
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c33906ec478d..f1cbed6b9f0a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 785dfc5c6efb..4b30a562ae63 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 
 	struct tasklet_struct receive_tasklet;
 
+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f7e34baa9506..088d11e2e497 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
+static int guc_submission_send_busy_loop(struct intel_guc* guc,
+					 const u32 *action,
+					 u32 len,
+					 u32 g2h_len_dw,
+					 bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -EINTR;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					true, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					     0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					     G2H_LEN_DW_DEREGISTER_CONTEXT,
+					     true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
-				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
+	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
@@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}
 
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
 
@@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

When running the GuC, the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the uC, and in turn the GuC, by waiting
for the number of outstanding G2H messages (context unpins and similar
operations) to reach zero.

v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negative timeout trick, move a
refactor of guc_context_unpin to an earlier patch (John H)
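
As background for reviewers, the core mechanism here (an atomic count
of outstanding G2H messages paired with a wait queue that idle waiters
sleep on) can be sketched in plain userspace C. This is a simplified
illustration using pthreads as a stand-in for the kernel wait queue,
not the driver code itself:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int outstanding_g2h; /* outstanding GuC-to-host messages */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wq = PTHREAD_COND_INITIALIZER; /* stands in for ct->wq */

/* Submission side: count a message that expects a G2H response. */
static void send_msg(void)
{
        atomic_fetch_add(&outstanding_g2h, 1);
}

/* Handler side: mirrors decr_outstanding_submission_g2h(). */
static void handle_g2h_done(void)
{
        if (atomic_fetch_sub(&outstanding_g2h, 1) == 1) { /* dec-and-test */
                pthread_mutex_lock(&lock);
                pthread_cond_broadcast(&wq); /* wake_up_all() */
                pthread_mutex_unlock(&lock);
        }
}

/* Idle waiter: mirrors guc_wait_for_pending_msg(), minus timeouts. */
static void wait_for_idle(void)
{
        pthread_mutex_lock(&lock);
        while (atomic_load(&outstanding_g2h))
                pthread_cond_wait(&wq, &lock);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        send_msg();
        handle_g2h_done(); /* in the driver this runs from G2H processing */
        wait_for_idle();   /* returns immediately: count is back to zero */
        printf("idle\n");
        return 0;
}

Because the broadcast is issued under the lock, a waiter that has
checked the count and is about to sleep cannot miss a wakeup; the real
code gets the same guarantee by rechecking wait_var between
prepare_to_wait() and io_schedule_timeout().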

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
 drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
 drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
 drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
 drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
 drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
 .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
 13 files changed, 129 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index a90f796e85c0..6fffd4d377c2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
 		goto insert;
 
 	/* Attempt to reap some mmap space from dead objects */
-	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
+	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
+					       NULL);
 	if (err)
 		goto err;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index e714e21c0a4d..acfdd53b2678 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
 	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
 }
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
+{
+	long remaining_timeout;
+
+	/* If the device is asleep, we have no requests outstanding */
+	if (!intel_gt_pm_is_awake(gt))
+		return 0;
+
+	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
+							   &remaining_timeout)) > 0) {
+		cond_resched();
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
+							  remaining_timeout);
+}
+
 int intel_gt_init(struct intel_gt *gt)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
index e7aabe0cc5bf..74e771871a9b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
 
 void intel_gt_driver_late_release(struct intel_gt *gt);
 
+int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
+
 void intel_gt_check_and_clear_faults(struct intel_gt *gt);
 void intel_gt_clear_error_registers(struct intel_gt *gt,
 				    intel_engine_mask_t engine_mask);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 647eca9d867a..edb881d75630 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
 	GEM_BUG_ON(engine->retire);
 }
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout)
 {
 	struct intel_gt_timelines *timelines = &gt->timelines;
 	struct intel_timeline *tl, *tn;
@@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
 	if (flush_submission(gt, timeout)) /* Wait, there's more! */
 		active_count++;
 
-	return active_count ? timeout : 0;
-}
-
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
-{
-	/* If the device is asleep, we have no requests outstanding */
-	if (!intel_gt_pm_is_awake(gt))
-		return 0;
-
-	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
-		cond_resched();
-		if (signal_pending(current))
-			return -EINTR;
-	}
+	if (remaining_timeout)
+		*remaining_timeout = timeout;
 
-	return timeout;
+	return active_count ? timeout : 0;
 }
 
 static void retire_work_handler(struct work_struct *work)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
index fcc30a6e4fe9..83ff5280c06e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
@@ -10,10 +10,11 @@ struct intel_engine_cs;
 struct intel_gt;
 struct intel_timeline;
 
-long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
+long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
+				      long *remaining_timeout);
 static inline void intel_gt_retire_requests(struct intel_gt *gt)
 {
-	intel_gt_retire_requests_timeout(gt, 0);
+	intel_gt_retire_requests_timeout(gt, 0, NULL);
 }
 
 void intel_engine_init_retire(struct intel_engine_cs *engine);
@@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
 			     struct intel_timeline *tl);
 void intel_engine_fini_retire(struct intel_engine_cs *engine);
 
-int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
-
 void intel_gt_init_requests(struct intel_gt *gt);
 void intel_gt_park_requests(struct intel_gt *gt);
 void intel_gt_unpark_requests(struct intel_gt *gt);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 80b88bae5f24..3cc566565224 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -39,6 +39,8 @@ struct intel_guc {
 	spinlock_t irq_lock;
 	unsigned int msg_enabled_mask;
 
+	atomic_t outstanding_submission_g2h;
+
 	struct {
 		void (*reset)(struct intel_guc *guc);
 		void (*enable)(struct intel_guc *guc);
@@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
+
 int intel_guc_reset_engine(struct intel_guc *guc,
 			   struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c33906ec478d..f1cbed6b9f0a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
 	INIT_LIST_HEAD(&ct->requests.incoming);
 	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
 	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
+	init_waitqueue_head(&ct->wq);
 }
 
 static inline const char *guc_ct_buffer_type_to_str(u32 type)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 785dfc5c6efb..4b30a562ae63 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
 #include <linux/ktime.h>
+#include <linux/wait.h>
 
 #include "intel_guc_fwif.h"
 
@@ -68,6 +69,9 @@ struct intel_guc_ct {
 
 	struct tasklet_struct receive_tasklet;
 
+	/** @wq: wait queue for g2h channel */
+	wait_queue_head_t wq;
+
 	struct {
 		u16 last_fence; /* last fence used to send request */
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f7e34baa9506..088d11e2e497 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
 }
 
+static int guc_submission_send_busy_loop(struct intel_guc* guc,
+					 const u32 *action,
+					 u32 len,
+					 u32 g2h_len_dw,
+					 bool loop)
+{
+	int err;
+
+	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
+
+	if (!err && g2h_len_dw)
+		atomic_inc(&guc->outstanding_submission_g2h);
+
+	return err;
+}
+
+static int guc_wait_for_pending_msg(struct intel_guc *guc,
+				    atomic_t *wait_var,
+				    bool interruptible,
+				    long timeout)
+{
+	const int state = interruptible ?
+		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	DEFINE_WAIT(wait);
+
+	might_sleep();
+	GEM_BUG_ON(timeout < 0);
+
+	if (!atomic_read(wait_var))
+		return 0;
+
+	if (!timeout)
+		return -ETIME;
+
+	for (;;) {
+		prepare_to_wait(&guc->ct.wq, &wait, state);
+
+		if (!atomic_read(wait_var))
+			break;
+
+		if (signal_pending_state(state, current)) {
+			timeout = -EINTR;
+			break;
+		}
+
+		if (!timeout) {
+			timeout = -ETIME;
+			break;
+		}
+
+		timeout = io_schedule_timeout(timeout);
+	}
+	finish_wait(&guc->ct.wq, &wait);
+
+	return (timeout < 0) ? timeout : 0;
+}
+
+int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
+{
+	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+					true, timeout);
+}
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
 		clr_context_pending_enable(ce);
@@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
 		offset,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					     0, true);
 }
 
 static int register_context(struct intel_context *ce)
@@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 		guc_id,
 	};
 
-	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
-					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
+	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+					     G2H_LEN_DW_DEREGISTER_CONTEXT,
+					     true);
 }
 
 static int deregister_context(struct intel_context *ce, u32 guc_id)
@@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	intel_context_get(ce);
 
-	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
-				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
+	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
 static u16 prep_context_pending_disable(struct intel_context *ce)
@@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
 	return ce;
 }
 
+static void decr_outstanding_submission_g2h(struct intel_guc *guc)
+{
+	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
+		wake_up_all(&guc->ct.wq);
+}
+
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg,
 					  u32 len)
@@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		lrc_destroy(&ce->ref);
 	}
 
+	decr_outstanding_submission_g2h(guc);
+
 	return 0;
 }
 
@@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 	}
 
+	decr_outstanding_submission_g2h(guc);
 	intel_context_put(ce);
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 9c954c589edf..c4cef885e984 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
 #undef uc_state_checkers
 #undef __uc_state_checker
 
+static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
+{
+	return intel_guc_wait_for_idle(&uc->guc, timeout);
+}
+
 #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
 static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
 { \
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4d2d59a9942b..2b73ddb11c66 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -27,6 +27,7 @@
  */
 
 #include "gem/i915_gem_context.h"
+#include "gt/intel_gt.h"
 #include "gt/intel_gt_requests.h"
 
 #include "i915_drv.h"
diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
index c130010a7033..1c721542e277 100644
--- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
@@ -5,7 +5,7 @@
  */
 
 #include "i915_drv.h"
-#include "gt/intel_gt_requests.h"
+#include "gt/intel_gt.h"
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index d189c4bd4bef..4f8180146888 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
 	do {
 		for_each_engine(engine, gt, id)
 			mock_engine_flush(engine);
-	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
+	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
+						  NULL));
 }
 
 static void mock_device_release(struct drm_device *dev)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Update GuC debugfs to support the new GuC structures.

v2:
 (John Harrison)
  - Remove intel_lrc_reg.h include from i915_debugfs.c
 (Michal)
  - Rename GuC debugfs functions
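
A quick way to exercise the new node from userspace is sketched below.
The debugfs path is an assumption (the card/gt index and mount point
vary by system), and the read fails when GuC submission is not in use:

#include <stdio.h>

int main(void)
{
        const char *path =
                "/sys/kernel/debug/dri/0/gt/uc/guc_registered_contexts";
        char line[256];
        FILE *f = fopen(path, "r");

        if (!f) {
                perror(path); /* e.g. -ENODEV, GuC submission disabled */
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout); /* per-context LRC/pin/state dump */
        fclose(f);
        return 0;
}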

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 22 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  3 +
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 23 +++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index f1cbed6b9f0a..503a78517610 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1171,3 +1171,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 
 	ct_try_receive_message(ct);
 }
+
+void intel_guc_ct_print_info(struct intel_guc_ct *ct,
+			     struct drm_printer *p)
+{
+	drm_printf(p, "CT %s\n", enableddisabled(ct->enabled));
+
+	if (!ct->enabled)
+		return;
+
+	drm_printf(p, "H2G Space: %u\n",
+		   atomic_read(&ct->ctbs.send.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.send.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.send.desc->tail);
+	drm_printf(p, "G2H Space: %u\n",
+		   atomic_read(&ct->ctbs.recv.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.recv.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 4b30a562ae63..7b34026d264a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
 
 struct i915_vma;
 struct intel_guc;
+struct drm_printer;
 
 /**
  * DOC: Command Transport (CT).
@@ -112,4 +113,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
+void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..7a454c91a736 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
 #include "intel_guc.h"
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
 {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
 	drm_puts(&p, "\n");
 	intel_guc_log_info(&guc->log, &p);
 
-	/* Add more as required ... */
+	if (!intel_guc_submission_is_used(guc))
+		return 0;
+
+	intel_guc_ct_print_info(&guc->ct, &p);
+	intel_guc_submission_print_info(guc, &p);
 
 	return 0;
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
 
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+	struct intel_guc *guc = m->private;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	intel_guc_submission_print_context_info(guc, &p);
+
+	return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
+		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 088d11e2e497..a2af7e17dcc2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1602,3 +1602,58 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p)
+{
+	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct rb_node *rb;
+	unsigned long flags;
+
+	if (!sched_engine)
+		return;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC tasklet count: %u\n\n",
+		   atomic_read(&sched_engine->tasklet.count));
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
+		struct i915_priolist *pl = to_priolist(rb);
+		struct i915_request *rq;
+
+		priolist_for_each_request(rq, pl)
+			drm_printf(p, "guc_id=%u, seqno=%llu\n",
+				   rq->context->guc_id,
+				   rq->fence.seqno);
+	}
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	drm_printf(p, "\n");
+}
+
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+			   ce->ring->head,
+			   ce->lrc_reg_state[CTX_RING_HEAD]);
+		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+			   ce->ring->tail,
+			   ce->lrc_reg_state[CTX_RING_TAIL]);
+		drm_printf(p, "\t\tContext Pin Count: %u\n",
+			   atomic_read(&ce->pin_count));
+		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+			   ce->guc_state.sched_state,
+			   atomic_read(&ce->guc_sched_state_no_lock));
+	}
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 3f7005018939..2b9470c90558 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -10,6 +10,7 @@
 
 #include "intel_guc.h"
 
+struct drm_printer;
 struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
@@ -20,6 +21,10 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p);
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p);
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Update GuC debugfs to support the new GuC structures.

v2:
 (John Harrison)
  - Remove intel_lrc_reg.h include from i915_debugfs.c
 (Michal)
  - Rename GuC debugfs functions

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 22 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  3 +
 .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 23 +++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 +++++++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index f1cbed6b9f0a..503a78517610 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -1171,3 +1171,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
 
 	ct_try_receive_message(ct);
 }
+
+void intel_guc_ct_print_info(struct intel_guc_ct *ct,
+			     struct drm_printer *p)
+{
+	drm_printf(p, "CT %s\n", enableddisabled(ct->enabled));
+
+	if (!ct->enabled)
+		return;
+
+	drm_printf(p, "H2G Space: %u\n",
+		   atomic_read(&ct->ctbs.send.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.send.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.send.desc->tail);
+	drm_printf(p, "G2H Space: %u\n",
+		   atomic_read(&ct->ctbs.recv.space) * 4);
+	drm_printf(p, "Head: %u\n",
+		   ct->ctbs.recv.desc->head);
+	drm_printf(p, "Tail: %u\n",
+		   ct->ctbs.recv.desc->tail);
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
index 4b30a562ae63..7b34026d264a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
@@ -16,6 +16,7 @@
 
 struct i915_vma;
 struct intel_guc;
+struct drm_printer;
 
 /**
  * DOC: Command Transport (CT).
@@ -112,4 +113,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 		      u32 *response_buf, u32 response_buf_size, u32 flags);
 void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
 
+void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);
+
 #endif /* _INTEL_GUC_CT_H_ */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index fe7cb7b29a1e..7a454c91a736 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -9,6 +9,8 @@
 #include "intel_guc.h"
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
+#include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
 {
@@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
 	drm_puts(&p, "\n");
 	intel_guc_log_info(&guc->log, &p);
 
-	/* Add more as required ... */
+	if (!intel_guc_submission_is_used(guc))
+		return 0;
+
+	intel_guc_ct_print_info(&guc->ct, &p);
+	intel_guc_submission_print_info(guc, &p);
 
 	return 0;
 }
 DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
 
+static int guc_registered_contexts_show(struct seq_file *m, void *data)
+{
+	struct intel_guc *guc = m->private;
+	struct drm_printer p = drm_seq_file_printer(m);
+
+	if (!intel_guc_submission_is_used(guc))
+		return -ENODEV;
+
+	intel_guc_submission_print_context_info(guc, &p);
+
+	return 0;
+}
+DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
+
 void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
 {
 	static const struct debugfs_gt_file files[] = {
 		{ "guc_info", &guc_info_fops, NULL },
+		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
 	};
 
 	if (!intel_guc_is_supported(guc))
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 088d11e2e497..a2af7e17dcc2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1602,3 +1602,58 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 
 	return 0;
 }
+
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p)
+{
+	struct i915_sched_engine *sched_engine = guc->sched_engine;
+	struct rb_node *rb;
+	unsigned long flags;
+
+	if (!sched_engine)
+		return;
+
+	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
+		   atomic_read(&guc->outstanding_submission_g2h));
+	drm_printf(p, "GuC tasklet count: %u\n\n",
+		   atomic_read(&sched_engine->tasklet.count));
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	drm_printf(p, "Requests in GuC submit tasklet:\n");
+	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
+		struct i915_priolist *pl = to_priolist(rb);
+		struct i915_request *rq;
+
+		priolist_for_each_request(rq, pl)
+			drm_printf(p, "guc_id=%u, seqno=%llu\n",
+				   rq->context->guc_id,
+				   rq->fence.seqno);
+	}
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+	drm_printf(p, "\n");
+}
+
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
+		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
+		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
+			   ce->ring->head,
+			   ce->lrc_reg_state[CTX_RING_HEAD]);
+		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
+			   ce->ring->tail,
+			   ce->lrc_reg_state[CTX_RING_TAIL]);
+		drm_printf(p, "\t\tContext Pin Count: %u\n",
+			   atomic_read(&ce->pin_count));
+		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
+			   atomic_read(&ce->guc_id_ref));
+		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
+			   ce->guc_state.sched_state,
+			   atomic_read(&ce->guc_sched_state_no_lock));
+	}
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 3f7005018939..2b9470c90558 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -10,6 +10,7 @@
 
 #include "intel_guc.h"
 
+struct drm_printer;
 struct intel_engine_cs;
 
 void intel_guc_submission_init_early(struct intel_guc *guc);
@@ -20,6 +21,10 @@ void intel_guc_submission_fini(struct intel_guc *guc);
 int intel_guc_preempt_work_create(struct intel_guc *guc);
 void intel_guc_preempt_work_destroy(struct intel_guc *guc);
 int intel_guc_submission_setup(struct intel_engine_cs *engine);
+void intel_guc_submission_print_info(struct intel_guc *guc,
+				     struct drm_printer *p);
+void intel_guc_submission_print_context_info(struct intel_guc *guc,
+					     struct drm_printer *p);
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 17/51] drm/i915/guc: Add several request trace points
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add trace points for request dependencies and GuC submit. Extend the
existing request trace points to include the submit fence value, guc_id,
and ring tail value.

v2: Fix white space alignment in i915_request_add trace point
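
For reference, the new events can be consumed from userspace via
tracefs once CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS is enabled. A
minimal reader is sketched below; the tracefs mount point is an
assumption, and the event name follows this patch:

#include <stdio.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fputs(val, f);
        fclose(f);
        return 0;
}

int main(void)
{
        char line[512];
        FILE *f;

        /* Enable the tracepoint added by this patch. */
        if (write_str("/sys/kernel/tracing/events/i915/i915_request_guc_submit/enable",
                      "1")) {
                perror("enable");
                return 1;
        }

        /* Each emitted line carries dev, engine, guc_id, ctx, seqno, tail. */
        f = fopen("/sys/kernel/tracing/trace_pipe", "r");
        if (!f) {
                perror("trace_pipe");
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
        return 0;
}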

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
 drivers/gpu/drm/i915/i915_request.c           |  3 ++
 drivers/gpu/drm/i915/i915_trace.h             | 43 +++++++++++++++++--
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a2af7e17dcc2..480fb2184ecf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
+		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
+	else
+		trace_i915_request_guc_submit(rq);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2b2b63cba06c..01aa3d1ee2b1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
 			return err;
 	}
 
+	trace_i915_request_dep_to(to);
+	trace_i915_request_dep_from(from);
+
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->sched_engine->schedule) {
 		err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..ea41d069bf7d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u64, ctx)
+			     __field(u32, guc_id)
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
+			     __field(u32, tail)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = rq->engine->i915->drm.primary->index;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->uabi_instance;
+			   __entry->guc_id = rq->context->guc_id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
+			   __entry->tail = rq->tail;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
+	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->ctx, __entry->seqno)
+		      __entry->guc_id, __entry->ctx, __entry->seqno,
+		      __entry->tail)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
-	    TP_PROTO(struct i915_request *rq),
-	    TP_ARGS(rq)
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_dep_to,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_submit(struct i915_request *rq)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 17/51] drm/i915/guc: Add several request trace points
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add trace points for request dependencies and GuC submit. Extend the
existing request trace points to include the submit fence value, guc_id,
and ring tail value.

v2: Fix white space alignment in i915_request_add trace point

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
 drivers/gpu/drm/i915/i915_request.c           |  3 ++
 drivers/gpu/drm/i915/i915_trace.h             | 43 +++++++++++++++++--
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index a2af7e17dcc2..480fb2184ecf 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
+		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
+	else
+		trace_i915_request_guc_submit(rq);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2b2b63cba06c..01aa3d1ee2b1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
 			return err;
 	}
 
+	trace_i915_request_dep_to(to);
+	trace_i915_request_dep_from(from);
+
 	/* Couple the dependency tree for PI on this exposed to->fence */
 	if (to->engine->sched_engine->schedule) {
 		err = i915_sched_node_add_dependency(&to->sched,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6778ad2a14a4..ea41d069bf7d 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u64, ctx)
+			     __field(u32, guc_id)
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
+			     __field(u32, tail)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = rq->engine->i915->drm.primary->index;
 			   __entry->class = rq->engine->uabi_class;
 			   __entry->instance = rq->engine->uabi_instance;
+			   __entry->guc_id = rq->context->guc_id;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
+			   __entry->tail = rq->tail;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
+	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->ctx, __entry->seqno)
+		      __entry->guc_id, __entry->ctx, __entry->seqno,
+		      __entry->tail)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
-	    TP_PROTO(struct i915_request *rq),
-	    TP_ARGS(rq)
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
 );
 
 #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
+DEFINE_EVENT(i915_request, i915_request_dep_to,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_dep_from,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
+DEFINE_EVENT(i915_request, i915_request_guc_submit,
+	     TP_PROTO(struct i915_request *rq),
+	     TP_ARGS(rq)
+);
+
 DEFINE_EVENT(i915_request, i915_request_submit,
 	     TP_PROTO(struct i915_request *rq),
 	     TP_ARGS(rq)
@@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
 
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
+static inline void
+trace_i915_request_dep_to(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_dep_from(struct i915_request *rq)
+{
+}
+
+static inline void
+trace_i915_request_guc_submit(struct i915_request *rq)
+{
+}
+
 static inline void
 trace_i915_request_submit(struct i915_request *rq)
 {
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 18/51] drm/i915: Add intel_context tracing
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add intel_context tracing. These trace points are particularly helpful
when debugging the GuC firmware and can be enabled via the
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.
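
The stub pattern at the bottom of i915_trace.h (real events when the
config option is set, empty inline functions otherwise, so call sites
need no #ifdef) looks like this in miniature. This is a sketch with a
printf standing in for the tracepoint, not the actual macros:

#include <stdio.h>

struct intel_context { unsigned int guc_id; }; /* trimmed stand-in */

#define CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS 1

#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
/* Config on: emit the event (printf stands in for the trace call). */
static inline void trace_intel_context_do_pin(struct intel_context *ce)
{
        printf("intel_context_do_pin: guc_id=%u\n", ce->guc_id);
}
#else
/* Config off: the stub compiles away, callers stay unconditional. */
static inline void trace_intel_context_do_pin(struct intel_context *ce)
{
}
#endif

int main(void)
{
        struct intel_context ce = { .guc_id = 42 };

        trace_intel_context_do_pin(&ce); /* no #ifdef at the call site */
        return 0;
}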

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   6 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
 drivers/gpu/drm/i915/i915_trace.h             | 144 ++++++++++++++++++
 3 files changed, 164 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 91349d071e0e..251ff7eea22d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_globals.h"
+#include "i915_trace.h"
 
 #include "intel_context.h"
 #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
 {
 	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
 
+	trace_intel_context_free(ce);
 	kmem_cache_free(global.slab_ce, ce);
 }
 
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)
 		return ERR_PTR(-ENOMEM);
 
 	intel_context_init(ce, engine);
+	trace_intel_context_create(ce);
 	return ce;
 }
 
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 
 	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
 
+	trace_intel_context_do_pin(ce);
+
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
 err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub)
 	 */
 	intel_context_get(ce);
 	intel_context_active_release(ce);
+	trace_intel_context_do_unpin(ce);
 	intel_context_put(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 480fb2184ecf..05958260e849 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -343,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		trace_intel_context_sched_enable(ce);
 		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -808,6 +809,8 @@ static int register_context(struct intel_context *ce)
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
 
+	trace_intel_context_register(ce);
+
 	return __guc_action_register_context(guc, ce->guc_id, offset);
 }
 
@@ -828,6 +831,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	trace_intel_context_deregister(ce);
+
 	return __guc_action_deregister_context(guc, guc_id);
 }
 
@@ -902,6 +907,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 * GuC before registering this context.
 	 */
 	if (context_registered) {
+		trace_intel_context_steal_guc_id(ce);
 		set_context_wait_for_deregister_to_register(ce);
 		intel_context_get(ce);
 
@@ -960,6 +966,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
+	trace_intel_context_sched_disable(ce);
 	intel_context_get(ce);
 
 	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1121,6 +1128,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
 
 	lockdep_assert_held(&ce->guc_state.lock);
 
+	if (!list_empty(&ce->guc_state.fences))
+		trace_intel_context_fence_release(ce);
+
 	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
 		i915_sw_fence_complete(&rq->submit);
 
@@ -1531,6 +1541,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	if (unlikely(!ce))
 		return -EPROTO;
 
+	trace_intel_context_deregister_done(ce);
+
 	if (context_wait_for_deregister_to_register(ce)) {
 		struct intel_runtime_pm *runtime_pm =
 			&ce->engine->gt->i915->runtime_pm;
@@ -1582,6 +1594,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		return -EPROTO;
 	}
 
+	trace_intel_context_sched_done(ce);
+
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index ea41d069bf7d..97c2e83984ed 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out,
 			      __entry->ctx, __entry->seqno, __entry->completed)
 );
 
+DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),
+	    TP_ARGS(ce),
+
+	    TP_STRUCT__entry(
+			     __field(u32, guc_id)
+			     __field(int, pin_count)
+			     __field(u32, sched_state)
+			     __field(u32, guc_sched_state_no_lock)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->guc_id = ce->guc_id;
+			   __entry->pin_count = atomic_read(&ce->pin_count);
+			   __entry->sched_state = ce->guc_state.sched_state;
+			   __entry->guc_sched_state_no_lock =
+			   atomic_read(&ce->guc_sched_state_no_lock);
+			   ),
+
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
+		      __entry->guc_sched_state_no_lock)
+);
+
+DEFINE_EVENT(intel_context, intel_context_register,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_create,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_fence_release,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_free,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_pin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
 static inline void
@@ -941,6 +1025,66 @@ static inline void
 trace_i915_request_out(struct i915_request *rq)
 {
 }
+
+static inline void
+trace_intel_context_register(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_enable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_disable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_create(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_fence_release(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_free(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_steal_guc_id(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_pin(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_unpin(struct intel_context *ce)
+{
+}
 #endif
 #endif
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 18/51] drm/i915: Add intel_context tracing
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Add intel_context tracing. These trace points are particularly helpful
when debugging the GuC firmware and can be enabled via the
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   6 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  14 ++
 drivers/gpu/drm/i915/i915_trace.h             | 144 ++++++++++++++++++
 3 files changed, 164 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 91349d071e0e..251ff7eea22d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -8,6 +8,7 @@
 
 #include "i915_drv.h"
 #include "i915_globals.h"
+#include "i915_trace.h"
 
 #include "intel_context.h"
 #include "intel_engine.h"
@@ -28,6 +29,7 @@ static void rcu_context_free(struct rcu_head *rcu)
 {
 	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
 
+	trace_intel_context_free(ce);
 	kmem_cache_free(global.slab_ce, ce);
 }
 
@@ -46,6 +48,7 @@ intel_context_create(struct intel_engine_cs *engine)
 		return ERR_PTR(-ENOMEM);
 
 	intel_context_init(ce, engine);
+	trace_intel_context_create(ce);
 	return ce;
 }
 
@@ -268,6 +271,8 @@ int __intel_context_do_pin_ww(struct intel_context *ce,
 
 	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
 
+	trace_intel_context_do_pin(ce);
+
 err_unlock:
 	mutex_unlock(&ce->pin_mutex);
 err_post_unpin:
@@ -323,6 +328,7 @@ void __intel_context_do_unpin(struct intel_context *ce, int sub)
 	 */
 	intel_context_get(ce);
 	intel_context_active_release(ce);
+	trace_intel_context_do_unpin(ce);
 	intel_context_put(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 480fb2184ecf..05958260e849 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -343,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 
 	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
 	if (!enabled && !err) {
+		trace_intel_context_sched_enable(ce);
 		atomic_inc(&guc->outstanding_submission_g2h);
 		set_context_enabled(ce);
 	} else if (!enabled) {
@@ -808,6 +809,8 @@ static int register_context(struct intel_context *ce)
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
 
+	trace_intel_context_register(ce);
+
 	return __guc_action_register_context(guc, ce->guc_id, offset);
 }
 
@@ -828,6 +831,8 @@ static int deregister_context(struct intel_context *ce, u32 guc_id)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
+	trace_intel_context_deregister(ce);
+
 	return __guc_action_deregister_context(guc, guc_id);
 }
 
@@ -902,6 +907,7 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 * GuC before registering this context.
 	 */
 	if (context_registered) {
+		trace_intel_context_steal_guc_id(ce);
 		set_context_wait_for_deregister_to_register(ce);
 		intel_context_get(ce);
 
@@ -960,6 +966,7 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
+	trace_intel_context_sched_disable(ce);
 	intel_context_get(ce);
 
 	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
@@ -1121,6 +1128,9 @@ static void __guc_signal_context_fence(struct intel_context *ce)
 
 	lockdep_assert_held(&ce->guc_state.lock);
 
+	if (!list_empty(&ce->guc_state.fences))
+		trace_intel_context_fence_release(ce);
+
 	list_for_each_entry(rq, &ce->guc_state.fences, guc_fence_link)
 		i915_sw_fence_complete(&rq->submit);
 
@@ -1531,6 +1541,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	if (unlikely(!ce))
 		return -EPROTO;
 
+	trace_intel_context_deregister_done(ce);
+
 	if (context_wait_for_deregister_to_register(ce)) {
 		struct intel_runtime_pm *runtime_pm =
 			&ce->engine->gt->i915->runtime_pm;
@@ -1582,6 +1594,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		return -EPROTO;
 	}
 
+	trace_intel_context_sched_done(ce);
+
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index ea41d069bf7d..97c2e83984ed 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -905,6 +905,90 @@ TRACE_EVENT(i915_request_out,
 			      __entry->ctx, __entry->seqno, __entry->completed)
 );
 
+DECLARE_EVENT_CLASS(intel_context,
+	    TP_PROTO(struct intel_context *ce),
+	    TP_ARGS(ce),
+
+	    TP_STRUCT__entry(
+			     __field(u32, guc_id)
+			     __field(int, pin_count)
+			     __field(u32, sched_state)
+			     __field(u32, guc_sched_state_no_lock)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->guc_id = ce->guc_id;
+			   __entry->pin_count = atomic_read(&ce->pin_count);
+			   __entry->sched_state = ce->guc_state.sched_state;
+			   __entry->guc_sched_state_no_lock =
+			   atomic_read(&ce->guc_sched_state_no_lock);
+			   ),
+
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
+		      __entry->guc_sched_state_no_lock)
+);
+
+DEFINE_EVENT(intel_context, intel_context_register,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_deregister_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_enable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_disable,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_sched_done,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_create,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_fence_release,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_free,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_steal_guc_id,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_pin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
+DEFINE_EVENT(intel_context, intel_context_do_unpin,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 #else
 #if !defined(TRACE_HEADER_MULTI_READ)
 static inline void
@@ -941,6 +1025,66 @@ static inline void
 trace_i915_request_out(struct i915_request *rq)
 {
 }
+
+static inline void
+trace_intel_context_register(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_deregister_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_enable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_disable(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_sched_done(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_create(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_fence_release(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_free(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_steal_guc_id(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_pin(struct intel_context *ce)
+{
+}
+
+static inline void
+trace_intel_context_do_unpin(struct intel_context *ce)
+{
+}
 #endif
 #endif
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 19/51] drm/i915/guc: GuC virtual engines
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Implement GuC virtual engines. The implementation is fairly simple:
allocate an engine, set up the context enter/exit functions to point at
virtual-engine-specific functions, set all other variables/functions to
the GuC versions, and set the engine mask to that of all the siblings.

v2: Update to work with proto-ctx
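
The shape of the idea (a virtual engine is an ordinary engine whose
hooks point at virtual-specific functions and whose mask is the union
of its siblings' masks) can be sketched with hypothetical trimmed-down
types; this is an illustration, not the driver's structures:

#include <stdio.h>
#include <stdlib.h>

struct engine {
        unsigned int mask;              /* one bit per physical engine */
        void (*enter)(struct engine *); /* context enter/exit hooks */
        void (*exit)(struct engine *);
};

static void virtual_enter(struct engine *e) { printf("virtual enter\n"); }
static void virtual_exit(struct engine *e) { printf("virtual exit\n"); }

/* Mirrors the approach above: allocate an engine, point the hooks at
 * virtual-specific functions, and OR together the sibling masks. */
static struct engine *create_virtual(struct engine **siblings,
                                     unsigned int count)
{
        struct engine *ve = calloc(1, sizeof(*ve));
        unsigned int n;

        if (!ve)
                return NULL;
        ve->enter = virtual_enter;
        ve->exit = virtual_exit;
        for (n = 0; n < count; n++)
                ve->mask |= siblings[n]->mask; /* may run on any sibling */
        return ve;
}

int main(void)
{
        struct engine vcs0 = { .mask = 1 << 0 };
        struct engine vcs1 = { .mask = 1 << 1 };
        struct engine *sib[] = { &vcs0, &vcs1 };
        struct engine *ve = create_virtual(sib, 2);

        if (!ve)
                return 1;
        printf("virtual mask=0x%x\n", ve->mask); /* prints 0x3 */
        ve->enter(ve);
        ve->exit(ve);
        free(ve);
        return 0;
}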

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
 drivers/gpu/drm/i915/gt/intel_engine.h        |  27 +-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
 .../drm/i915/gt/intel_execlists_submission.c  |  29 ++-
 .../drm/i915/gt/intel_execlists_submission.h  |   4 -
 drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 240 +++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
 10 files changed, 308 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 64659802d4df..edefe299bd76 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -74,7 +74,6 @@
 #include "gt/intel_context_param.h"
 #include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_engine_user.h"
-#include "gt/intel_execlists_submission.h" /* virtual_engine */
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
 
@@ -363,9 +362,6 @@ set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
 	if (!HAS_EXECLISTS(i915))
 		return -ENODEV;
 
-	if (intel_uc_uses_guc_submission(&i915->gt.uc))
-		return -ENODEV; /* not implement yet */
-
 	if (get_user(idx, &ext->engine_index))
 		return -EFAULT;
 
@@ -950,8 +946,8 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
 			break;
 
 		case I915_GEM_ENGINE_TYPE_BALANCED:
-			ce = intel_execlists_create_virtual(pe[n].siblings,
-							    pe[n].num_siblings);
+			ce = intel_engine_create_virtual(pe[n].siblings,
+							 pe[n].num_siblings);
 			break;
 
 		case I915_GEM_ENGINE_TYPE_INVALID:
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index 20411db84914..2639c719a7a6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -10,6 +10,7 @@
 #include "i915_gem_context_types.h"
 
 #include "gt/intel_context.h"
+#include "gt/intel_engine.h"
 
 #include "i915_drv.h"
 #include "i915_gem.h"
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 4a5518d295c2..542c98418771 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -47,6 +47,12 @@ struct intel_context_ops {
 
 	void (*reset)(struct intel_context *ce);
 	void (*destroy)(struct kref *kref);
+
+	/* virtual engine/context interface */
+	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
+						unsigned int count);
+	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
+					       unsigned int sibling);
 };
 
 struct intel_context {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index f911c1224ab2..9fec0aca5f4b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -273,13 +273,38 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
 	return intel_engine_has_preemption(engine);
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count);
+
+static inline bool
+intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
+{
+	if (intel_engine_uses_guc(engine))
+		return intel_guc_virtual_engine_has_heartbeat(engine);
+	else
+		GEM_BUG_ON("Should only be called in GuC submission");
+
+	return false;
+}
+
 static inline bool
 intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
 {
 	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
 		return false;
 
-	return READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (intel_engine_is_virtual(engine))
+		return intel_virtual_engine_has_heartbeat(engine);
+	else
+		return READ_ONCE(engine->props.heartbeat_interval_ms);
+}
+
+static inline struct intel_engine_cs *
+intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	GEM_BUG_ON(!intel_engine_is_virtual(engine));
+	return engine->cops->get_sibling(engine, sibling);
 }
 
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d561573ed98c..b7292d5cb7da 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1737,6 +1737,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
 	return total;
 }
 
+struct intel_context *
+intel_engine_create_virtual(struct intel_engine_cs **siblings,
+			    unsigned int count)
+{
+	if (count == 0)
+		return ERR_PTR(-EINVAL);
+
+	if (count == 1)
+		return intel_context_create(siblings[0]);
+
+	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
+	return siblings[0]->cops->create_virtual(siblings, count);
+}
+
 static bool match_ring(struct i915_request *rq)
 {
 	u32 ring = ENGINE_READ(rq->engine, RING_START);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 56e25090da67..28492cdce706 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -193,6 +193,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
 	return container_of(engine, struct virtual_engine, base);
 }
 
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 static struct i915_request *
 __active_request(const struct intel_timeline * const tl,
 		 struct i915_request *rq,
@@ -2548,6 +2551,8 @@ static const struct intel_context_ops execlists_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = lrc_destroy,
+
+	.create_virtual = execlists_create_virtual,
 };
 
 static int emit_pdps(struct i915_request *rq)
@@ -3493,6 +3498,17 @@ static void virtual_context_exit(struct intel_context *ce)
 		intel_engine_pm_put(ve->siblings[n]);
 }
 
+static struct intel_engine_cs *
+virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (sibling >= ve->num_siblings)
+		return NULL;
+
+	return ve->siblings[sibling];
+}
+
 static const struct intel_context_ops virtual_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
@@ -3507,6 +3523,8 @@ static const struct intel_context_ops virtual_context_ops = {
 	.exit = virtual_context_exit,
 
 	.destroy = virtual_context_destroy,
+
+	.get_sibling = virtual_get_sibling,
 };
 
 static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
@@ -3655,20 +3673,13 @@ static void virtual_submit_request(struct i915_request *rq)
 	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
 }
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count)
+static struct intel_context *
+execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 {
 	struct virtual_engine *ve;
 	unsigned int n;
 	int err;
 
-	if (count == 0)
-		return ERR_PTR(-EINVAL);
-
-	if (count == 1)
-		return intel_context_create(siblings[0]);
-
 	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
 	if (!ve)
 		return ERR_PTR(-ENOMEM);
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
index ad4f3e1a0fde..a1aa92c983a5 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
@@ -32,10 +32,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							int indent),
 				   unsigned int max);
 
-struct intel_context *
-intel_execlists_create_virtual(struct intel_engine_cs **siblings,
-			       unsigned int count);
-
 bool
 intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
index 73ddc6e14730..59cf8afc6d6f 100644
--- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
+++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
@@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
 
 	for (n = 0; n < nctx; n++) {
-		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
+		ve[n] = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ve[n])) {
 			err = PTR_ERR(ve[n]);
 			nctx = n;
@@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
 	 * restrict it to our desired engine within the virtual engine.
 	 */
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_close;
@@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
 		i915_request_add(rq);
 	}
 
-	ce = intel_execlists_create_virtual(siblings, nsibling);
+	ce = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ce)) {
 		err = PTR_ERR(ce);
 		goto out;
@@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
 
 	/* XXX We do not handle oversubscription and fairness with normal rq */
 	for (n = 0; n < nsibling; n++) {
-		ce = intel_execlists_create_virtual(siblings, nsibling);
+		ce = intel_engine_create_virtual(siblings, nsibling);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
 			goto out;
@@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
 	if (err)
 		goto out_scratch;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_scratch;
@@ -4348,7 +4348,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	if (igt_spinner_init(&spin, gt))
 		return -ENOMEM;
 
-	ve = intel_execlists_create_virtual(siblings, nsibling);
+	ve = intel_engine_create_virtual(siblings, nsibling);
 	if (IS_ERR(ve)) {
 		err = PTR_ERR(ve);
 		goto out_spin;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 05958260e849..7b3e1c91e689 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -60,6 +60,15 @@
  *
  */
 
+/* GuC Virtual Engine */
+struct guc_virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+};
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
+
 #define GUC_REQUEST_SIZE 64 /* bytes */
 
 /*
@@ -925,20 +934,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	return ret;
 }
 
-static int guc_context_pre_pin(struct intel_context *ce,
-			       struct i915_gem_ww_ctx *ww,
-			       void **vaddr)
+static int __guc_context_pre_pin(struct intel_context *ce,
+				 struct intel_engine_cs *engine,
+				 struct i915_gem_ww_ctx *ww,
+				 void **vaddr)
 {
-	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
+	return lrc_pre_pin(ce, engine, ww, vaddr);
 }
 
-static int guc_context_pin(struct intel_context *ce, void *vaddr)
+static int __guc_context_pin(struct intel_context *ce,
+			     struct intel_engine_cs *engine,
+			     void *vaddr)
 {
 	if (i915_ggtt_offset(ce->state) !=
 	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
 		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
 
-	return lrc_pin(ce, ce->engine, vaddr);
+	return lrc_pin(ce, engine, vaddr);
+}
+
+static int guc_context_pre_pin(struct intel_context *ce,
+			       struct i915_gem_ww_ctx *ww,
+			       void **vaddr)
+{
+	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
+}
+
+static int guc_context_pin(struct intel_context *ce, void *vaddr)
+{
+	return __guc_context_pin(ce, ce->engine, vaddr);
 }
 
 static void guc_context_unpin(struct intel_context *ce)
@@ -1043,6 +1067,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	deregister_context(ce, ce->guc_id);
 }
 
+static void __guc_context_destroy(struct intel_context *ce)
+{
+	lrc_fini(ce);
+	intel_context_fini(ce);
+
+	if (intel_engine_is_virtual(ce->engine)) {
+		struct guc_virtual_engine *ve =
+			container_of(ce, typeof(*ve), context);
+
+		kfree(ve);
+	} else {
+		intel_context_free(ce);
+	}
+}
+
 static void guc_context_destroy(struct kref *kref)
 {
 	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
@@ -1059,7 +1098,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1075,7 +1114,7 @@ static void guc_context_destroy(struct kref *kref)
 	if (context_guc_id_invalid(ce)) {
 		__release_guc_id(guc, ce);
 		spin_unlock_irqrestore(&guc->contexts_lock, flags);
-		lrc_destroy(kref);
+		__guc_context_destroy(ce);
 		return;
 	}
 
@@ -1120,6 +1159,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.reset = lrc_reset,
 	.destroy = guc_context_destroy,
+
+	.create_virtual = guc_create_virtual,
 };
 
 static void __guc_signal_context_fence(struct intel_context *ce)
@@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static int guc_virtual_context_pre_pin(struct intel_context *ce,
+				       struct i915_gem_ww_ctx *ww,
+				       void **vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pre_pin(ce, engine, ww, vaddr);
+}
+
+static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return __guc_context_pin(ce, engine, vaddr);
+}
+
+static void guc_virtual_context_enter(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_get(engine);
+
+	intel_timeline_enter(ce->timeline);
+}
+
+static void guc_virtual_context_exit(struct intel_context *ce)
+{
+	intel_engine_mask_t tmp, mask = ce->engine->mask;
+	struct intel_engine_cs *engine;
+
+	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
+		intel_engine_pm_put(engine);
+
+	intel_timeline_exit(ce->timeline);
+}
+
+static int guc_virtual_context_alloc(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
+
+	return lrc_alloc(ce, engine);
+}
+
+static const struct intel_context_ops virtual_guc_context_ops = {
+	.alloc = guc_virtual_context_alloc,
+
+	.pre_pin = guc_virtual_context_pre_pin,
+	.pin = guc_virtual_context_pin,
+	.unpin = guc_context_unpin,
+	.post_unpin = guc_context_post_unpin,
+
+	.enter = guc_virtual_context_enter,
+	.exit = guc_virtual_context_exit,
+
+	.sched_disable = guc_context_sched_disable,
+
+	.destroy = guc_context_destroy,
+
+	.get_sibling = guc_virtual_get_sibling,
+};
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 	} else if (context_destroyed(ce)) {
 		/* Context has been destroyed */
 		release_guc_id(guc, ce);
-		lrc_destroy(&ce->ref);
+		__guc_context_destroy(ce);
 	}
 
 	decr_outstanding_submission_g2h(guc);
@@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 			   atomic_read(&ce->guc_sched_state_no_lock));
 	}
 }
+
+static struct intel_context *
+guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
+{
+	struct guc_virtual_engine *ve;
+	struct intel_guc *guc;
+	unsigned int n;
+	int err;
+
+	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	guc = &siblings[0]->gt->uc.guc;
+
+	ve->base.i915 = siblings[0]->i915;
+	ve->base.gt = siblings[0]->gt;
+	ve->base.uncore = siblings[0]->uncore;
+	ve->base.id = -1;
+
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.saturated = ALL_ENGINES;
+	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
+	if (!ve->base.breadcrumbs) {
+		kfree(ve);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
+
+	ve->base.cops = &virtual_guc_context_ops;
+	ve->base.request_alloc = guc_request_alloc;
+
+	ve->base.submit_request = guc_submit_request;
+
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	intel_context_init(&ve->context, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask) {
+			DRM_DEBUG("duplicate %s entry in load balancer\n",
+				  sibling->name);
+			err = -EINVAL;
+			goto err_put;
+		}
+
+		ve->base.mask |= sibling->mask;
+
+		if (n != 0 && ve->base.class != sibling->class) {
+			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
+				  sibling->class, ve->base.class);
+			err = -EINVAL;
+			goto err_put;
+		} else if (n == 0) {
+			ve->base.class = sibling->class;
+			ve->base.uabi_class = sibling->uabi_class;
+			snprintf(ve->base.name, sizeof(ve->base.name),
+				 "v%dx%d", ve->base.class, count);
+			ve->base.context_size = sibling->context_size;
+
+			ve->base.emit_bb_start = sibling->emit_bb_start;
+			ve->base.emit_flush = sibling->emit_flush;
+			ve->base.emit_init_breadcrumb =
+				sibling->emit_init_breadcrumb;
+			ve->base.emit_fini_breadcrumb =
+				sibling->emit_fini_breadcrumb;
+			ve->base.emit_fini_breadcrumb_dw =
+				sibling->emit_fini_breadcrumb_dw;
+
+			ve->base.flags |= sibling->flags;
+
+			ve->base.props.timeslice_duration_ms =
+				sibling->props.timeslice_duration_ms;
+		}
+	}
+
+	return &ve->context;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+
+
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (READ_ONCE(engine->props.heartbeat_interval_ms))
+			return true;
+
+	return false;
+}
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 2b9470c90558..5f263ac4f46a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p);
 
+bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The serial number tracking of engines happens at the backend of
request submission and expects to be given only physical engines.
However, in GuC submission mode the decomposition of virtual to
physical engines does not happen in i915. Instead, requests are
submitted with their virtual engine mask all the way through to the
hardware (i.e. to GuC). As a result, the heartbeat code would think
the physical engines are idle because their serial numbers never
increment.

This patch updates the tracking to decompose virtual engines into
their physical constituents and to track the request against each of
them. This is not entirely accurate, as the GuC will only issue the
request to one physical engine. However, it is the best that i915 can
do given that it has no knowledge of the GuC's scheduling decisions.
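
Condensed from the diff below, the virtual flavour of the new
bump_serial vfunc is just a walk over the sibling mask:

  static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
  {
  	struct intel_engine_cs *e;
  	intel_engine_mask_t tmp, mask = engine->mask;

  	/* a virtual engine's mask is the union of its siblings' masks */
  	for_each_engine_masked(e, engine->gt, mask, tmp)
  		e->serial++; /* keep every possible target looking busy */
  }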

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
 .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
 drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
 drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c              |  4 +++-
 6 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 1cb9c3b70b29..8ad304b2f2e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -388,6 +388,8 @@ struct intel_engine_cs {
 	void		(*park)(struct intel_engine_cs *engine);
 	void		(*unpark)(struct intel_engine_cs *engine);
 
+	void		(*bump_serial)(struct intel_engine_cs *engine);
+
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
 	const struct intel_context_ops *cops;
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 28492cdce706..920707e22eb0 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void execlist_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void
 logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 {
@@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
+	engine->bump_serial = execlist_bump_serial;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 5c4d204d07cc..61469c631057 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
 	}
 }
 
+static void ring_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
 
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
+	engine->bump_serial = ring_bump_serial;
 
 	/*
 	 * Using a global execution timeline; the previous final breadcrumb is
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 68970398e4ef..9203c766db80 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	intel_engine_fini_retire(engine);
 }
 
+static void mock_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 				    const char *name,
 				    int id)
@@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 
 	engine->base.cops = &mock_context_ops;
 	engine->base.request_alloc = mock_request_alloc;
+	engine->base.bump_serial = mock_bump_serial;
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 7b3e1c91e689..372e0dc7617a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1485,6 +1485,20 @@ static void guc_release(struct intel_engine_cs *engine)
 	lrc_fini_wa_ctx(engine);
 }
 
+static void guc_bump_serial(struct intel_engine_cs *engine)
+{
+	engine->serial++;
+}
+
+static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
+{
+	struct intel_engine_cs *e;
+	intel_engine_mask_t tmp, mask = engine->mask;
+
+	for_each_engine_masked(e, engine->gt, mask, tmp)
+		e->serial++;
+}
+
 static void guc_default_vfuncs(struct intel_engine_cs *engine)
 {
 	/* Default vfuncs which can be overridden by each engine. */
@@ -1493,6 +1507,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
+	engine->bump_serial = guc_bump_serial;
 
 	engine->sched_engine->schedule = i915_schedule;
 
@@ -1828,6 +1843,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 
 	ve->base.cops = &virtual_guc_context_ops;
 	ve->base.request_alloc = guc_request_alloc;
+	ve->base.bump_serial = virtual_guc_bump_serial;
 
 	ve->base.submit_request = guc_submit_request;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 01aa3d1ee2b1..30ecdc46a12f 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -669,7 +669,9 @@ bool __i915_request_submit(struct i915_request *request)
 				     request->ring->vaddr + request->postfix);
 
 	trace_i915_request_execute(request);
-	engine->serial++;
+	if (engine->bump_serial)
+		engine->bump_serial(engine);
+
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 21/51] drm/i915: Hold reference to intel_context over life of i915_request
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Hold a reference to the intel_context over the life of an
i915_request. Without this, an i915_request can exist after the
context has been destroyed (e.g. the request is retired and the
context closed, but user space still holds a reference to the request
via an out fence). In the case of GuC submission + virtual engines,
the engine that the request references is also destroyed, which can
trigger a bad pointer dereference in fence ops (e.g.
i915_fence_get_driver_name). We could likely change
i915_fence_get_driver_name to avoid touching the engine, but let's
just be safe and hold the intel_context reference.
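
Schematically, the reference now spans the request's whole life (a
condensed sketch of the resulting flow, not the literal diff):

  /* __i915_request_create(): take the ref with the other ce fields */
  rq->context = intel_context_get(ce);
  rq->engine = ce->engine;

  /* i915_fence_release(): the final fence put is the last point at
   * which rq->context (and, via it, a possibly virtual rq->engine)
   * can be touched, so only drop the reference here */
  intel_context_put(rq->context);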

v2:
 (John Harrison)
  - Update comment explaining how GuC mode and execlists mode deal with
    virtual engines differently

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 55 ++++++++++++-----------------
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 30ecdc46a12f..b3c792d55321 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -125,39 +125,17 @@ static void i915_fence_release(struct dma_fence *fence)
 	i915_sw_fence_fini(&rq->semaphore);
 
 	/*
-	 * Keep one request on each engine for reserved use under mempressure
-	 *
-	 * We do not hold a reference to the engine here and so have to be
-	 * very careful in what rq->engine we poke. The virtual engine is
-	 * referenced via the rq->context and we released that ref during
-	 * i915_request_retire(), ergo we must not dereference a virtual
-	 * engine here. Not that we would want to, as the only consumer of
-	 * the reserved engine->request_pool is the power management parking,
-	 * which must-not-fail, and that is only run on the physical engines.
-	 *
-	 * Since the request must have been executed to be have completed,
-	 * we know that it will have been processed by the HW and will
-	 * not be unsubmitted again, so rq->engine and rq->execution_mask
-	 * at this point is stable. rq->execution_mask will be a single
-	 * bit if the last and _only_ engine it could execution on was a
-	 * physical engine, if it's multiple bits then it started on and
-	 * could still be on a virtual engine. Thus if the mask is not a
-	 * power-of-two we assume that rq->engine may still be a virtual
-	 * engine and so a dangling invalid pointer that we cannot dereference
-	 *
-	 * For example, consider the flow of a bonded request through a virtual
-	 * engine. The request is created with a wide engine mask (all engines
-	 * that we might execute on). On processing the bond, the request mask
-	 * is reduced to one or more engines. If the request is subsequently
-	 * bound to a single engine, it will then be constrained to only
-	 * execute on that engine and never returned to the virtual engine
-	 * after timeslicing away, see __unwind_incomplete_requests(). Thus we
-	 * know that if the rq->execution_mask is a single bit, rq->engine
-	 * can be a physical engine with the exact corresponding mask.
+	 * Keep one request on each engine for reserved use under mempressure;
+	 * do not use this with virtual engines, as it is really only needed
+	 * for kernel contexts.
 	 */
-	if (is_power_of_2(rq->execution_mask) &&
-	    !cmpxchg(&rq->engine->request_pool, NULL, rq))
+	if (!intel_engine_is_virtual(rq->engine) &&
+	    !cmpxchg(&rq->engine->request_pool, NULL, rq)) {
+		intel_context_put(rq->context);
 		return;
+	}
+
+	intel_context_put(rq->context);
 
 	kmem_cache_free(global.slab_requests, rq);
 }
@@ -954,7 +932,19 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 		}
 	}
 
-	rq->context = ce;
+	/*
+	 * Hold a reference to the intel_context over the life of an i915_request.
+	 * Without this, an i915_request can exist after the context has been
+	 * destroyed (e.g. request retired, context closed, but user space holds
+	 * a reference to the request from an out fence). In the case of GuC
+	 * submission + virtual engine, the engine that the request references
+	 * is also destroyed, which can trigger a bad pointer dereference in
+	 * fence ops (e.g. i915_fence_get_driver_name). We could likely change
+	 * these functions to avoid touching the engine, but let's be safe and
+	 * hold the intel_context reference. In execlists mode the request
+	 * always eventually points to a physical engine, so this isn't an issue.
+	 */
+	rq->context = intel_context_get(ce);
 	rq->engine = ce->engine;
 	rq->ring = ce->ring;
 	rq->execution_mask = ce->engine->mask;
@@ -1031,6 +1021,7 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
 err_free:
+	intel_context_put(ce);
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	intel_context_unpin(ce);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 22/51] drm/i915/guc: Disable bonding extension with GuC submission
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Update the bonding extension to return -ENODEV when using GuC submission,
as this extension fundamentally will not work with the GuC submission
interface.
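
From user space this surfaces as an errno on the ioctl that applies the
extension; a hedged sketch of how a client might cope with it
(set_engine_bonds() is a hypothetical wrapper around the real ioctl, not
an i915 API):

	int err = set_engine_bonds(fd, ctx);	/* hypothetical wrapper */
	if (err == -ENODEV)
		submit_without_bonds(fd, ctx);	/* hypothetical fallback */
	else if (err)
		handle_error(err);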

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index edefe299bd76..28c62f7ccfc7 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -491,6 +491,11 @@ set_proto_ctx_engines_bond(struct i915_user_extension __user *base, void *data)
 		return -EINVAL;
 	}
 
+	if (intel_engine_uses_guc(master)) {
+		DRM_DEBUG("bonding extension not supported with GuC submission\n");
+		return -ENODEV;
+	}
+
 	if (get_user(num_bonds, &ext->num_bonds))
 		return -EFAULT;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

With GuC virtual engines the physical engine on which a request executes
and completes isn't known to the i915. Therefore we can't attach a
request to a physical engine's breadcrumbs. To work around this we create
a single breadcrumbs object per engine class when using GuC submission
and direct all physical engine interrupts to that object.
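
Because any physical engine in the class may end up executing a given
request, arming signalling has to enable the user interrupt on every
engine in the class, not just one. A hedged sketch of that fan-out (the
example_* types and for_each_engine_in_mask() iterator are illustrative
stand-ins, not the driver's real names):

	static bool class_irq_enable(struct example_breadcrumbs *b)
	{
		struct example_engine *e;
		unsigned long tmp, mask = b->engine_mask;
		bool armed = false;

		/* Any sibling may complete the request: arm them all */
		for_each_engine_in_mask(e, b->gt, mask, tmp)
			armed |= engine_irq_enable(e);

		return armed;
	}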

v2:
 (John H)
  - Rework header file structure so intel_engine_mask_t can be in
    intel_engine_types.h

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 ++++-
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
 drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
 9 files changed, 133 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb..2007dc6f6b99 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -15,28 +15,14 @@
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
-static bool irq_enable(struct intel_engine_cs *engine)
+static bool irq_enable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_enable)
-		return false;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_enable(engine);
-	spin_unlock(&engine->gt->irq_lock);
-
-	return true;
+	return intel_engine_irq_enable(b->irq_engine);
 }
 
-static void irq_disable(struct intel_engine_cs *engine)
+static void irq_disable(struct intel_breadcrumbs *b)
 {
-	if (!engine->irq_disable)
-		return;
-
-	/* Caller disables interrupts */
-	spin_lock(&engine->gt->irq_lock);
-	engine->irq_disable(engine);
-	spin_unlock(&engine->gt->irq_lock);
+	intel_engine_irq_disable(b->irq_engine);
 }
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
@@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	WRITE_ONCE(b->irq_armed, true);
 
 	/* Requests may have completed before we could enable the interrupt. */
-	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+	if (!b->irq_enabled++ && b->irq_enable(b))
 		irq_work_queue(&b->irq_work);
 }
 
@@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	WRITE_ONCE(b->irq_armed, false);
 	intel_gt_pm_put_async(b->irq_engine->gt);
@@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;
 
-	b->irq_engine = irq_engine;
+	kref_init(&b->ref);
 
 	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
@@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);
 
+	b->irq_engine = irq_engine;
+	b->irq_enable = irq_enable;
+	b->irq_disable = irq_disable;
+
 	return b;
 }
 
@@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 	spin_lock_irqsave(&b->irq_lock, flags);
 
 	if (b->irq_enabled)
-		irq_enable(b->irq_engine);
+		b->irq_enable(b);
 	else
-		irq_disable(b->irq_engine);
+		b->irq_disable(b);
 
 	spin_unlock_irqrestore(&b->irq_lock, flags);
 }
@@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	}
 }
 
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
+void intel_breadcrumbs_free(struct kref *kref)
 {
+	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
+
 	irq_work_sync(&b->irq_work);
 	GEM_BUG_ON(!list_empty(&b->signalers));
 	GEM_BUG_ON(b->irq_armed);
+
 	kfree(b);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
index 3ce5ce270b04..be0d4f379a85 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
@@ -9,7 +9,7 @@
 #include <linux/atomic.h>
 #include <linux/irq_work.h>
 
-#include "intel_engine_types.h"
+#include "intel_breadcrumbs_types.h"
 
 struct drm_printer;
 struct i915_request;
@@ -17,7 +17,7 @@ struct intel_breadcrumbs;
 
 struct intel_breadcrumbs *
 intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
-void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
+void intel_breadcrumbs_free(struct kref *kref);
 
 void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
 void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
@@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
 void intel_context_remove_breadcrumbs(struct intel_context *ce,
 				      struct intel_breadcrumbs *b);
 
+static inline struct intel_breadcrumbs *
+intel_breadcrumbs_get(struct intel_breadcrumbs *b)
+{
+	kref_get(&b->ref);
+	return b;
+}
+
+static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
+{
+	kref_put(&b->ref, intel_breadcrumbs_free);
+}
+
 #endif /* __INTEL_BREADCRUMBS__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3a084ce8ff5e..72dfd3748c4c 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -7,10 +7,13 @@
 #define __INTEL_BREADCRUMBS_TYPES__
 
 #include <linux/irq_work.h>
+#include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/types.h>
 
+#include "intel_engine_types.h"
+
 /*
  * Rather than have every client wait upon all user interrupts,
  * with the herd waking after every interrupt and each doing the
@@ -29,6 +32,7 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
+	struct kref ref;
 	atomic_t active;
 
 	spinlock_t signalers_lock; /* protects the list of signalers */
@@ -42,7 +46,10 @@ struct intel_breadcrumbs {
 	bool irq_armed;
 
 	/* Not all breadcrumbs are attached to physical HW */
+	intel_engine_mask_t	engine_mask;
 	struct intel_engine_cs *irq_engine;
+	bool	(*irq_enable)(struct intel_breadcrumbs *b);
+	void	(*irq_disable)(struct intel_breadcrumbs *b);
 };
 
 #endif /* __INTEL_BREADCRUMBS_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 9fec0aca5f4b..edbde6171bca 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
 
 void intel_engine_init_execlists(struct intel_engine_cs *engine);
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine);
+void intel_engine_irq_disable(struct intel_engine_cs *engine);
+
 static inline void __intel_engine_reset(struct intel_engine_cs *engine,
 					bool stalled)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b7292d5cb7da..d95d666407f5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
 err_cmd_parser:
 	i915_sched_engine_put(engine->sched_engine);
 err_sched_engine:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_status:
 	cleanup_status_page(engine);
 	return err;
@@ -948,7 +948,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
 
 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 
 	intel_engine_fini_retire(engine);
 	intel_engine_cleanup_cmd_parser(engine);
@@ -1265,6 +1265,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
 	return true;
 }
 
+bool intel_engine_irq_enable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_enable)
+		return false;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_enable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+
+	return true;
+}
+
+void intel_engine_irq_disable(struct intel_engine_cs *engine)
+{
+	if (!engine->irq_disable)
+		return;
+
+	/* Caller disables interrupts */
+	spin_lock(&engine->gt->irq_lock);
+	engine->irq_disable(engine);
+	spin_unlock(&engine->gt->irq_lock);
+}
+
 void intel_engines_reset_default_submission(struct intel_gt *gt)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 8ad304b2f2e4..03a81e8d87f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -21,7 +21,6 @@
 #include "i915_pmu.h"
 #include "i915_priolist_types.h"
 #include "i915_selftest.h"
-#include "intel_breadcrumbs_types.h"
 #include "intel_sseu.h"
 #include "intel_timeline_types.h"
 #include "intel_uncore.h"
@@ -63,6 +62,7 @@ struct i915_sched_engine;
 struct intel_gt;
 struct intel_ring;
 struct intel_uncore;
+struct intel_breadcrumbs;
 
 typedef u8 intel_engine_mask_t;
 #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 920707e22eb0..abe48421fd7a 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3407,7 +3407,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
 	intel_context_fini(&ve->context);
 
 	if (ve->base.breadcrumbs)
-		intel_breadcrumbs_free(ve->base.breadcrumbs);
+		intel_breadcrumbs_put(ve->base.breadcrumbs);
 	if (ve->base.sched_engine)
 		i915_sched_engine_put(ve->base.sched_engine);
 	intel_engine_free_request_pool(&ve->base);
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index 9203c766db80..fc5a65ab1937 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
 	GEM_BUG_ON(timer_pending(&mock->hw_delay));
 
 	i915_sched_engine_put(engine->sched_engine);
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 
 	intel_context_unpin(engine->kernel_context);
 	intel_context_put(engine->kernel_context);
@@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
 	return 0;
 
 err_breadcrumbs:
-	intel_breadcrumbs_free(engine->breadcrumbs);
+	intel_breadcrumbs_put(engine->breadcrumbs);
 err_schedule:
 	i915_sched_engine_put(engine->sched_engine);
 	return -ENOMEM;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 372e0dc7617a..9f28899ff17f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1076,6 +1076,9 @@ static void __guc_context_destroy(struct intel_context *ce)
 		struct guc_virtual_engine *ve =
 			container_of(ce, typeof(*ve), context);
 
+		if (ve->base.breadcrumbs)
+			intel_breadcrumbs_put(ve->base.breadcrumbs);
+
 		kfree(ve);
 	} else {
 		intel_context_free(ce);
@@ -1366,6 +1369,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.get_sibling = guc_virtual_get_sibling,
 };
 
+static bool
+guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+	bool result = false;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		result |= intel_engine_irq_enable(sibling);
+
+	return result;
+}
+
+static void
+guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
+{
+	struct intel_engine_cs *sibling;
+	intel_engine_mask_t tmp, mask = b->engine_mask;
+
+	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
+		intel_engine_irq_disable(sibling);
+}
+
+static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
+{
+	int i;
+
+	/*
+	 * In GuC submission mode we do not know which physical engine a
+	 * request will be scheduled on; this creates a problem because the
+	 * breadcrumb interrupt is per physical engine. To work around this we
+	 * attach requests and direct all breadcrumb interrupts to the first
+	 * instance of an engine per class. In addition, all breadcrumb
+	 * interrupts are enabled / disabled across an engine class in unison.
+	 */
+	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
+		struct intel_engine_cs *sibling =
+			engine->gt->engine_class[engine->class][i];
+
+		if (sibling) {
+			if (engine->breadcrumbs != sibling->breadcrumbs) {
+				intel_breadcrumbs_put(engine->breadcrumbs);
+				engine->breadcrumbs =
+					intel_breadcrumbs_get(sibling->breadcrumbs);
+			}
+			break;
+		}
+	}
+
+	if (engine->breadcrumbs) {
+		engine->breadcrumbs->engine_mask |= engine->mask;
+		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
+		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
+	}
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -1589,6 +1648,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 	guc_default_vfuncs(engine);
 	guc_default_irqs(engine);
+	guc_init_breadcrumbs(engine);
 
 	if (engine->class == RENDER_CLASS)
 		rcs_submission_override(engine);
@@ -1831,11 +1891,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.saturated = ALL_ENGINES;
-	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
-	if (!ve->base.breadcrumbs) {
-		kfree(ve);
-		return ERR_PTR(-ENOMEM);
-	}
 
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
@@ -1884,6 +1939,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				sibling->emit_fini_breadcrumb;
 			ve->base.emit_fini_breadcrumb_dw =
 				sibling->emit_fini_breadcrumb_dw;
+			ve->base.breadcrumbs =
+				intel_breadcrumbs_get(sibling->breadcrumbs);
 
 			ve->base.flags |= sibling->flags;
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

This helps the backends clean up when the scheduling engine object gets
destroyed.
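
The shape of this is the familiar kref-with-pluggable-release pattern:
the object carries its own release function, so whoever created it
decides how it is torn down. A minimal sketch with illustrative
example_* names:

	#include <linux/kref.h>
	#include <linux/slab.h>

	struct example_sched_engine {
		struct kref ref;
		void (*destroy)(struct kref *kref); /* backend-chosen release */
	};

	static void backend_destroy(struct kref *kref)
	{
		struct example_sched_engine *se =
			container_of(kref, typeof(*se), ref);

		backend_specific_cleanup(se);	/* assumed teardown hook */
		kfree(se);
	}

	static inline void
	example_sched_engine_put(struct example_sched_engine *se)
	{
		/* The last reference runs whichever destroy was installed */
		kref_put(&se->ref, se->destroy);
	}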

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/i915_scheduler.c       | 3 ++-
 drivers/gpu/drm/i915/i915_scheduler.h       | 4 +---
 drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +++++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 3a58a9130309..4fceda96deed 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 	rcu_read_unlock();
 }
 
-void i915_sched_engine_free(struct kref *kref)
+static void default_destroy(struct kref *kref)
 {
 	struct i915_sched_engine *sched_engine =
 		container_of(kref, typeof(*sched_engine), ref);
@@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
 
 	sched_engine->queue = RB_ROOT_CACHED;
 	sched_engine->queue_priority_hint = INT_MIN;
+	sched_engine->destroy = default_destroy;
 
 	INIT_LIST_HEAD(&sched_engine->requests);
 	INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 650ab8e0db9f..3c9504e9f409 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass);
 
-void i915_sched_engine_free(struct kref *kref);
-
 static inline struct i915_sched_engine *
 i915_sched_engine_get(struct i915_sched_engine *sched_engine)
 {
@@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
 static inline void
 i915_sched_engine_put(struct i915_sched_engine *sched_engine)
 {
-	kref_put(&sched_engine->ref, i915_sched_engine_free);
+	kref_put(&sched_engine->ref, sched_engine->destroy);
 }
 
 static inline bool
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 5935c3152bdc..00384e2c5273 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -163,6 +163,11 @@ struct i915_sched_engine {
 	 */
 	void *private_data;
 
+	/**
+	 * @destroy: destroy schedule engine / cleanup in backend
+	 */
+	void	(*destroy)(struct kref *kref);
+
 	/**
 	 * @kick_backend: kick backend after a request's priority has changed
 	 */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 25/51] drm/i915: Move active request tracking to a vfunc
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the same manner. In the case of execlists /
ring submission the tracking is on the physical engine, while with GuC
submission it is on the context.
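
In other words, the common request code only ever calls through a vfunc
and each backend decides where the request is accounted. A hedged sketch
of the split (example_* names are illustrative; the real implementations
are in the diff below):

	/* Common code: the backend decides where the request is tracked */
	static void track_active(struct example_request *rq)
	{
		rq->engine->add_active_request(rq);
	}

	/* Execlists / ring style: tracked on the physical engine */
	static void engine_add_active(struct example_request *rq)
	{
		list_move_tail(&rq->link, &rq->engine->active_requests);
	}

	/*
	 * GuC style: tracked on the context, as the physical engine is not
	 * known until the GuC schedules the request.
	 */
	static void context_add_active(struct example_request *rq)
	{
		spin_lock(&rq->context->active_lock);
		list_move_tail(&rq->link, &rq->context->active_requests);
		spin_unlock(&rq->context->active_lock);
	}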

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  3 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |  7 ++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
 .../drm/i915/gt/intel_execlists_submission.c  | 40 ++++++++++++++++++
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++++++++++
 drivers/gpu/drm/i915/gt/mock_engine.c         | 30 ++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++++++++++++++
 drivers/gpu/drm/i915/i915_request.c           | 41 ++-----------------
 drivers/gpu/drm/i915/i915_request.h           |  2 +
 9 files changed, 147 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 251ff7eea22d..bfb05d8697d1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 542c98418771..035108c10b2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -162,6 +162,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state flags that do not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 03a81e8d87f4..950fc73ed6af 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -420,6 +420,12 @@ struct intel_engine_cs {
 
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index abe48421fd7a..f9b5f54a5abe 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3206,6 +3242,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3797,6 +3835,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 			 "v%dx%d", ve->base.class, count);
 		ve->base.context_size = sibling->context_size;
 
+		ve->base.add_active_request = sibling->add_active_request;
+		ve->base.remove_active_request = sibling->remove_active_request;
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 61469c631057..3b1471c50d40 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1052,6 +1052,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1069,6 +1088,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..5f86ff79c98c 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,34 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +355,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9f28899ff17f..f8a6077fa3f9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1147,6 +1147,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
@@ -1567,6 +1594,8 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
@@ -1931,6 +1960,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				 "v%dx%d", ve->base.class, count);
 			ve->base.context_size = sibling->context_size;
 
+			ve->base.add_active_request =
+				sibling->add_active_request;
+			ve->base.remove_active_request =
+				sibling->remove_active_request;
 			ve->base.emit_bb_start = sibling->emit_bb_start;
 			ve->base.emit_flush = sibling->emit_flush;
 			ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b3c792d55321..4eba848b20e3 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -182,7 +182,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -256,37 +256,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
 
-
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -383,9 +352,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -516,7 +483,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -653,7 +620,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 717e5b292046..128030f43bbf 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -647,4 +647,6 @@ bool
 i915_request_active_engine(struct i915_request *rq,
 			   struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 25/51] drm/i915: Move active request tracking to a vfunc
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the same manner. In the case of execlists /
ring submission the tracking is on the physical engine, while with GuC
submission it is on the context.
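
Schematically, the indirection is (condensed from the hunks below, not
verbatim):

	/* struct intel_engine_cs gains two backend hooks: */
	void		(*add_active_request)(struct i915_request *rq);
	void		(*remove_active_request)(struct i915_request *rq);

	/* common code, e.g. i915_request_retire(), dispatches blindly: */
	rq->engine->remove_active_request(rq);

Execlists / ring / mock wire these to the per-engine
sched_engine->requests list, while GuC submission tracks requests on
ce->guc_active.requests and drops its guc_id reference on removal.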

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  3 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |  7 ++++
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
 .../drm/i915/gt/intel_execlists_submission.c  | 40 ++++++++++++++++++
 .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++++++++++
 drivers/gpu/drm/i915/gt/mock_engine.c         | 30 ++++++++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++++++++++++++
 drivers/gpu/drm/i915/i915_request.c           | 41 ++-----------------
 drivers/gpu/drm/i915/i915_request.h           |  2 +
 9 files changed, 147 insertions(+), 37 deletions(-)
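
Note that remove_from_engine(), now duplicated into the execlists and
mock backends, keeps the lock-and-revalidate loop needed to chase the
unstable rq->engine pointer of virtual engines (the ring backend locks
rq->engine directly, as it has no virtual engines). The pattern, as it
appears in the hunks below:

	locked = READ_ONCE(rq->engine);
	spin_lock_irq(&locked->sched_engine->lock);
	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
		spin_unlock(&locked->sched_engine->lock);
		spin_lock(&engine->sched_engine->lock);
		locked = engine;
	}
	/* rq->engine is now stable under locked->sched_engine->lock */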

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 251ff7eea22d..bfb05d8697d1 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	spin_lock_init(&ce->guc_state.lock);
 	INIT_LIST_HEAD(&ce->guc_state.fences);
 
+	spin_lock_init(&ce->guc_active.lock);
+	INIT_LIST_HEAD(&ce->guc_active.requests);
+
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 542c98418771..035108c10b2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -162,6 +162,13 @@ struct intel_context {
 		struct list_head fences;
 	} guc_state;
 
+	struct {
+		/** lock: protects everything in guc_active */
+		spinlock_t lock;
+		/** requests: active requests on this context */
+		struct list_head requests;
+	} guc_active;
+
 	/* GuC scheduling state flags that do not require a lock. */
 	atomic_t guc_sched_state_no_lock;
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 03a81e8d87f4..950fc73ed6af 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -420,6 +420,12 @@ struct intel_engine_cs {
 
 	void		(*release)(struct intel_engine_cs *engine);
 
+	/*
+	 * Add / remove request from engine active tracking
+	 */
+	void		(*add_active_request)(struct i915_request *rq);
+	void		(*remove_active_request)(struct i915_request *rq);
+
 	struct intel_engine_execlists execlists;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index abe48421fd7a..f9b5f54a5abe 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs *engine)
 	cancel_timer(&engine->execlists.preempt);
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&locked->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static bool can_preempt(struct intel_engine_cs *engine)
 {
 	if (GRAPHICS_VER(engine->i915) > 8)
@@ -3206,6 +3242,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &execlists_context_ops;
 	engine->request_alloc = execlists_request_alloc;
 	engine->bump_serial = execlist_bump_serial;
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
 
 	engine->reset.prepare = execlists_reset_prepare;
 	engine->reset.rewind = execlists_reset_rewind;
@@ -3797,6 +3835,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 			 "v%dx%d", ve->base.class, count);
 		ve->base.context_size = sibling->context_size;
 
+		ve->base.add_active_request = sibling->add_active_request;
+		ve->base.remove_active_request = sibling->remove_active_request;
 		ve->base.emit_bb_start = sibling->emit_bb_start;
 		ve->base.emit_flush = sibling->emit_flush;
 		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 61469c631057..3b1471c50d40 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -1052,6 +1052,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
 	engine->serial++;
 }
 
+static void add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void remove_from_engine(struct i915_request *rq)
+{
+	spin_lock_irq(&rq->engine->sched_engine->lock);
+	list_del_init(&rq->sched.link);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&rq->engine->sched_engine->lock);
+
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static void setup_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1069,6 +1088,9 @@ static void setup_common(struct intel_engine_cs *engine)
 	engine->reset.cancel = reset_cancel;
 	engine->reset.finish = reset_finish;
 
+	engine->add_active_request = add_to_engine;
+	engine->remove_active_request = remove_from_engine;
+
 	engine->cops = &ring_context_ops;
 	engine->request_alloc = ring_request_alloc;
 	engine->bump_serial = ring_bump_serial;
diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
index fc5a65ab1937..5f86ff79c98c 100644
--- a/drivers/gpu/drm/i915/gt/mock_engine.c
+++ b/drivers/gpu/drm/i915/gt/mock_engine.c
@@ -235,6 +235,34 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_add_to_engine(struct i915_request *rq)
+{
+	lockdep_assert_held(&rq->engine->sched_engine->lock);
+	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
+}
+
+static void mock_remove_from_engine(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine, *locked;
+
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+
+	locked = READ_ONCE(rq->engine);
+	spin_lock_irq(&locked->sched_engine->lock);
+	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
+		spin_unlock(&locked->sched_engine->lock);
+		spin_lock(&engine->sched_engine->lock);
+		locked = engine;
+	}
+	list_del_init(&rq->sched.link);
+	spin_unlock_irq(&locked->sched_engine->lock);
+}
+
 static void mock_reset_prepare(struct intel_engine_cs *engine)
 {
 }
@@ -327,6 +355,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
+	engine->base.add_active_request = mock_add_to_engine;
+	engine->base.remove_active_request = mock_remove_from_engine;
 
 	engine->base.reset.prepare = mock_reset_prepare;
 	engine->base.reset.rewind = mock_reset_rewind;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9f28899ff17f..f8a6077fa3f9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1147,6 +1147,33 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void add_to_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void remove_from_context(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock_irq(&ce->guc_active.lock);
+
+	list_del_init(&rq->sched.link);
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+	/* Prevent further __await_execution() registering a cb, then flush */
+	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
+
+	spin_unlock_irq(&ce->guc_active.lock);
+
+	atomic_dec(&ce->guc_id_ref);
+	i915_request_notify_execute_cb_imm(rq);
+}
+
 static const struct intel_context_ops guc_context_ops = {
 	.alloc = guc_context_alloc,
 
@@ -1567,6 +1594,8 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->cops = &guc_context_ops;
 	engine->request_alloc = guc_request_alloc;
 	engine->bump_serial = guc_bump_serial;
+	engine->add_active_request = add_to_context;
+	engine->remove_active_request = remove_from_context;
 
 	engine->sched_engine->schedule = i915_schedule;
 
@@ -1931,6 +1960,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
 				 "v%dx%d", ve->base.class, count);
 			ve->base.context_size = sibling->context_size;
 
+			ve->base.add_active_request =
+				sibling->add_active_request;
+			ve->base.remove_active_request =
+				sibling->remove_active_request;
 			ve->base.emit_bb_start = sibling->emit_bb_start;
 			ve->base.emit_flush = sibling->emit_flush;
 			ve->base.emit_init_breadcrumb =
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index b3c792d55321..4eba848b20e3 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -182,7 +182,7 @@ static bool irq_work_imm(struct irq_work *wrk)
 	return false;
 }
 
-static void __notify_execute_cb_imm(struct i915_request *rq)
+void i915_request_notify_execute_cb_imm(struct i915_request *rq)
 {
 	__notify_execute_cb(rq, irq_work_imm);
 }
@@ -256,37 +256,6 @@ i915_request_active_engine(struct i915_request *rq,
 	return ret;
 }
 
-
-static void remove_from_engine(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine, *locked;
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	locked = READ_ONCE(rq->engine);
-	spin_lock_irq(&locked->sched_engine->lock);
-	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
-		spin_unlock(&locked->sched_engine->lock);
-		spin_lock(&engine->sched_engine->lock);
-		locked = engine;
-	}
-	list_del_init(&rq->sched.link);
-
-	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
-
-	/* Prevent further __await_execution() registering a cb, then flush */
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_unlock_irq(&locked->sched_engine->lock);
-
-	__notify_execute_cb_imm(rq);
-}
-
 static void __rq_init_watchdog(struct i915_request *rq)
 {
 	rq->watchdog.timer.function = NULL;
@@ -383,9 +352,7 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	if (!list_empty(&rq->sched.link))
-		remove_from_engine(rq);
-	atomic_dec(&rq->context->guc_id_ref);
+	rq->engine->remove_active_request(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
@@ -516,7 +483,7 @@ __await_execution(struct i915_request *rq,
 	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
 		if (i915_request_is_active(signal) ||
 		    __request_in_flight(signal))
-			__notify_execute_cb_imm(signal);
+			i915_request_notify_execute_cb_imm(signal);
 	}
 
 	return 0;
@@ -653,7 +620,7 @@ bool __i915_request_submit(struct i915_request *request)
 	result = true;
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
-	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
+	engine->add_active_request(request);
 active:
 	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 717e5b292046..128030f43bbf 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -647,4 +647,6 @@ bool
 i915_request_active_engine(struct i915_request *rq,
 			   struct intel_engine_cs **active);
 
+void i915_request_notify_execute_cb_imm(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:16   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC, but we will
continue to maintain this legacy path as a fallback; it is also
required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.
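
With these hooks wired into the common reset path, the full GT reset
sequence is roughly (condensed from the intel_gt_pm.c / intel_reset.c
hunks below, not verbatim):

	intel_uc_reset_prepare(&gt->uc);	/* disable submission, drain G2H */
	/* ... full GT hardware reset ... */
	intel_uc_reset(&gt->uc, stalled);	/* scrub contexts, unwind/replay */
	intel_uc_reset_finish(&gt->uc);		/* re-enable submission tasklet */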

v2:
 (Michal)
  - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
 (John H)
  - Split into a series of smaller patches
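
The -EPIPE handling from v2 is what funnels a dead CT channel into the
full reset path; condensed from guc_dequeue_one_context() below:

	ret = guc_add_request(guc, last);
	if (unlikely(ret == -EPIPE))
		goto deadlk;	/* CT dead: disable the tasklet */
	else if (ret == -EBUSY)
		tasklet_schedule(&sched_engine->tasklet);	/* transient: retry */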

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 7 files changed, 515 insertions(+), 134 deletions(-)
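
Note on reset_prepare: interrupts are off at that point, so outstanding
G2H messages are drained by calling the CT event handler directly; a
condensed view of the loop in intel_guc_submission_reset_prepare():

	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
		intel_guc_to_host_event_handler(guc);
		do {
			wait_for_reset(guc, &guc->outstanding_submission_g2h);
		} while (!list_empty(&guc->ct.requests.incoming));
	}
	scrub_guc_desc_for_outstanding_g2h(guc);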

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3cc566565224..d75a76882a44 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f8a6077fa3f9..0f5823922369 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Should only be called from guc_lrc_desc_pin() without a lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_send_busy_loop(struct intel_guc* guc,
@@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					true, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
 
@@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GT reset needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EPIPE))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;
@@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * By this point, submission_disabled() is guaranteed to be
+		 * visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!sched_engine ||
+			!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
-	ENGINE_TRACE(engine, "\n");
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
 
 	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+	 * Handle any outstanding G2Hs before reset. Call the IRQ handler
+	 * directly on each pass as interrupts have been disabled. We always
+	 * scrub for outstanding G2H as outstanding_submission_g2h may be
+	 * incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
+}
+
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
 }
 
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
+}
+
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
+
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * Technically possible for either of these values to be non-zero here,
+	 * but very unlikely + harmless. Regardless, let's add a warn so we can
+	 * see in CI if this happens frequently / is a precursor to taking down
+	 * the machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EPIPE))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
-					     0, true);
+					     0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 
 	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
 					     G2H_LEN_DW_DEREGISTER_CONTEXT,
-					     true);
+					     loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whose guc_id was stolen.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		} else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
 				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * a request before we set the 'context_pending_disable' flag here.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EPIPE) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 	intel_engine_set_irq_handler(engine, cs_irq_handler);
 }
 
+static void guc_sched_engine_destroy(struct kref *kref)
+{
+	struct i915_sched_engine *sched_engine =
+		container_of(kref, typeof(*sched_engine), ref);
+	struct intel_guc *guc = sched_engine->private_data;
+
+	guc->sched_engine = NULL;
+	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
+	kfree(sched_engine);
+}
+
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 		guc->sched_engine->schedule = i915_schedule;
 		guc->sched_engine->private_data = guc;
+		guc->sched_engine->destroy = guc_sched_engine_destroy;
 		tasklet_setup(&guc->sched_engine->tasklet,
 			      guc_submission_tasklet);
 	}
@@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
-	if (!intel_guc_is_ready(guc))
+
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;
 
+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware cannot be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware cannot be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
@ 2021-07-16 20:16   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:16 UTC (permalink / raw)
  To: intel-gfx, dri-devel

Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC, but we will
continue to maintain this legacy path as a fallback; it is also
required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

v2:
 (Michal)
  - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
 (John H)
  - Split into a series of smaller patches

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
 drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
 drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
 7 files changed, 515 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index aef3084e8b16..463a6ae605a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 	if (intel_gt_is_wedged(gt))
 		intel_gt_unset_wedged(gt);
 
-	intel_uc_sanitize(&gt->uc);
-
 	for_each_engine(engine, gt, id)
 		if (engine->reset.prepare)
 			engine->reset.prepare(engine);
@@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
 			__intel_engine_reset(engine, false);
 	}
 
+	intel_uc_reset(&gt->uc, false);
+
 	for_each_engine(engine, gt, id)
 		if (engine->reset.finish)
 			engine->reset.finish(engine);
@@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
 		goto err_wedged;
 	}
 
+	intel_uc_reset_finish(&gt->uc);
+
 	intel_rps_enable(&gt->rps);
 	intel_llc_enable(&gt->llc);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72251638d4ea..2987282dff6d 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
 	local_bh_enable();
 
+	intel_uc_reset(&gt->uc, true);
+
 	intel_ggtt_restore_fences(gt->ggtt);
 
 	return err;
@@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
 		if (awake & engine->mask)
 			intel_engine_pm_put(engine);
 	}
+
+	intel_uc_reset_finish(&gt->uc);
 }
 
 static void nop_submit_request(struct i915_request *request)
@@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	intel_uc_cancel_requests(&gt->uc);
 	local_bh_enable();
 
 	reset_finish(gt, awake);
@@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
 
+	if (intel_engine_uses_guc(engine))
+		return -ENODEV;
+
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
@@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 			   "Resetting %s for %s\n", engine->name, msg);
 	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
 
-	if (intel_engine_uses_guc(engine))
-		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
-	else
-		ret = intel_gt_reset_engine(engine);
+	ret = intel_gt_reset_engine(engine);
 	if (ret) {
 		/* If we fail here, we expect to fallback to a global reset */
-		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
+		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
 		goto out;
 	}
 
@@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+	if (!intel_uc_uses_guc_submission(&gt->uc) &&
+	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
 		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 6661dcb02239..9b09395b998f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
 	return 0;
 }
 
-/**
- * intel_guc_reset_engine() - ask GuC to reset an engine
- * @guc:	intel_guc structure
- * @engine:	engine to be reset
- */
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine)
-{
-	/* XXX: to be implemented with submission interface rework */
-
-	return -ENODEV;
-}
-
 /**
  * intel_guc_resume() - notify GuC resuming from suspend state
  * @guc:	the guc
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3cc566565224..d75a76882a44 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
 
-int intel_guc_reset_engine(struct intel_guc *guc,
-			   struct intel_engine_cs *engine);
-
 int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 
+void intel_guc_submission_reset_prepare(struct intel_guc *guc);
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
+void intel_guc_submission_reset_finish(struct intel_guc *guc);
+void intel_guc_submission_cancel_requests(struct intel_guc *guc);
+
 void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index f8a6077fa3f9..0f5823922369 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
 static inline void
 set_context_wait_for_deregister_to_register(struct intel_context *ce)
 {
-	/* Only should be called from guc_lrc_desc_pin() */
+	/* Should only be called from guc_lrc_desc_pin() without lock */
 	ce->guc_state.sched_state |=
 		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
 }
@@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
 
 static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
 {
+	guc->lrc_desc_pool_vaddr = NULL;
 	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
 }
 
+static inline bool guc_submission_initialized(struct intel_guc *guc)
+{
+	return guc->lrc_desc_pool_vaddr != NULL;
+}
+
 static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
 {
-	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+	if (likely(guc_submission_initialized(guc))) {
+		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
+		unsigned long flags;
 
-	memset(desc, 0, sizeof(*desc));
-	xa_erase_irq(&guc->context_lookup, id);
+		memset(desc, 0, sizeof(*desc));
+
+		/*
+		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
+		 * the lower level functions directly.
+		 */
+		xa_lock_irqsave(&guc->context_lookup, flags);
+		__xa_erase(&guc->context_lookup, id);
+		xa_unlock_irqrestore(&guc->context_lookup, flags);
+	}
 }
 
 static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
@@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
 static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
 					   struct intel_context *ce)
 {
-	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	unsigned long flags;
+
+	/*
+	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
+	 * lower level functions directly.
+	 */
+	xa_lock_irqsave(&guc->context_lookup, flags);
+	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
+	xa_unlock_irqrestore(&guc->context_lookup, flags);
 }
 
 static int guc_submission_send_busy_loop(struct intel_guc* guc,
@@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 					true, timeout);
 }
 
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
+
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
 	int err;
@@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
-	bool enabled = context_enabled(ce);
+	bool enabled;
 
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
+	/*
+	 * Corner case where the GuC firmware was blown away and reloaded while
+	 * this context was pinned.
+	 */
+	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
+		err = guc_lrc_desc_pin(ce, false);
+		if (unlikely(err))
+			goto out;
+	}
+	enabled = context_enabled(ce);
+
 	if (!enabled) {
 		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
 		action[len++] = ce->guc_id;
@@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		intel_context_put(ce);
 	}
 
+out:
 	return err;
 }
 
@@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 	if (submit) {
 		guc_set_lrc_tail(last);
 resubmit:
-		/*
-		 * We only check for -EBUSY here even though it is possible for
-		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
-		 * died and a full GT reset needs to be done. The hangcheck will
-		 * eventually detect that the GuC has died and trigger this
-		 * reset so no need to handle -EDEADLK here.
-		 */
 		ret = guc_add_request(guc, last);
-		if (ret == -EBUSY) {
+		if (unlikely(ret == -EPIPE))
+			goto deadlk;
+		else if (ret == -EBUSY) {
 			tasklet_schedule(&sched_engine->tasklet);
 			guc->stalled_request = last;
 			return false;
@@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 
 	guc->stalled_request = NULL;
 	return submit;
+
+deadlk:
+	sched_engine->tasklet.callback = NULL;
+	tasklet_disable_nosync(&sched_engine->tasklet);
+	return false;
 }
 
 static void guc_submission_tasklet(struct tasklet_struct *t)
@@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 		intel_engine_signal_breadcrumbs(engine);
 }
 
-static void guc_reset_prepare(struct intel_engine_cs *engine)
+static void __guc_context_destroy(struct intel_context *ce);
+static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
+static void guc_signal_context_fence(struct intel_context *ce);
+
+static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
+{
+	struct intel_context *ce;
+	unsigned long index, flags;
+	bool pending_disable, pending_enable, deregister, destroyed;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		/* Flush context */
+		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * Once we are at this point submission_disabled() is guaranteed
+		 * to be visible to all callers who set the below flags (see above
+		 * flush and flushes in reset_prepare). If submission_disabled()
+		 * is set, the caller shouldn't set these flags.
+		 */
+
+		destroyed = context_destroyed(ce);
+		pending_enable = context_pending_enable(ce);
+		pending_disable = context_pending_disable(ce);
+		deregister = context_wait_for_deregister_to_register(ce);
+		init_sched_state(ce);
+
+		if (pending_enable || destroyed || deregister) {
+			atomic_dec(&guc->outstanding_submission_g2h);
+			if (deregister)
+				guc_signal_context_fence(ce);
+			if (destroyed) {
+				release_guc_id(guc, ce);
+				__guc_context_destroy(ce);
+			}
+			if (pending_enable || deregister)
+				intel_context_put(ce);
+		}
+
+		/* Not mutually exclusive with the above if statement. */
+		if (pending_disable) {
+			guc_signal_context_fence(ce);
+			intel_context_sched_disable_unpin(ce);
+			atomic_dec(&guc->outstanding_submission_g2h);
+			intel_context_put(ce);
+		}
+	}
+}
+
+static inline bool
+submission_disabled(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	return unlikely(!sched_engine ||
+			!__tasklet_is_enabled(&sched_engine->tasklet));
+}
+
+static void disable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+
+	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+		__tasklet_disable_sync_once(&sched_engine->tasklet);
+		sched_engine->tasklet.callback = NULL;
+	}
+}
+
+static void enable_submission(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&guc->sched_engine->lock, flags);
+	sched_engine->tasklet.callback = guc_submission_tasklet;
+	wmb();	/* make the callback visible before re-enabling the tasklet */
+	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
+	    __tasklet_enable(&sched_engine->tasklet)) {
+		GEM_BUG_ON(!guc->ct.enabled);
+
+		/* And kick in case we missed a new request submission. */
+		tasklet_hi_schedule(&sched_engine->tasklet);
+	}
+	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
+}
+
+static void guc_flush_submissions(struct intel_guc *guc)
+{
+	struct i915_sched_engine * const sched_engine = guc->sched_engine;
+	unsigned long flags;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 {
-	ENGINE_TRACE(engine, "\n");
+	int i;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	disable_submission(guc);
+	guc->interrupts.disable(guc);
+
+	/* Flush IRQ handler */
+	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
+	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
+
+	guc_flush_submissions(guc);
 
 	/*
-	 * Prevent request submission to the hardware until we have
-	 * completed the reset in i915_gem_reset_finish(). If a request
-	 * is completed by one engine, it may then queue a request
-	 * to a second via its execlists->tasklet *just* as we are
-	 * calling engine->init_hw() and also writing the ELSP.
-	 * Turning off the execlists->tasklet until the reset is over
-	 * prevents the race.
-	 */
-	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
+	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
+	 * each pass as interrupts have been disabled. We always scrub for
+	 * outstanding G2H as it is possible for outstanding_submission_g2h to
+	 * be incremented after the context state update.
+	 */
+	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
+		intel_guc_to_host_event_handler(guc);
+#define wait_for_reset(guc, wait_var) \
+		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		do {
+			wait_for_reset(guc, &guc->outstanding_submission_g2h);
+		} while (!list_empty(&guc->ct.requests.incoming));
+	}
+	scrub_guc_desc_for_outstanding_g2h(guc);
+}
+
+static struct intel_engine_cs *
+guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
+{
+	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp, mask = ve->mask;
+	unsigned int num_siblings = 0;
+
+	for_each_engine_masked(engine, ve->gt, mask, tmp)
+		if (num_siblings++ == sibling)
+			return engine;
+
+	return NULL;
+}
+
+static inline struct intel_engine_cs *
+__context_to_physical_engine(struct intel_context *ce)
+{
+	struct intel_engine_cs *engine = ce->engine;
+
+	if (intel_engine_is_virtual(engine))
+		engine = guc_virtual_get_sibling(engine, 0);
+
+	return engine;
 }
 
-static void guc_reset_state(struct intel_context *ce,
-			    struct intel_engine_cs *engine,
-			    u32 head,
-			    bool scrub)
+static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
 	lrc_update_regs(ce, engine, head);
 }
 
-static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
+static void guc_reset_nop(struct intel_engine_cs *engine)
 {
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct i915_request *rq;
+}
+
+static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
+{
+}
+
+static void
+__unwind_incomplete_requests(struct intel_context *ce)
+{
+	struct i915_request *rq, *rn;
+	struct list_head *pl;
+	int prio = I915_PRIORITY_INVALID;
+	struct i915_sched_engine * const sched_engine =
+		ce->engine->sched_engine;
 	unsigned long flags;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry_safe(rq, rn,
+				 &ce->guc_active.requests,
+				 sched.link) {
+		if (i915_request_completed(rq))
+			continue;
+
+		list_del_init(&rq->sched.link);
+		spin_unlock(&ce->guc_active.lock);
+
+		__i915_request_unsubmit(rq);
+
+		/* Push the request back into the queue for later resubmission. */
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(sched_engine, prio);
+		}
+		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
 
-	/* Push back any incomplete requests for replay after the reset. */
-	rq = execlists_unwind_incomplete_requests(execlists);
-	if (!rq)
-		goto out_unlock;
+		list_add_tail(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+
+		spin_lock(&ce->guc_active.lock);
+	}
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static struct i915_request *context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
+static void __guc_reset_context(struct intel_context *ce, bool stalled)
+{
+	struct i915_request *rq;
+	u32 head;
+
+	/*
+	 * GuC will implicitly mark the context as non-schedulable
+	 * when it sends the reset notification. Make sure our state
+	 * reflects this change. The context will be marked enabled
+	 * on resubmission.
+	 */
+	clr_context_enabled(ce);
+
+	rq = context_find_active_request(ce);
+	if (!rq) {
+		head = ce->ring->tail;
+		stalled = false;
+		goto out_replay;
+	}
 
 	if (!i915_request_started(rq))
 		stalled = false;
 
+	GEM_BUG_ON(i915_active_is_idle(&ce->active));
+	head = intel_ring_wrap(ce->ring, rq->head);
 	__i915_request_reset(rq, stalled);
-	guc_reset_state(rq->context, engine, rq->head, stalled);
 
-out_unlock:
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
+out_replay:
+	guc_reset_state(ce, head, stalled);
+	__unwind_incomplete_requests(ce);
+}
+
+void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
+{
+	struct intel_context *ce;
+	unsigned long index;
+
+	if (unlikely(!guc_submission_initialized(guc)))
+		/* Reset called during driver load? GuC not yet initialised! */
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			__guc_reset_context(ce, stalled);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+static void guc_cancel_context_requests(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
+	struct i915_request *rq;
+	unsigned long flags;
+
+	/* Mark all executing requests as skipped. */
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	spin_lock(&ce->guc_active.lock);
+	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
+		i915_request_put(i915_request_mark_eio(rq));
+	spin_unlock(&ce->guc_active.lock);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_cancel(struct intel_engine_cs *engine)
+static void
+guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
 {
-	struct i915_sched_engine * const sched_engine = engine->sched_engine;
 	struct i915_request *rq, *rn;
 	struct rb_node *rb;
 	unsigned long flags;
 
 	/* Can be called during boot if GuC fails to load */
-	if (!engine->gt)
+	if (!sched_engine)
 		return;
 
-	ENGINE_TRACE(engine, "\n");
-
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
 	 * access to the submission state. This is arranged for us by the
@@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	 */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	/* Mark all executing requests as skipped. */
-	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
-		i915_request_set_error_once(rq, -EIO);
-		i915_request_mark_complete(rq);
-	}
-
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&sched_engine->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 
 		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
+
 			__i915_request_submit(rq);
-			dma_fence_set_error(&rq->fence, -EIO);
-			i915_request_mark_complete(rq);
+
+			i915_request_put(i915_request_mark_eio(rq));
 		}
 
 		rb_erase_cached(&p->node, &sched_engine->queue);
@@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static void guc_reset_finish(struct intel_engine_cs *engine)
+void intel_guc_submission_cancel_requests(struct intel_guc *guc)
 {
-	if (__tasklet_enable(&engine->sched_engine->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&engine->sched_engine->tasklet);
+	struct intel_context *ce;
+	unsigned long index;
 
-	ENGINE_TRACE(engine, "depth->%d\n",
-		     atomic_read(&engine->sched_engine->tasklet.count));
+	xa_for_each(&guc->context_lookup, index, ce)
+		if (intel_context_is_pinned(ce))
+			guc_cancel_context_requests(ce);
+
+	guc_cancel_sched_engine_requests(guc->sched_engine);
+
+	/* GuC is blown away, drop all references to contexts */
+	xa_destroy(&guc->context_lookup);
+}
+
+void intel_guc_submission_reset_finish(struct intel_guc *guc)
+{
+	/* Reset called during driver load or during wedge? */
+	if (unlikely(!guc_submission_initialized(guc) ||
+		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
+		return;
+
+	/*
+	 * It is technically possible for this value to be non-zero here, but
+	 * it is very unlikely and harmless. Regardless, add a warning so we
+	 * can see in CI if this happens frequently or as a precursor to
+	 * taking down the machine.
+	 */
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
+	atomic_set(&guc->outstanding_submission_g2h, 0);
+
+	enable_submission(guc);
 }
 
 /*
@@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	else
 		trace_i915_request_guc_submit(rq);
 
+	if (unlikely(ret == -EPIPE))
+		disable_submission(guc);
+
 	return ret;
 }
 
@@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&sched_engine->lock, flags);
 
-	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
+	if (submission_disabled(guc) || guc->stalled_request ||
+	    !i915_sched_engine_is_empty(sched_engine))
 		queue_request(sched_engine, rq, rq_prio(rq));
 	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
 		tasklet_hi_schedule(&sched_engine->tasklet);
@@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
 
 static int __guc_action_register_context(struct intel_guc *guc,
 					 u32 guc_id,
-					 u32 offset)
+					 u32 offset,
+					 bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_REGISTER_CONTEXT,
@@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
 	};
 
 	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
-					     0, true);
+					     0, loop);
 }
 
-static int register_context(struct intel_context *ce)
+static int register_context(struct intel_context *ce, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
@@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset);
+	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
-					   u32 guc_id)
+					   u32 guc_id,
+					   bool loop)
 {
 	u32 action[] = {
 		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
@@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
 
 	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
 					     G2H_LEN_DW_DEREGISTER_CONTEXT,
-					     true);
+					     loop);
 }
 
-static int deregister_context(struct intel_context *ce, u32 guc_id)
+static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
 
 	trace_intel_context_deregister(ce);
 
-	return __guc_action_deregister_context(guc, guc_id);
+	return __guc_action_deregister_context(guc, guc_id, loop);
 }
 
 static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
@@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
 }
 
-static int guc_lrc_desc_pin(struct intel_context *ce)
+static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
 		&ce->engine->gt->i915->runtime_pm;
@@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
 	 */
 	if (context_registered) {
 		trace_intel_context_steal_guc_id(ce);
-		set_context_wait_for_deregister_to_register(ce);
-		intel_context_get(ce);
+		if (!loop) {
+			set_context_wait_for_deregister_to_register(ce);
+			intel_context_get(ce);
+		} else {
+			bool disabled;
+			unsigned long flags;
+
+			/* Seal race with Reset */
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			disabled = submission_disabled(guc);
+			if (likely(!disabled)) {
+				set_context_wait_for_deregister_to_register(ce);
+				intel_context_get(ce);
+			}
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+			if (unlikely(disabled)) {
+				reset_lrc_desc(guc, desc_idx);
+				return 0;	/* Will get registered later */
+			}
+		}
 
 		/*
 		 * If stealing the guc_id, this ce has the same guc_id as the
+		 * context whose guc_id was stolen.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = deregister_context(ce, ce->guc_id);
+			ret = deregister_context(ce, ce->guc_id, loop);
+		if (unlikely(ret == -EBUSY)) {
+			clr_context_wait_for_deregister_to_register(ce);
+			intel_context_put(ce);
+		} else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	} else {
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			ret = register_context(ce);
+			ret = register_context(ce, loop);
+		if (unlikely(ret == -EBUSY))
+			reset_lrc_desc(guc, desc_idx);
+		else if (unlikely(ret == -ENODEV))
+			ret = 0;	/* Will get registered later */
 	}
 
 	return ret;
@@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
 
 	trace_intel_context_sched_disable(ce);
-	intel_context_get(ce);
 
 	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
 				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
@@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	intel_context_get(ce);
 
 	return ce->guc_id;
 }
@@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	u16 guc_id;
 	intel_wakeref_t wakeref;
 
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		clr_context_enabled(ce);
 		goto unpin;
@@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	 * a request before we set the 'context_pending_disable' flag here.
 	 */
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
 	}
 	guc_id = prep_context_pending_disable(ce);
@@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
 
 static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 {
-	struct intel_engine_cs *engine = ce->engine;
-	struct intel_guc *guc = &engine->gt->uc.guc;
-	unsigned long flags;
+	struct intel_guc *guc = ce_to_guc(ce);
 
 	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
-	spin_lock_irqsave(&ce->guc_state.lock, flags);
-	set_context_destroyed(ce);
-	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
-
-	deregister_context(ce, ce->guc_id);
+	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
@@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
 	struct intel_guc *guc = &ce->engine->gt->uc.guc;
 	intel_wakeref_t wakeref;
 	unsigned long flags;
+	bool disabled;
 
 	/*
 	 * If the guc_id is invalid this context has been stolen and we can free
 	 * it immediately. Also can be freed immediately if the context is not
 	 * registered with the GuC.
 	 */
-	if (context_guc_id_invalid(ce) ||
+	if (submission_disabled(guc) ||
+	    context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
 		release_guc_id(guc, ce);
 		__guc_context_destroy(ce);
@@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
 		list_del_init(&ce->guc_id_link);
 	spin_unlock_irqrestore(&guc->contexts_lock, flags);
 
+	/* Seal race with Reset */
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	disabled = submission_disabled(guc);
+	if (likely(!disabled))
+		set_context_destroyed(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	if (unlikely(disabled)) {
+		release_guc_id(guc, ce);
+		__guc_context_destroy(ce);
+		return;
+	}
+
 	/*
 	 * We defer GuC context deregistration until the context is destroyed
 	 * in order to save on CTBs. With this optimization ideally we only need
@@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
 {
 	unsigned long flags;
 
-	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
-
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 	clr_context_wait_for_deregister_to_register(ce);
 	__guc_signal_context_fence(ce);
@@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
 
 static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
 {
-	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
-		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
+	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
+		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
+		!submission_disabled(ce_to_guc(ce));
 }
 
 static int guc_request_alloc(struct i915_request *rq)
@@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
 	if (unlikely(ret < 0))
 		return ret;
 	if (context_needs_register(ce, !!ret)) {
-		ret = guc_lrc_desc_pin(ce);
+		ret = guc_lrc_desc_pin(ce, true);
 		if (unlikely(ret)) {	/* unwind */
+			if (ret == -EPIPE) {
+				disable_submission(guc);
+				goto out;	/* GPU will be reset */
+			}
 			atomic_dec(&ce->guc_id_ref);
 			unpin_guc_id(guc, ce);
 			return ret;
@@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
 	return 0;
 }
 
-static struct intel_engine_cs *
-guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
-{
-	struct intel_engine_cs *engine;
-	intel_engine_mask_t tmp, mask = ve->mask;
-	unsigned int num_siblings = 0;
-
-	for_each_engine_masked(engine, ve->gt, mask, tmp)
-		if (num_siblings++ == sibling)
-			return engine;
-
-	return NULL;
-}
-
 static int guc_virtual_context_pre_pin(struct intel_context *ce,
 				       struct i915_gem_ww_ctx *ww,
 				       void **vaddr)
@@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
 {
 	if (context_guc_id_invalid(ce))
 		pin_guc_id(guc, ce);
-	guc_lrc_desc_pin(ce);
+	guc_lrc_desc_pin(ce, true);
 }
 
 static inline void guc_init_lrc_mapping(struct intel_guc *guc)
@@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 
 	engine->sched_engine->schedule = i915_schedule;
 
-	engine->reset.prepare = guc_reset_prepare;
-	engine->reset.rewind = guc_reset_rewind;
-	engine->reset.cancel = guc_reset_cancel;
-	engine->reset.finish = guc_reset_finish;
+	engine->reset.prepare = guc_reset_nop;
+	engine->reset.rewind = guc_rewind_nop;
+	engine->reset.cancel = guc_reset_nop;
+	engine->reset.finish = guc_reset_nop;
 
 	engine->emit_flush = gen8_emit_flush_xcs;
 	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
@@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
 	intel_engine_set_irq_handler(engine, cs_irq_handler);
 }
 
+static void guc_sched_engine_destroy(struct kref *kref)
+{
+	struct i915_sched_engine *sched_engine =
+		container_of(kref, typeof(*sched_engine), ref);
+	struct intel_guc *guc = sched_engine->private_data;
+
+	guc->sched_engine = NULL;
+	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
+	kfree(sched_engine);
+}
+
 int intel_guc_submission_setup(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
@@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 
 		guc->sched_engine->schedule = i915_schedule;
 		guc->sched_engine->private_data = guc;
+		guc->sched_engine->destroy = guc_sched_engine_destroy;
 		tasklet_setup(&guc->sched_engine->tasklet,
 			      guc_submission_tasklet);
 	}
@@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 		 * register this context.
 		 */
 		with_intel_runtime_pm(runtime_pm, wakeref)
-			register_context(ce);
+			register_context(ce, true);
 		guc_signal_context_fence(ce);
 		intel_context_put(ce);
 	} else if (context_destroyed(ce)) {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6d8b9233214e..f0b02200aa01 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
-	if (!intel_guc_is_ready(guc))
+
+	/* Nothing to do if GuC isn't supported */
+	if (!intel_uc_supports_guc(uc))
 		return;
 
+	/* Firmware expected to be running when this function is called */
+	if (!intel_guc_is_ready(guc))
+		goto sanitize;
+
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_prepare(guc);
+
+sanitize:
 	__uc_sanitize(uc);
 }
 
+void intel_uc_reset(struct intel_uc *uc, bool stalled)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware cannot be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset(guc, stalled);
+}
+
+void intel_uc_reset_finish(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware expected to be running when this function is called */
+	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_reset_finish(guc);
+}
+
+void intel_uc_cancel_requests(struct intel_uc *uc)
+{
+	struct intel_guc *guc = &uc->guc;
+
+	/* Firmware cannot be running when this function is called */
+	if (intel_uc_uses_guc_submission(uc))
+		intel_guc_submission_cancel_requests(guc);
+}
+
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index c4cef885e984..eaa3202192ac 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
 void intel_uc_reset_prepare(struct intel_uc *uc);
+void intel_uc_reset(struct intel_uc *uc, bool stalled);
+void intel_uc_reset_finish(struct intel_uc *uc);
+void intel_uc_cancel_requests(struct intel_uc *uc);
 void intel_uc_suspend(struct intel_uc *uc);
 void intel_uc_runtime_suspend(struct intel_uc *uc);
 int intel_uc_resume(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 27/51] drm/i915: Reset GPU immediately if submission is disabled
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

If submission is disabled by the backend for any reason, reset the GPU
immediately in the heartbeat code, as the backend can't be re-enabled
until the GPU is reset.
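
Concretely, the heartbeat worker now short-circuits straight to a reset
when the backend reports submission disabled (simplified from the
intel_engine_heartbeat.c hunk below):

	if (i915_sched_engine_disabled(engine->sched_engine)) {
		/* backend parked its tasklet (e.g. dead GuC): full GT reset */
		reset_engine(engine, engine->heartbeat.systole);
		goto out;
	}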

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 63 +++++++++++++++----
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |  4 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  9 +++
 drivers/gpu/drm/i915/i915_scheduler.c         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler.h         |  6 ++
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  5 ++
 6 files changed, 80 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index b6a305e6a974..a8495364d906 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -70,12 +70,30 @@ static void show_heartbeat(const struct i915_request *rq,
 {
 	struct drm_printer p = drm_debug_printer("heartbeat");
 
-	intel_engine_dump(engine, &p,
-			  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
-			  engine->name,
-			  rq->fence.context,
-			  rq->fence.seqno,
-			  rq->sched.attr.priority);
+	if (!rq) {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat not ticking\n",
+				  engine->name);
+	} else {
+		intel_engine_dump(engine, &p,
+				  "%s heartbeat {seqno:%llx:%lld, prio:%d} not ticking\n",
+				  engine->name,
+				  rq->fence.context,
+				  rq->fence.seqno,
+				  rq->sched.attr.priority);
+	}
+}
+
+static void
+reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+		show_heartbeat(rq, engine);
+
+	intel_gt_handle_error(engine->gt, engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "stopped heartbeat on %s",
+			      engine->name);
 }
 
 static void heartbeat(struct work_struct *wrk)
@@ -102,6 +120,11 @@ static void heartbeat(struct work_struct *wrk)
 	if (intel_gt_is_wedged(engine->gt))
 		goto out;
 
+	if (i915_sched_engine_disabled(engine->sched_engine)) {
+		reset_engine(engine, engine->heartbeat.systole);
+		goto out;
+	}
+
 	if (engine->heartbeat.systole) {
 		long delay = READ_ONCE(engine->props.heartbeat_interval_ms);
 
@@ -139,13 +162,7 @@ static void heartbeat(struct work_struct *wrk)
 			engine->sched_engine->schedule(rq, &attr);
 			local_bh_enable();
 		} else {
-			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-				show_heartbeat(rq, engine);
-
-			intel_gt_handle_error(engine->gt, engine->mask,
-					      I915_ERROR_CAPTURE,
-					      "stopped heartbeat on %s",
-					      engine->name);
+			reset_engine(engine, rq);
 		}
 
 		rq->emitted_jiffies = jiffies;
@@ -194,6 +211,26 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine)
 		i915_request_put(fetch_and_zero(&engine->heartbeat.systole));
 }
 
+void intel_gt_unpark_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		if (intel_engine_pm_is_awake(engine))
+			intel_engine_unpark_heartbeat(engine);
+
+}
+
+void intel_gt_park_heartbeats(struct intel_gt *gt)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id)
+		intel_engine_park_heartbeat(engine);
+}
+
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine)
 {
 	INIT_DELAYED_WORK(&engine->heartbeat.work, heartbeat);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index a488ea3e84a3..5da6d809a87a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -7,6 +7,7 @@
 #define INTEL_ENGINE_HEARTBEAT_H
 
 struct intel_engine_cs;
+struct intel_gt;
 
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
 
@@ -16,6 +17,9 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,
 void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
 void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
 
+void intel_gt_park_heartbeats(struct intel_gt *gt);
+void intel_gt_unpark_heartbeats(struct intel_gt *gt);
+
 int intel_engine_pulse(struct intel_engine_cs *engine);
 int intel_engine_flush_barriers(struct intel_engine_cs *engine);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0f5823922369..2cbf9847b055 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -10,6 +10,7 @@
 #include "gt/intel_breadcrumbs.h"
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
+#include "gt/intel_engine_heartbeat.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_irq.h"
 #include "gt/intel_gt_pm.h"
@@ -601,6 +602,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 		/* Reset called during driver load? GuC not yet initialised! */
 		return;
 
+	intel_gt_park_heartbeats(guc_to_gt(guc));
 	disable_submission(guc);
 	guc->interrupts.disable(guc);
 
@@ -886,6 +888,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
 	enable_submission(guc);
+	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
 
 /*
@@ -1850,6 +1853,11 @@ static int guc_resume(struct intel_engine_cs *engine)
 	return 0;
 }
 
+static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return !sched_engine->tasklet.callback;
+}
+
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = guc_submit_request;
@@ -2011,6 +2019,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 			return -ENOMEM;
 
 		guc->sched_engine->schedule = i915_schedule;
+		guc->sched_engine->disabled = guc_sched_engine_disabled;
 		guc->sched_engine->private_data = guc;
 		guc->sched_engine->destroy = guc_sched_engine_destroy;
 		tasklet_setup(&guc->sched_engine->tasklet,
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 4fceda96deed..8766a8643469 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -440,6 +440,11 @@ static void default_destroy(struct kref *kref)
 	kfree(sched_engine);
 }
 
+static bool default_disabled(struct i915_sched_engine *sched_engine)
+{
+	return false;
+}
+
 struct i915_sched_engine *
 i915_sched_engine_create(unsigned int subclass)
 {
@@ -454,6 +459,7 @@ i915_sched_engine_create(unsigned int subclass)
 	sched_engine->queue = RB_ROOT_CACHED;
 	sched_engine->queue_priority_hint = INT_MIN;
 	sched_engine->destroy = default_destroy;
+	sched_engine->disabled = default_disabled;
 
 	INIT_LIST_HEAD(&sched_engine->requests);
 	INIT_LIST_HEAD(&sched_engine->hold);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 3c9504e9f409..f4d9811ade5b 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -96,4 +96,10 @@ void i915_request_show_with_schedule(struct drm_printer *m,
 				     const char *prefix,
 				     int indent);
 
+static inline bool
+i915_sched_engine_disabled(struct i915_sched_engine *sched_engine)
+{
+	return sched_engine->disabled(sched_engine);
+}
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 00384e2c5273..eaef233e9080 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -168,6 +168,11 @@ struct i915_sched_engine {
 	 */
 	void	(*destroy)(struct kref *kref);
 
+	/**
+	 * @disabled: check if backend has disabled submission
+	 */
+	bool	(*disabled)(struct i915_sched_engine *sched_engine);
+
 	/**
 	 * @kick_backend: kick backend after a request's priority has changed
 	 */
-- 
2.28.0
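
Note that the GuC implementation of the new hook keys off state the
previous patch already maintains: disable_submission() clears
sched_engine->tasklet.callback under the submission lock, so (sketch):

	disable_submission(guc);			/* tasklet.callback = NULL */
	/* ... */
	i915_sched_engine_disabled(sched_engine);	/* true until reset_finish re-enables */

No extra flag is needed; the parked tasklet itself is the "submission
disabled" signal.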


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 28/51] drm/i915/guc: Add disable interrupts to guc sanitize
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Add disabling of GuC interrupts to intel_guc_sanitize(). Part of this
requires moving the guc_*_interrupt() wrapper functions into the header
file intel_guc.h.
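
With this change the sanitize path tears GuC state down in order (a
summary of the resulting intel_guc_sanitize() body below, not new code):

	intel_uc_fw_sanitize(&guc->fw);		/* forget the loaded firmware */
	intel_guc_disable_interrupts(guc);	/* new: no G2H interrupts past this point */
	intel_guc_ct_sanitize(&guc->ct);	/* tear down the CT channel */
	guc->mmio_msg = 0;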

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h | 16 ++++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c  | 21 +++------------------
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index d75a76882a44..b3cfc52fe0bc 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -217,9 +217,25 @@ static inline bool intel_guc_is_ready(struct intel_guc *guc)
 	return intel_guc_is_fw_running(guc) && intel_guc_ct_enabled(&guc->ct);
 }
 
+static inline void intel_guc_reset_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.reset(guc);
+}
+
+static inline void intel_guc_enable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.enable(guc);
+}
+
+static inline void intel_guc_disable_interrupts(struct intel_guc *guc)
+{
+	guc->interrupts.disable(guc);
+}
+
 static inline int intel_guc_sanitize(struct intel_guc *guc)
 {
 	intel_uc_fw_sanitize(&guc->fw);
+	intel_guc_disable_interrupts(guc);
 	intel_guc_ct_sanitize(&guc->ct);
 	guc->mmio_msg = 0;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index f0b02200aa01..ab11fe731ee7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -207,21 +207,6 @@ static void guc_handle_mmio_msg(struct intel_guc *guc)
 	spin_unlock_irq(&guc->irq_lock);
 }
 
-static void guc_reset_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.reset(guc);
-}
-
-static void guc_enable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.enable(guc);
-}
-
-static void guc_disable_interrupts(struct intel_guc *guc)
-{
-	guc->interrupts.disable(guc);
-}
-
 static int guc_enable_communication(struct intel_guc *guc)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
@@ -242,7 +227,7 @@ static int guc_enable_communication(struct intel_guc *guc)
 	guc_get_mmio_msg(guc);
 	guc_handle_mmio_msg(guc);
 
-	guc_enable_interrupts(guc);
+	intel_guc_enable_interrupts(guc);
 
 	/* check for CT messages received before we enabled interrupts */
 	spin_lock_irq(&gt->irq_lock);
@@ -265,7 +250,7 @@ static void guc_disable_communication(struct intel_guc *guc)
 	 */
 	guc_clear_mmio_msg(guc);
 
-	guc_disable_interrupts(guc);
+	intel_guc_disable_interrupts(guc);
 
 	intel_guc_ct_disable(&guc->ct);
 
@@ -463,7 +448,7 @@ static int __uc_init_hw(struct intel_uc *uc)
 	if (ret)
 		goto err_out;
 
-	guc_reset_interrupts(guc);
+	intel_guc_reset_interrupts(guc);
 
 	/* WaEnableuKernelHeaderValidFix:skl */
 	/* WaEnableGuCBootHashCheckNotSet:skl,bxt,kbl */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 29/51] drm/i915/guc: Suspend/resume implementation for new interface
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

The new GuC interface introduces an MMIO H2G command,
INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This
MMIO command tears down any active contexts, generating a context reset
G2H CTB message for each. Once that step completes, the GuC tears down
the CTB channels. It is safe to suspend once this MMIO H2G command
completes and all G2H CTBs have been processed. In practice the i915
will likely never receive a G2H as suspend should only be called after
the GPU is idle.

Resume is implemented in the same manner as before - simply reload the
GuC firmware and reinitialize everything (e.g. CTB channels, contexts,
etc.).
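
In rough outline, the suspend paths end up as below (a sketch assembled
from the hunks that follow, not verbatim code):

	/* runtime suspend: drain outstanding G2H, then drop the channel */
	intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
				       false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
	guc_disable_communication(guc);

	/* full suspend: RESET_CLIENT via MMIO, then mark GuC not running */
	ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
	intel_guc_sanitize(guc);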

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c        | 64 ++++++++-----------
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 15 +++--
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
 drivers/gpu/drm/i915/gt/uc/intel_uc.c         | 20 ++++--
 5 files changed, 54 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
index 57e18babdf4b..596cf4b818e5 100644
--- a/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
+++ b/drivers/gpu/drm/i915/gt/uc/abi/guc_actions_abi.h
@@ -142,6 +142,7 @@ enum intel_guc_action {
 	INTEL_GUC_ACTION_REGISTER_COMMAND_TRANSPORT_BUFFER = 0x4505,
 	INTEL_GUC_ACTION_DEREGISTER_COMMAND_TRANSPORT_BUFFER = 0x4506,
 	INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE = 0x4600,
+	INTEL_GUC_ACTION_RESET_CLIENT = 0x5B01,
 	INTEL_GUC_ACTION_LIMIT
 };
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 9b09395b998f..68266cbffd1f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -524,51 +524,34 @@ int intel_guc_auth_huc(struct intel_guc *guc, u32 rsa_offset)
  */
 int intel_guc_suspend(struct intel_guc *guc)
 {
-	struct intel_uncore *uncore = guc_to_gt(guc)->uncore;
 	int ret;
-	u32 status;
 	u32 action[] = {
-		INTEL_GUC_ACTION_ENTER_S_STATE,
-		GUC_POWER_D1, /* any value greater than GUC_POWER_D0 */
+		INTEL_GUC_ACTION_RESET_CLIENT,
 	};
 
-	/*
-	 * If GuC communication is enabled but submission is not supported,
-	 * we do not need to suspend the GuC.
-	 */
-	if (!intel_guc_submission_is_used(guc) || !intel_guc_is_ready(guc))
+	if (!intel_guc_is_ready(guc))
 		return 0;
 
-	/*
-	 * The ENTER_S_STATE action queues the save/restore operation in GuC FW
-	 * and then returns, so waiting on the H2G is not enough to guarantee
-	 * GuC is done. When all the processing is done, GuC writes
-	 * INTEL_GUC_SLEEP_STATE_SUCCESS to scratch register 14, so we can poll
-	 * on that. Note that GuC does not ensure that the value in the register
-	 * is different from INTEL_GUC_SLEEP_STATE_SUCCESS while the action is
-	 * in progress so we need to take care of that ourselves as well.
-	 */
-
-	intel_uncore_write(uncore, SOFT_SCRATCH(14),
-			   INTEL_GUC_SLEEP_STATE_INVALID_MASK);
-
-	ret = intel_guc_send(guc, action, ARRAY_SIZE(action));
-	if (ret)
-		return ret;
-
-	ret = __intel_wait_for_register(uncore, SOFT_SCRATCH(14),
-					INTEL_GUC_SLEEP_STATE_INVALID_MASK,
-					0, 0, 10, &status);
-	if (ret)
-		return ret;
-
-	if (status != INTEL_GUC_SLEEP_STATE_SUCCESS) {
-		DRM_ERROR("GuC failed to change sleep state. "
-			  "action=0x%x, err=%u\n",
-			  action[0], status);
-		return -EIO;
+	if (intel_guc_submission_is_used(guc)) {
+		/*
+		 * This H2G MMIO command tears down the GuC in two steps. First it will
+		 * generate a G2H CTB for every active context indicating a reset. In
+		 * practice the i915 shouldn't ever get a G2H as suspend should only be
+		 * called when the GPU is idle. Next, it tears down the CTBs and this
+		 * H2G MMIO command completes.
+		 *
+		 * Don't abort on a failure code from the GuC. Keep going and do the
+		 * clean-up in sanitize() and re-initialisation on resume; hopefully
+		 * the error here won't be problematic.
+		 */
+		ret = intel_guc_send_mmio(guc, action, ARRAY_SIZE(action), NULL, 0);
+		if (ret)
+			DRM_ERROR("GuC suspend: RESET_CLIENT action failed with error %d!\n", ret);
 	}
 
+	/* Signal that the GuC isn't running. */
+	intel_guc_sanitize(guc);
+
 	return 0;
 }
 
@@ -578,7 +561,12 @@ int intel_guc_suspend(struct intel_guc *guc)
  */
 int intel_guc_resume(struct intel_guc *guc)
 {
-	/* XXX: to be implemented with submission interface rework */
+	/*
+	 * NB: This function can still be called even if GuC submission is
+	 * disabled, e.g. if GuC is enabled for HuC authentication only. Thus,
+	 * if any code is later added here, it must support doing nothing
+	 * if submission is disabled (as per intel_guc_suspend).
+	 */
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2cbf9847b055..fdb17279095c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -304,10 +304,10 @@ static int guc_submission_send_busy_loop(struct intel_guc* guc,
 	return err;
 }
 
-static int guc_wait_for_pending_msg(struct intel_guc *guc,
-				    atomic_t *wait_var,
-				    bool interruptible,
-				    long timeout)
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout)
 {
 	const int state = interruptible ?
 		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
@@ -347,8 +347,9 @@ static int guc_wait_for_pending_msg(struct intel_guc *guc,
 
 int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
 {
-	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
-					true, timeout);
+	return intel_guc_wait_for_pending_msg(guc,
+					      &guc->outstanding_submission_g2h,
+					      true, timeout);
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
@@ -621,7 +622,7 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
 		intel_guc_to_host_event_handler(guc);
 #define wait_for_reset(guc, wait_var) \
-		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
+		intel_guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
 		do {
 			wait_for_reset(guc, &guc->outstanding_submission_g2h);
 		} while (!list_empty(&guc->ct.requests.incoming));
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 5f263ac4f46a..08ff77c5c50e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -28,6 +28,11 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
+int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
+				   atomic_t *wait_var,
+				   bool interruptible,
+				   long timeout);
+
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
 	/* XXX: GuC submission is unavailable for now */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index ab11fe731ee7..b523a8521351 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -596,14 +596,18 @@ void intel_uc_cancel_requests(struct intel_uc *uc)
 void intel_uc_runtime_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
-	int err;
 
 	if (!intel_guc_is_ready(guc))
 		return;
 
-	err = intel_guc_suspend(guc);
-	if (err)
-		DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	/*
+	 * Wait for any outstanding CTB before tearing down communication with the
+	 * GuC.
+	 */
+#define OUTSTANDING_CTB_TIMEOUT_PERIOD	(HZ / 5)
+	intel_guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
+				       false, OUTSTANDING_CTB_TIMEOUT_PERIOD);
+	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 
 	guc_disable_communication(guc);
 }
@@ -612,12 +616,16 @@ void intel_uc_suspend(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 	intel_wakeref_t wakeref;
+	int err;
 
 	if (!intel_guc_is_ready(guc))
 		return;
 
-	with_intel_runtime_pm(uc_to_gt(uc)->uncore->rpm, wakeref)
-		intel_uc_runtime_suspend(uc);
+	with_intel_runtime_pm(&uc_to_gt(uc)->i915->runtime_pm, wakeref) {
+		err = intel_guc_suspend(guc);
+		if (err)
+			DRM_DEBUG_DRIVER("Failed to suspend GuC, err=%d", err);
+	}
 }
 
 static int __uc_resume(struct intel_uc *uc, bool enable_communication)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 30/51] drm/i915/guc: Handle context reset notification
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

v2:
 (John Harrison)
  - Move msg[0] lookup after length check
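
The notification payload is a single dword holding the context
descriptor index, hence the length check preceding the msg[0] read (a
sketch of the shape, using names from the diff below):

	if (unlikely(len != 1))		/* validate before reading payload */
		return -EPROTO;
	desc_idx = msg[0];		/* dword 0: context descriptor index */
	ce = g2h_context_lookup(guc, desc_idx);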

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
 4 files changed, 51 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index b3cfc52fe0bc..f23a3a618550 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
 					  const u32 *msg, u32 len);
 int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 503a78517610..c4f9b44b9f86 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
 		ret = intel_guc_sched_done_process_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
+		ret = intel_guc_context_reset_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index fdb17279095c..feaf1ca61eaa 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void guc_context_replay(struct intel_context *ce)
+{
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+
+	__guc_reset_context(ce, true);
+	tasklet_hi_schedule(&sched_engine->tasklet);
+}
+
+static void guc_handle_context_reset(struct intel_guc *guc,
+				     struct intel_context *ce)
+{
+	trace_intel_context_reset(ce);
+	guc_context_replay(ce);
+}
+
+int intel_guc_context_reset_process_msg(struct intel_guc *guc,
+					const u32 *msg, u32 len)
+{
+	struct intel_context *ce;
+	int desc_idx;
+
+	if (unlikely(len != 1)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	desc_idx = msg[0];
+	ce = g2h_context_lookup(guc, desc_idx);
+	if (unlikely(!ce))
+		return -EPROTO;
+
+	guc_handle_context_reset(guc, ce);
+
+	return 0;
+}
+
 void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 97c2e83984ed..c095c4d39456 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
 		      __entry->guc_sched_state_no_lock)
 );
 
+DEFINE_EVENT(intel_context, intel_context_reset,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_register,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
 {
 }
 
+static inline void
+trace_intel_context_reset(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_register(struct intel_context *ce)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 31/51] drm/i915/guc: Handle engine reset failure notification
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

GuC will notify the driver, via G2H, if it fails to
reset an engine. We recover by resorting to a full GPU
reset.
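
The G2H carries three dwords - GuC class, instance and a reason code -
which the handler maps back to an intel_engine_cs before escalating (a
sketch using names from the diff below):

	guc_class = msg[0];
	instance = msg[1];
	reason = msg[2];
	engine = guc_lookup_engine(guc, guc_class, instance);
	intel_gt_handle_error(guc_to_gt(guc), engine->mask, I915_ERROR_CAPTURE,
			      "GuC failed to reset %s (reason=0x%08x)\n",
			      engine->name, reason);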

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 43 +++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index f23a3a618550..7f14e1873010 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -264,6 +264,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 				     const u32 *msg, u32 len);
 int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 					const u32 *msg, u32 len);
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len);
 
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index c4f9b44b9f86..d16381784ee2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -984,6 +984,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
 	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
 		ret = intel_guc_context_reset_process_msg(guc, payload, len);
 		break;
+	case INTEL_GUC_ACTION_ENGINE_FAILURE_NOTIFICATION:
+		ret = intel_guc_engine_failure_process_msg(guc, payload, len);
+		break;
 	default:
 		ret = -EOPNOTSUPP;
 		break;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index feaf1ca61eaa..035633f567b5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2232,6 +2232,49 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static struct intel_engine_cs *
+guc_lookup_engine(struct intel_guc *guc, u8 guc_class, u8 instance)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	u8 engine_class = guc_class_to_engine_class(guc_class);
+
+	/* Class index is checked in class converter */
+	GEM_BUG_ON(instance > MAX_ENGINE_INSTANCE);
+
+	return gt->engine_class[engine_class][instance];
+}
+
+int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
+					 const u32 *msg, u32 len)
+{
+	struct intel_engine_cs *engine;
+	u8 guc_class, instance;
+	u32 reason;
+
+	if (unlikely(len != 3)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		return -EPROTO;
+	}
+
+	guc_class = msg[0];
+	instance = msg[1];
+	reason = msg[2];
+
+	engine = guc_lookup_engine(guc, guc_class, instance);
+	if (unlikely(!engine)) {
+		drm_dbg(&guc_to_gt(guc)->i915->drm,
+			"Invalid engine %d:%d", guc_class, instance);
+		return -EPROTO;
+	}
+
+	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+			      I915_ERROR_CAPTURE,
+			      "GuC failed to reset %s (reason=0x%08x)\n",
+			      engine->name, reason);
+
+	return 0;
+}
+
 void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p)
 {
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 32/51] drm/i915/guc: Enable the timer expired interrupt for GuC
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

The GuC can implement execution quantums, detect hung contexts and
other such things, but it requires the timer expired interrupt to do
so.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 06e9a8ed4e03..0c8e7f2b06f0 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1877,6 +1877,10 @@ void intel_rps_init(struct intel_rps *rps)
 
 	if (GRAPHICS_VER(i915) >= 8 && GRAPHICS_VER(i915) < 11)
 		rps->pm_intrmsk_mbz |= GEN8_PMINTR_DISABLE_REDIRECT_TO_GUC;
+
+	/* GuC needs ARAT expired interrupt unmasked */
+	if (intel_uc_uses_guc_submission(&rps_to_gt(rps)->uc))
+		rps->pm_intrmsk_mbz |= ARAT_EXPIRED_INTRMSK;
 }
 
 void intel_rps_sanitize(struct intel_rps *rps)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 33/51] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The driver must provide GuC with a list of mmio registers
that should be saved/restored during a GuC-based engine reset.
Unfortunately, the list must be dynamically allocated as its size is
variable. That means the driver must generate the list twice - once to
work out the size and a second time to actually save it.
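
In outline, the two passes look like this (a sketch of the pattern,
using the helper names from the diff below; error handling elided):

	/* pass 1: build the list into a temporary buffer, keep only the size */
	ret = guc_mmio_reg_state_query(guc);
	if (ret < 0)
		return ret;
	guc->ads_regset_size = ret;

	/* pass 2: once the ADS is allocated, build the list again in place */
	guc_mmio_reg_state_init(guc, blob);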

v2:
 (Alan / CI)
  - GEN7_GT_MODE -> GEN6_GT_MODE to fix WA selftest failure

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
 .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 5 files changed, 222 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 72562c233ad2..34738ccab8bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 }
 
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
-		   u32 clear, u32 set, u32 read_mask)
+		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
 	struct i915_wa wa = {
 		.reg  = reg,
 		.clr  = clear,
 		.set  = set,
 		.read = read_mask,
+		.masked_reg = masked_reg,
 	};
 
 	_wa_add(wal, &wa);
@@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 static void
 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
 {
-	wa_add(wal, reg, clear, set, clear);
+	wa_add(wal, reg, clear, set, clear, false);
 }
 
 static void
@@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
 static void
 wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
 }
 
 static void
 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
 }
 
 static void
 wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
 		    u32 mask, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
 }
 
 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
 
 	/* WaEnableFloatBlendOptimization:icl */
-	wa_write_clr_set(wal,
-			 GEN10_CACHE_MODE_SS,
-			 0, /* write-only, so skip validation */
-			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
+	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
+	       0 /* write-only, so skip validation */,
+	       true);
 
 	/* WaDisableGPGPUMidThreadPreemption:icl */
 	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_TDS_TIMER_MASK,
 	       FF_MODE2_TDS_TIMER_128,
-	       0);
+	       0, false);
 }
 
 static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_GS_TIMER_MASK,
 	       FF_MODE2_GS_TIMER_224,
-	       0);
+	       0, false);
 
 	/*
 	 * Wa_14012131227:dg1
@@ -847,7 +848,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 	wa_add(wal,
 	       HSW_ROW_CHICKEN3, 0,
 	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
-		0 /* XXX does this reg exist? */);
+	       0 /* XXX does this reg exist? */, true);
 
 	/* WaVSRefCountFullforceMissDisable:hsw */
 	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@@ -1937,10 +1938,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal, GEN7_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
-				     GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 	}
 
 	if (IS_GRAPHICS_VER(i915, 6, 7))
@@ -1990,10 +1991,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal,
-		       GEN6_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN6_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 
 		/* WaDisable_RenderCache_OperationalFlush:snb */
 		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
@@ -2014,7 +2015,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		wa_add(wal, MI_MODE,
 		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
 		       /* XXX bit doesn't stick on Broadwater */
-		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
+		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
 
 	if (GRAPHICS_VER(i915) == 4)
 		/*
@@ -2029,7 +2030,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 */
 		wa_add(wal, ECOSKPD,
 		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
-		       0 /* XXX bit doesn't stick on Broadwater */);
+		       0 /* XXX bit doesn't stick on Broadwater */,
+		       true);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
index c214111ea367..1e873681795d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
@@ -15,6 +15,7 @@ struct i915_wa {
 	u32		clr;
 	u32		set;
 	u32		read;
+	bool		masked_reg;
 };
 
 struct i915_wa_list {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 7f14e1873010..3897abb59dba 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -59,6 +59,7 @@ struct intel_guc {
 
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
+	u32 ads_regset_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index b82145652d57..9fd3c911f5fb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -3,6 +3,8 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <linux/bsearch.h>
+
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
 #include "intel_guc_ads.h"
@@ -23,7 +25,12 @@
  *      | guc_policies                          |
  *      +---------------------------------------+
  *      | guc_gt_system_info                    |
- *      +---------------------------------------+
+ *      +---------------------------------------+ <== static
+ *      | guc_mmio_reg[countA] (engine 0.0)     |
+ *      | guc_mmio_reg[countB] (engine 0.1)     |
+ *      | guc_mmio_reg[countC] (engine 1.0)     |
+ *      |   ...                                 |
+ *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
@@ -35,16 +42,33 @@ struct __guc_ads_blob {
 	struct guc_ads ads;
 	struct guc_policies policies;
 	struct guc_gt_system_info system_info;
+	/* From here on, location is dynamic! Refer to above diagram. */
+	struct guc_mmio_reg regset[0];
 } __packed;
 
+static u32 guc_ads_regset_size(struct intel_guc *guc)
+{
+	GEM_BUG_ON(!guc->ads_regset_size);
+	return guc->ads_regset_size;
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
 }
 
+static u32 guc_ads_regset_offset(struct intel_guc *guc)
+{
+	return offsetof(struct __guc_ads_blob, regset);
+}
+
 static u32 guc_ads_private_data_offset(struct intel_guc *guc)
 {
-	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+	u32 offset;
+
+	offset = guc_ads_regset_offset(guc) +
+		 guc_ads_regset_size(guc);
+	return PAGE_ALIGN(offset);
 }
 
 static u32 guc_ads_blob_size(struct intel_guc *guc)
@@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 	}
 }
 
+/*
+ * The save/restore register list must be pre-calculated to a temporary
+ * buffer of driver defined size before it can be generated in place
+ * inside the ADS.
+ */
+#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
+struct temp_regset {
+	struct guc_mmio_reg *registers;
+	u32 used;
+	u32 size;
+};
+
+static int guc_mmio_reg_cmp(const void *a, const void *b)
+{
+	const struct guc_mmio_reg *ra = a;
+	const struct guc_mmio_reg *rb = b;
+
+	return (int)ra->offset - (int)rb->offset;
+}
+
+static void guc_mmio_reg_add(struct temp_regset *regset,
+			     u32 offset, u32 flags)
+{
+	u32 count = regset->used;
+	struct guc_mmio_reg reg = {
+		.offset = offset,
+		.flags = flags,
+	};
+	struct guc_mmio_reg *slot;
+
+	GEM_BUG_ON(count >= regset->size);
+
+	/*
+	 * The mmio list is built using separate lists within the driver.
+	 * It's possible that at some point we may attempt to add the same
+	 * register more than once. Do not consider this an error; silently
+	 * move on if the register is already in the list.
+	 */
+	if (bsearch(&reg, regset->registers, count,
+		    sizeof(reg), guc_mmio_reg_cmp))
+		return;
+
+	slot = &regset->registers[count];
+	regset->used++;
+	*slot = reg;
+
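+	/* Keep the list sorted by offset: bubble the new entry into place. */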
+	while (slot-- > regset->registers) {
+		GEM_BUG_ON(slot[0].offset == slot[1].offset);
+		if (slot[1].offset > slot[0].offset)
+			break;
+
+		swap(slot[1], slot[0]);
+	}
+}
+
+#define GUC_MMIO_REG_ADD(regset, reg, masked) \
+	guc_mmio_reg_add(regset, \
+			 i915_mmio_reg_offset((reg)), \
+			 (masked) ? GUC_REGSET_MASKED : 0)
+
+static void guc_mmio_regset_init(struct temp_regset *regset,
+				 struct intel_engine_cs *engine)
+{
+	const u32 base = engine->mmio_base;
+	struct i915_wa_list *wal = &engine->wa_list;
+	struct i915_wa *wa;
+	unsigned int i;
+
+	regset->used = 0;
+
+	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
+	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
+	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
+
+	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
+		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
+
+	/* Be extra paranoid and include all whitelist registers. */
+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
+		GUC_MMIO_REG_ADD(regset,
+				 RING_FORCE_TO_NONPRIV(base, i),
+				 false);
+
+	/* add in local MOCS registers */
+	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
+		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
+}
+
+static int guc_mmio_reg_state_query(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	u32 total;
+
+	/*
+	 * Need to actually build the list in order to filter out
+	 * duplicates and other such data dependent constructions.
+	 */
+	temp_set.size = MAX_MMIO_REGS;
+	temp_set.registers = kmalloc_array(temp_set.size,
+					  sizeof(*temp_set.registers),
+					  GFP_KERNEL);
+	if (!temp_set.registers)
+		return -ENOMEM;
+
+	total = 0;
+	for_each_engine(engine, gt, id) {
+		guc_mmio_regset_init(&temp_set, engine);
+		total += temp_set.used;
+	}
+
+	kfree(temp_set.registers);
+
+	return total * sizeof(struct guc_mmio_reg);
+}
+
+static void guc_mmio_reg_state_init(struct intel_guc *guc,
+				    struct __guc_ads_blob *blob)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	struct guc_mmio_reg_set *ads_reg_set;
+	u32 addr_ggtt, offset;
+	u8 guc_class;
+
+	offset = guc_ads_regset_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
+	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
+
+	for_each_engine(engine, gt, id) {
+		/* Class index is checked in class converter */
+		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
+
+		guc_class = engine_class_to_guc_class(engine->class);
+		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
+
+		guc_mmio_regset_init(&temp_set, engine);
+		if (!temp_set.used) {
+			ads_reg_set->address = 0;
+			ads_reg_set->count = 0;
+			continue;
+		}
+
+		ads_reg_set->address = addr_ggtt;
+		ads_reg_set->count = temp_set.used;
+
+		temp_set.size -= temp_set.used;
+		temp_set.registers += temp_set.used;
+		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
+	}
+
+	GEM_BUG_ON(temp_set.size);
+}
+
 /*
  * The first 80 dwords of the register state context, containing the
  * execlists and ppgtt registers.
@@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 		 */
 		blob->ads.golden_context_lrca[guc_class] = 0;
 		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(guc_to_gt(guc),
-						  engine_class) -
+			intel_engine_context_size(gt, engine_class) -
 			skipped_size;
 	}
 
@@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
 
+	/* MMIO save/restore list */
+	guc_mmio_reg_state_init(guc, blob);
+
 	/* Private Data */
 	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
 
@@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
 
 	GEM_BUG_ON(guc->ads_vma);
 
+	/* Need to calculate the reg state size dynamically: */
+	ret = guc_mmio_reg_state_query(guc);
+	if (ret < 0)
+		return ret;
+	guc->ads_regset_size = ret;
+
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 204c95c39353..3584e4d03dc3 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12309,6 +12309,7 @@ enum skl_power_gate {
 
 /* MOCS (Memory Object Control State) registers */
 #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
+#define GEN9_LNCFCMOCS_REG_COUNT	32
 
 #define __GEN9_RCS0_MOCS0	0xc800
 #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 33/51] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

The driver must provide GuC with a list of mmio registers
that should be saved/restored during a GuC-based engine reset.
Unfortunately, the list must be dynamically allocated as its size is
variable. That means the driver must generate the list twice - once to
work out the size and a second time to actually save it.

v2:
 (Alan / CI)
  - GEN7_GT_MODE -> GEN6_GT_MODE to fix WA selftest failure
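
A minimal sketch of the build-twice pattern described above, using
hypothetical reg_entry / build_reg_list / MAX_REGS names rather than
the actual driver code:

/*
 * Pass 1: build into a throwaway buffer purely to learn how many
 * entries survive deduplication, then discard the buffer.
 */
static int reg_list_query_size(struct device_state *ds)
{
	struct reg_entry *tmp;
	int count;

	tmp = kmalloc_array(MAX_REGS, sizeof(*tmp), GFP_KERNEL);
	if (!tmp)
		return -ENOMEM;

	count = build_reg_list(ds, tmp, MAX_REGS);
	kfree(tmp);

	return count * sizeof(*tmp);	/* bytes to reserve in the blob */
}

/*
 * Pass 2: run the same generator again, this time writing directly
 * into the space reserved at the tail of the ADS blob.
 */
static void reg_list_init(struct device_state *ds, void *blob, u32 size)
{
	build_reg_list(ds, blob, size / sizeof(struct reg_entry));
}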

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
 .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h               |   1 +
 5 files changed, 222 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 72562c233ad2..34738ccab8bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
 }
 
 static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
-		   u32 clear, u32 set, u32 read_mask)
+		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
 {
 	struct i915_wa wa = {
 		.reg  = reg,
 		.clr  = clear,
 		.set  = set,
 		.read = read_mask,
+		.masked_reg = masked_reg,
 	};
 
 	_wa_add(wal, &wa);
@@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
 static void
 wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
 {
-	wa_add(wal, reg, clear, set, clear);
+	wa_add(wal, reg, clear, set, clear, false);
 }
 
 static void
@@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
 static void
 wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
 }
 
 static void
 wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
 }
 
 static void
 wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
 		    u32 mask, u32 val)
 {
-	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
 }
 
 static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
 			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
 
 	/* WaEnableFloatBlendOptimization:icl */
-	wa_write_clr_set(wal,
-			 GEN10_CACHE_MODE_SS,
-			 0, /* write-only, so skip validation */
-			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
+	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
+	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
+	       0 /* write-only, so skip validation */,
+	       true);
 
 	/* WaDisableGPGPUMidThreadPreemption:icl */
 	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
@@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_TDS_TIMER_MASK,
 	       FF_MODE2_TDS_TIMER_128,
-	       0);
+	       0, false);
 }
 
 static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
 	       FF_MODE2,
 	       FF_MODE2_GS_TIMER_MASK,
 	       FF_MODE2_GS_TIMER_224,
-	       0);
+	       0, false);
 
 	/*
 	 * Wa_14012131227:dg1
@@ -847,7 +848,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
 	wa_add(wal,
 	       HSW_ROW_CHICKEN3, 0,
 	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
-		0 /* XXX does this reg exist? */);
+	       0 /* XXX does this reg exist? */, true);
 
 	/* WaVSRefCountFullforceMissDisable:hsw */
 	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@@ -1937,10 +1938,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal, GEN7_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
-				     GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN7_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 	}
 
 	if (IS_GRAPHICS_VER(i915, 6, 7))
@@ -1990,10 +1991,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 * disable bit, which we don't touch here, but it's good
 		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 		 */
-		wa_add(wal,
-		       GEN6_GT_MODE, 0,
-		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
-		       GEN6_WIZ_HASHING_16x4);
+		wa_masked_field_set(wal,
+				    GEN6_GT_MODE,
+				    GEN6_WIZ_HASHING_MASK,
+				    GEN6_WIZ_HASHING_16x4);
 
 		/* WaDisable_RenderCache_OperationalFlush:snb */
 		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
@@ -2014,7 +2015,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		wa_add(wal, MI_MODE,
 		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
 		       /* XXX bit doesn't stick on Broadwater */
-		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
+		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
 
 	if (GRAPHICS_VER(i915) == 4)
 		/*
@@ -2029,7 +2030,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 		 */
 		wa_add(wal, ECOSKPD,
 		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
-		       0 /* XXX bit doesn't stick on Broadwater */);
+		       0 /* XXX bit doesn't stick on Broadwater */,
+		       true);
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
index c214111ea367..1e873681795d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
@@ -15,6 +15,7 @@ struct i915_wa {
 	u32		clr;
 	u32		set;
 	u32		read;
+	bool		masked_reg;
 };
 
 struct i915_wa_list {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 7f14e1873010..3897abb59dba 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -59,6 +59,7 @@ struct intel_guc {
 
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
+	u32 ads_regset_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index b82145652d57..9fd3c911f5fb 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -3,6 +3,8 @@
  * Copyright © 2014-2019 Intel Corporation
  */
 
+#include <linux/bsearch.h>
+
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
 #include "intel_guc_ads.h"
@@ -23,7 +25,12 @@
  *      | guc_policies                          |
  *      +---------------------------------------+
  *      | guc_gt_system_info                    |
- *      +---------------------------------------+
+ *      +---------------------------------------+ <== static
+ *      | guc_mmio_reg[countA] (engine 0.0)     |
+ *      | guc_mmio_reg[countB] (engine 0.1)     |
+ *      | guc_mmio_reg[countC] (engine 1.0)     |
+ *      |   ...                                 |
+ *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
@@ -35,16 +42,33 @@ struct __guc_ads_blob {
 	struct guc_ads ads;
 	struct guc_policies policies;
 	struct guc_gt_system_info system_info;
+	/* From here on, location is dynamic! Refer to above diagram. */
+	struct guc_mmio_reg regset[0];
 } __packed;
 
+static u32 guc_ads_regset_size(struct intel_guc *guc)
+{
+	GEM_BUG_ON(!guc->ads_regset_size);
+	return guc->ads_regset_size;
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
 }
 
+static u32 guc_ads_regset_offset(struct intel_guc *guc)
+{
+	return offsetof(struct __guc_ads_blob, regset);
+}
+
 static u32 guc_ads_private_data_offset(struct intel_guc *guc)
 {
-	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
+	u32 offset;
+
+	offset = guc_ads_regset_offset(guc) +
+		 guc_ads_regset_size(guc);
+	return PAGE_ALIGN(offset);
 }
 
 static u32 guc_ads_blob_size(struct intel_guc *guc)
@@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
 	}
 }
 
+/*
+ * The save/restore register list must be pre-calculated to a temporary
+ * buffer of driver defined size before it can be generated in place
+ * inside the ADS.
+ */
+#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
+struct temp_regset {
+	struct guc_mmio_reg *registers;
+	u32 used;
+	u32 size;
+};
+
+static int guc_mmio_reg_cmp(const void *a, const void *b)
+{
+	const struct guc_mmio_reg *ra = a;
+	const struct guc_mmio_reg *rb = b;
+
+	return (int)ra->offset - (int)rb->offset;
+}
+
+static void guc_mmio_reg_add(struct temp_regset *regset,
+			     u32 offset, u32 flags)
+{
+	u32 count = regset->used;
+	struct guc_mmio_reg reg = {
+		.offset = offset,
+		.flags = flags,
+	};
+	struct guc_mmio_reg *slot;
+
+	GEM_BUG_ON(count >= regset->size);
+
+	/*
+	 * The mmio list is built using separate lists within the driver.
+	 * It's possible that at some point we may attempt to add the same
+	 * register more than once. Do not consider this an error; silently
+	 * move on if the register is already in the list.
+	 */
+	if (bsearch(&reg, regset->registers, count,
+		    sizeof(reg), guc_mmio_reg_cmp))
+		return;
+
+	slot = &regset->registers[count];
+	regset->used++;
+	*slot = reg;
+
+	while (slot-- > regset->registers) {
+		GEM_BUG_ON(slot[0].offset == slot[1].offset);
+		if (slot[1].offset > slot[0].offset)
+			break;
+
+		swap(slot[1], slot[0]);
+	}
+}
+
+#define GUC_MMIO_REG_ADD(regset, reg, masked) \
+	guc_mmio_reg_add(regset, \
+			 i915_mmio_reg_offset((reg)), \
+			 (masked) ? GUC_REGSET_MASKED : 0)
+
+static void guc_mmio_regset_init(struct temp_regset *regset,
+				 struct intel_engine_cs *engine)
+{
+	const u32 base = engine->mmio_base;
+	struct i915_wa_list *wal = &engine->wa_list;
+	struct i915_wa *wa;
+	unsigned int i;
+
+	regset->used = 0;
+
+	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
+	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
+	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
+
+	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
+		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
+
+	/* Be extra paranoid and include all whitelist registers. */
+	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
+		GUC_MMIO_REG_ADD(regset,
+				 RING_FORCE_TO_NONPRIV(base, i),
+				 false);
+
+	/* add in local MOCS registers */
+	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
+		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
+}
+
+static int guc_mmio_reg_state_query(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	u32 total;
+
+	/*
+	 * Need to actually build the list in order to filter out
+	 * duplicates and other such data dependent constructions.
+	 */
+	temp_set.size = MAX_MMIO_REGS;
+	temp_set.registers = kmalloc_array(temp_set.size,
+					  sizeof(*temp_set.registers),
+					  GFP_KERNEL);
+	if (!temp_set.registers)
+		return -ENOMEM;
+
+	total = 0;
+	for_each_engine(engine, gt, id) {
+		guc_mmio_regset_init(&temp_set, engine);
+		total += temp_set.used;
+	}
+
+	kfree(temp_set.registers);
+
+	return total * sizeof(struct guc_mmio_reg);
+}
+
+static void guc_mmio_reg_state_init(struct intel_guc *guc,
+				    struct __guc_ads_blob *blob)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct temp_regset temp_set;
+	struct guc_mmio_reg_set *ads_reg_set;
+	u32 addr_ggtt, offset;
+	u8 guc_class;
+
+	offset = guc_ads_regset_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
+	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
+
+	for_each_engine(engine, gt, id) {
+		/* Class index is checked in class converter */
+		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
+
+		guc_class = engine_class_to_guc_class(engine->class);
+		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
+
+		guc_mmio_regset_init(&temp_set, engine);
+		if (!temp_set.used) {
+			ads_reg_set->address = 0;
+			ads_reg_set->count = 0;
+			continue;
+		}
+
+		ads_reg_set->address = addr_ggtt;
+		ads_reg_set->count = temp_set.used;
+
+		temp_set.size -= temp_set.used;
+		temp_set.registers += temp_set.used;
+		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
+	}
+
+	GEM_BUG_ON(temp_set.size);
+}
+
 /*
  * The first 80 dwords of the register state context, containing the
  * execlists and ppgtt registers.
@@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 		 */
 		blob->ads.golden_context_lrca[guc_class] = 0;
 		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(guc_to_gt(guc),
-						  engine_class) -
+			intel_engine_context_size(gt, engine_class) -
 			skipped_size;
 	}
 
@@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
 
+	/* MMIO save/restore list */
+	guc_mmio_reg_state_init(guc, blob);
+
 	/* Private Data */
 	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
 
@@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
 
 	GEM_BUG_ON(guc->ads_vma);
 
+	/* Need to calculate the reg state size dynamically: */
+	ret = guc_mmio_reg_state_query(guc);
+	if (ret < 0)
+		return ret;
+	guc->ads_regset_size = ret;
+
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 204c95c39353..3584e4d03dc3 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -12309,6 +12309,7 @@ enum skl_power_gate {
 
 /* MOCS (Memory Object Control State) registers */
 #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
+#define GEN9_LNCFCMOCS_REG_COUNT	32
 
 #define __GEN9_RCS0_MOCS0	0xc800
 #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 34/51] drm/i915/guc: Don't complain about reset races
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

It is impossible to seal all race conditions between resets and other
concurrent operations, at least not without introducing excessive
mutex locking. Instead, don't complain when such a race occurs. In
particular, don't complain about trying to send an H2G during a reset:
whatever the H2G was about should get redone once the reset is over.
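
A hedged illustration of the intended caller behaviour (hypothetical
send_h2g() helper, not code from this series): an H2G that fails with
-ENODEV while a reset is in flight is simply dropped, since the
post-reset reinitialisation resends the relevant state anyway.

static int update_policy(struct intel_uc *uc)
{
	int err;

	/* Returns -ENODEV if the CT channel is currently disabled. */
	err = send_h2g(uc);
	if (err == -ENODEV && uc->reset_in_progress)
		err = 0;	/* redone once the reset completes */

	return err;
}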

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c     | 3 +++
 drivers/gpu/drm/i915/gt/uc/intel_uc.h     | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index d16381784ee2..92976d205478 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -757,7 +757,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	int ret;
 
 	if (unlikely(!ct->enabled)) {
-		WARN(1, "Unexpected send: action=%#x\n", *action);
+		struct intel_guc *guc = ct_to_guc(ct);
+		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
+
+		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
 		return -ENODEV;
 	}
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index b523a8521351..77c1fe2ed883 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -550,6 +550,7 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = true;
 
 	/* Nothing to do if GuC isn't supported */
 	if (!intel_uc_supports_guc(uc))
@@ -579,6 +580,8 @@ void intel_uc_reset_finish(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = false;
+
 	/* Firmware expected to be running when this function is called */
 	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_reset_finish(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index eaa3202192ac..91315e3f1c58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -30,6 +30,8 @@ struct intel_uc {
 
 	/* Snapshot of GuC log from last failed load */
 	struct drm_i915_gem_object *load_err_log;
+
+	bool reset_in_progress;
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 34/51] drm/i915/guc: Don't complain about reset races
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

It is impossible to seal all race conditions between resets and other
concurrent operations, at least not without introducing excessive
mutex locking. Instead, don't complain when such a race occurs. In
particular, don't complain about trying to send an H2G during a reset:
whatever the H2G was about should get redone once the reset is over.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_uc.c     | 3 +++
 drivers/gpu/drm/i915/gt/uc/intel_uc.h     | 2 ++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index d16381784ee2..92976d205478 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -757,7 +757,10 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
 	int ret;
 
 	if (unlikely(!ct->enabled)) {
-		WARN(1, "Unexpected send: action=%#x\n", *action);
+		struct intel_guc *guc = ct_to_guc(ct);
+		struct intel_uc *uc = container_of(guc, struct intel_uc, guc);
+
+		WARN(!uc->reset_in_progress, "Unexpected send: action=%#x\n", *action);
 		return -ENODEV;
 	}
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index b523a8521351..77c1fe2ed883 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -550,6 +550,7 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = true;
 
 	/* Nothing to do if GuC isn't supported */
 	if (!intel_uc_supports_guc(uc))
@@ -579,6 +580,8 @@ void intel_uc_reset_finish(struct intel_uc *uc)
 {
 	struct intel_guc *guc = &uc->guc;
 
+	uc->reset_in_progress = false;
+
 	/* Firmware expected to be running when this function is called */
 	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
 		intel_guc_submission_reset_finish(guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index eaa3202192ac..91315e3f1c58 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -30,6 +30,8 @@ struct intel_uc {
 
 	/* Snapshot of GuC log from last failed load */
 	struct drm_i915_gem_object *load_err_log;
+
+	bool reset_in_progress;
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 35/51] drm/i915/guc: Enable GuC engine reset
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Clear the 'disable resets' flag to allow GuC to reset hung contexts
(detected via pre-emption timeout).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 9fd3c911f5fb..d3e86ab7508f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -81,8 +81,7 @@ static void guc_policies_init(struct guc_policies *policies)
 {
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
-	/* Disable automatic resets as not yet supported. */
-	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+	policies->global_flags = 0;
 	policies->is_valid = 1;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 35/51] drm/i915/guc: Enable GuC engine reset
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Clear the 'disable resets' flag to allow GuC to reset hung contexts
(detected via pre-emption timeout).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 9fd3c911f5fb..d3e86ab7508f 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -81,8 +81,7 @@ static void guc_policies_init(struct guc_policies *policies)
 {
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
-	/* Disable automatic resets as not yet supported. */
-	policies->global_flags = GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+	policies->global_flags = 0;
 	policies->is_valid = 1;
 }
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 36/51] drm/i915/guc: Capture error state on context reset
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

We receive notification of an engine reset from GuC only at its
completion, meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC then resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.
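
To illustrate the handoff this patch sets up (a simplified sketch, not
the exact driver flow): the G2H context-reset handler records the
guilty context on the engine, and the error-capture path later
consumes and clears that record.

/* Producer side, called from the G2H context-reset handler: */
static void note_guilty_context(struct intel_engine_cs *engine,
				struct intel_context *ce)
{
	intel_engine_set_hung_context(engine, ce);
	/* ... then trigger i915_capture_error_state() ... */
}

/* Consumer side, in the error-capture path: */
static struct i915_request *pick_hung_request(struct intel_engine_cs *engine)
{
	struct intel_context *ce = intel_engine_get_hung_context(engine);

	if (!ce)
		return NULL;	/* no GuC-reported hang on this engine */

	intel_engine_clear_hung_context(engine);
	return intel_context_find_active_request(ce);
}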

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
 7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bfb05d8697d1..dd078a80c3a3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -515,6 +515,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 974ef85320c2..2ed9bf5f91a5 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -200,6 +200,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
+struct i915_request *
+intel_context_find_active_request(struct intel_context *ce);
+
 static inline bool intel_context_is_barrier(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index edbde6171bca..8b5425612e8b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 struct intel_context *
@@ -310,4 +310,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
 	return engine->cops->get_sibling(engine, sibling);
 }
 
+static inline void
+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+			      struct intel_context *ce)
+{
+	engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+	return engine->hung_ce;
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d95d666407f5..c1f2e57aa789 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1672,7 +1672,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	drm_printf(m, "\tRequests:\n");
 
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	rq = intel_engine_execlist_find_hung_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
@@ -1783,10 +1783,17 @@ static bool match_ring(struct i915_request *rq)
 }
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
 	struct i915_request *request, *active = NULL;
 
+	/*
+	 * This search does not work in GuC submission mode. However, the GuC
+	 * will report the hanging context directly to the driver itself. So
+	 * the driver should never get here when in GuC mode.
+	 */
+	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
+
 	/*
 	 * We are called by the error capture, reset and to dump engine
 	 * state at random points in time. In particular, note that neither is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 950fc73ed6af..d66b732a91c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -304,6 +304,8 @@ struct intel_engine_cs {
 	/* keep a request in reserve for a [pm] barrier under oom */
 	struct i915_request *request_pool;
 
+	struct intel_context *hung_ce;
+
 	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 035633f567b5..dbb24585483d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -724,24 +724,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static struct i915_request *context_find_active_request(struct intel_context *ce)
-{
-	struct i915_request *rq, *active = NULL;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ce->guc_active.lock, flags);
-	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
-				    sched.link) {
-		if (i915_request_completed(rq))
-			break;
-
-		active = rq;
-	}
-	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
-
-	return active;
-}
-
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
 	struct i915_request *rq;
@@ -755,7 +737,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	 */
 	clr_context_enabled(ce);
 
-	rq = context_find_active_request(ce);
+	rq = intel_context_find_active_request(ce);
 	if (!rq) {
 		head = ce->ring->tail;
 		stalled = false;
@@ -2196,6 +2178,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
 static void guc_context_replay(struct intel_context *ce)
 {
 	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
@@ -2208,6 +2204,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
+	capture_error_state(guc, ce);
 	guc_context_replay(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a2c58b54a592..0f08bcfbe964 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1429,20 +1429,37 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct i915_request *rq;
+	struct intel_context *ce;
+	struct i915_request *rq = NULL;
 	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	ce = intel_engine_get_hung_context(engine);
+	if (ce) {
+		intel_engine_clear_hung_context(engine);
+		rq = intel_context_find_active_request(ce);
+		if (!rq || !i915_request_started(rq))
+			goto no_request_capture;
+	} else {
+		/*
+		 * Getting here with GuC enabled means it is a forced error capture
+		 * with no actual hang. So, no need to attempt the execlist search.
+		 */
+		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+			spin_lock_irqsave(&engine->sched_engine->lock, flags);
+			rq = intel_engine_execlist_find_hung_request(engine);
+			spin_unlock_irqrestore(&engine->sched_engine->lock,
+					       flags);
+		}
+	}
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq,
 							    ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
+no_request_capture:
 		kfree(ee);
 		return NULL;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 36/51] drm/i915/guc: Capture error state on context reset
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

We receive notification of an engine reset from GuC only at its
completion, meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC then resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 20 +++++++++++
 drivers/gpu/drm/i915/gt/intel_context.h       |  3 ++
 drivers/gpu/drm/i915/gt/intel_engine.h        | 21 ++++++++++-
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 11 ++++--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 35 +++++++++----------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 25 ++++++++++---
 7 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index bfb05d8697d1..dd078a80c3a3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -515,6 +515,26 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+struct i915_request *intel_context_find_active_request(struct intel_context *ce)
+{
+	struct i915_request *rq, *active = NULL;
+	unsigned long flags;
+
+	GEM_BUG_ON(!intel_engine_uses_guc(ce->engine));
+
+	spin_lock_irqsave(&ce->guc_active.lock, flags);
+	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
+				    sched.link) {
+		if (i915_request_completed(rq))
+			break;
+
+		active = rq;
+	}
+	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+
+	return active;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 974ef85320c2..2ed9bf5f91a5 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -200,6 +200,9 @@ int intel_context_prepare_remote_request(struct intel_context *ce,
 
 struct i915_request *intel_context_create_request(struct intel_context *ce);
 
+struct i915_request *
+intel_context_find_active_request(struct intel_context *ce);
+
 static inline bool intel_context_is_barrier(const struct intel_context *ce)
 {
 	return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index edbde6171bca..8b5425612e8b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -245,7 +245,7 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine);
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine);
 
 u32 intel_engine_context_size(struct intel_gt *gt, u8 class);
 struct intel_context *
@@ -310,4 +310,23 @@ intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
 	return engine->cops->get_sibling(engine, sibling);
 }
 
+static inline void
+intel_engine_set_hung_context(struct intel_engine_cs *engine,
+			      struct intel_context *ce)
+{
+	engine->hung_ce = ce;
+}
+
+static inline void
+intel_engine_clear_hung_context(struct intel_engine_cs *engine)
+{
+	intel_engine_set_hung_context(engine, NULL);
+}
+
+static inline struct intel_context *
+intel_engine_get_hung_context(struct intel_engine_cs *engine)
+{
+	return engine->hung_ce;
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d95d666407f5..c1f2e57aa789 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1672,7 +1672,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	drm_printf(m, "\tRequests:\n");
 
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	rq = intel_engine_execlist_find_hung_request(engine);
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
@@ -1783,10 +1783,17 @@ static bool match_ring(struct i915_request *rq)
 }
 
 struct i915_request *
-intel_engine_find_active_request(struct intel_engine_cs *engine)
+intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
 	struct i915_request *request, *active = NULL;
 
+	/*
+	 * This search does not work in GuC submission mode. However, the GuC
+	 * will report the hanging context directly to the driver itself. So
+	 * the driver should never get here when in GuC mode.
+	 */
+	GEM_BUG_ON(intel_uc_uses_guc_submission(&engine->gt->uc));
+
 	/*
 	 * We are called by the error capture, reset and to dump engine
 	 * state at random points in time. In particular, note that neither is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 950fc73ed6af..d66b732a91c2 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -304,6 +304,8 @@ struct intel_engine_cs {
 	/* keep a request in reserve for a [pm] barrier under oom */
 	struct i915_request *request_pool;
 
+	struct intel_context *hung_ce;
+
 	struct llist_head barrier_tasks;
 
 	struct intel_context *kernel_context; /* pinned */
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 035633f567b5..dbb24585483d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -724,24 +724,6 @@ __unwind_incomplete_requests(struct intel_context *ce)
 	spin_unlock_irqrestore(&sched_engine->lock, flags);
 }
 
-static struct i915_request *context_find_active_request(struct intel_context *ce)
-{
-	struct i915_request *rq, *active = NULL;
-	unsigned long flags;
-
-	spin_lock_irqsave(&ce->guc_active.lock, flags);
-	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
-				    sched.link) {
-		if (i915_request_completed(rq))
-			break;
-
-		active = rq;
-	}
-	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
-
-	return active;
-}
-
 static void __guc_reset_context(struct intel_context *ce, bool stalled)
 {
 	struct i915_request *rq;
@@ -755,7 +737,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	 */
 	clr_context_enabled(ce);
 
-	rq = context_find_active_request(ce);
+	rq = intel_context_find_active_request(ce);
 	if (!rq) {
 		head = ce->ring->tail;
 		stalled = false;
@@ -2196,6 +2178,20 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+static void capture_error_state(struct intel_guc *guc,
+				struct intel_context *ce)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	intel_wakeref_t wakeref;
+
+	intel_engine_set_hung_context(engine, ce);
+	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
+		i915_capture_error_state(gt, engine->mask);
+	atomic_inc(&i915->gpu_error.reset_engine_count[engine->uabi_class]);
+}
+
 static void guc_context_replay(struct intel_context *ce)
 {
 	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
@@ -2208,6 +2204,7 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
+	capture_error_state(guc, ce);
 	guc_context_replay(ce);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a2c58b54a592..0f08bcfbe964 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1429,20 +1429,37 @@ capture_engine(struct intel_engine_cs *engine,
 {
 	struct intel_engine_capture_vma *capture = NULL;
 	struct intel_engine_coredump *ee;
-	struct i915_request *rq;
+	struct intel_context *ce;
+	struct i915_request *rq = NULL;
 	unsigned long flags;
 
 	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
 	if (!ee)
 		return NULL;
 
-	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_find_active_request(engine);
+	ce = intel_engine_get_hung_context(engine);
+	if (ce) {
+		intel_engine_clear_hung_context(engine);
+		rq = intel_context_find_active_request(ce);
+		if (!rq || !i915_request_started(rq))
+			goto no_request_capture;
+	} else {
+		/*
+		 * Getting here with GuC enabled means it is a forced error capture
+		 * with no actual hang. So, no need to attempt the execlist search.
+		 */
+		if (!intel_uc_uses_guc_submission(&engine->gt->uc)) {
+			spin_lock_irqsave(&engine->sched_engine->lock, flags);
+			rq = intel_engine_execlist_find_hung_request(engine);
+			spin_unlock_irqrestore(&engine->sched_engine->lock,
+					       flags);
+		}
+	}
 	if (rq)
 		capture = intel_engine_coredump_add_request(ee, rq,
 							    ATOMIC_MAYFAIL);
-	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 	if (!capture) {
+no_request_capture:
 		kfree(ee);
 		return NULL;
 	}
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 37/51] drm/i915/guc: Fix for error capture after full GPU reset with GuC
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan differs from the
execlist version because the active request list is handled very differently.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactored some of the request scanning code to avoid duplication
across the multiple code paths that now replicate it.
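
A short sketch of how the new i915_test_request_state() classification
gets consumed (illustrative loop, not the exact driver code): walk a
request list and return the first request actually executing on the
hardware.

static struct i915_request *first_active_request(struct list_head *requests)
{
	struct i915_request *rq;

	list_for_each_entry(rq, requests, sched.link) {
		if (i915_test_request_state(rq) == I915_REQUEST_ACTIVE)
			return rq;	/* executing on HW right now */
	}

	return NULL;
}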

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 139 ++++++++++++------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   8 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  67 +++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 ++++++
 drivers/gpu/drm/i915/i915_request.h           |  11 ++
 9 files changed, 229 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8b5425612e8b..2310ccda8058 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -240,6 +240,9 @@ __printf(3, 4)
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...);
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m);
 
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c1f2e57aa789..51a0d860d551 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1625,6 +1625,97 @@ static void print_properties(struct intel_engine_cs *engine,
 			   read_ul(&engine->defaults, p->offset));
 }
 
+static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg)
+{
+	struct intel_timeline *tl = get_timeline(rq);
+
+	i915_request_show(m, rq, msg, 0);
+
+	drm_printf(m, "\t\tring->start:  0x%08x\n",
+		   i915_ggtt_offset(rq->ring->vma));
+	drm_printf(m, "\t\tring->head:   0x%08x\n",
+		   rq->ring->head);
+	drm_printf(m, "\t\tring->tail:   0x%08x\n",
+		   rq->ring->tail);
+	drm_printf(m, "\t\tring->emit:   0x%08x\n",
+		   rq->ring->emit);
+	drm_printf(m, "\t\tring->space:  0x%08x\n",
+		   rq->ring->space);
+
+	if (tl) {
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   tl->hwsp_offset);
+		intel_timeline_put(tl);
+	}
+
+	print_request_ring(m, rq);
+
+	if (rq->context->lrc_reg_state) {
+		drm_printf(m, "Logical Ring Context:\n");
+		hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
+	}
+}
+
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m)
+{
+	struct i915_request *rq;
+	const char *msg;
+	enum i915_request_state state;
+
+	list_for_each_entry(rq, requests, sched.link) {
+		if (rq == hung_rq)
+			continue;
+
+		state = i915_test_request_state(rq);
+		if (state < I915_REQUEST_QUEUED)
+			continue;
+
+		if (state == I915_REQUEST_ACTIVE)
+			msg = "\t\tactive on engine";
+		else
+			msg = "\t\tactive in queue";
+
+		engine_dump_request(rq, m, msg);
+	}
+}
+
+static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
+{
+	struct i915_request *hung_rq = NULL;
+	struct intel_context *ce;
+	bool guc;
+
+	/*
+	 * No need for an engine->irq_seqno_barrier() before the seqno reads.
+	 * The GPU is still running so requests are still executing and any
+	 * hardware reads will be out of date by the time they are reported.
+	 * But the intention here is just to report an instantaneous snapshot
+	 * so that's fine.
+	 */
+	lockdep_assert_held(&engine->sched_engine->lock);
+
+	drm_printf(m, "\tRequests:\n");
+
+	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
+	if (guc) {
+		ce = intel_engine_get_hung_context(engine);
+		if (ce)
+			hung_rq = intel_context_find_active_request(ce);
+	} else
+		hung_rq = intel_engine_execlist_find_hung_request(engine);
+
+	if (hung_rq)
+		engine_dump_request(hung_rq, m, "\t\thung");
+
+	if (guc)
+		intel_guc_dump_active_requests(engine, hung_rq, m);
+	else
+		intel_engine_dump_active_requests(&engine->sched_engine->requests,
+						  hung_rq, m);
+}
+
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
@@ -1669,39 +1760,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		   i915_reset_count(error));
 	print_properties(engine, m);
 
-	drm_printf(m, "\tRequests:\n");
-
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_execlist_find_hung_request(engine);
-	if (rq) {
-		struct intel_timeline *tl = get_timeline(rq);
-
-		i915_request_show(m, rq, "\t\tactive ", 0);
-
-		drm_printf(m, "\t\tring->start:  0x%08x\n",
-			   i915_ggtt_offset(rq->ring->vma));
-		drm_printf(m, "\t\tring->head:   0x%08x\n",
-			   rq->ring->head);
-		drm_printf(m, "\t\tring->tail:   0x%08x\n",
-			   rq->ring->tail);
-		drm_printf(m, "\t\tring->emit:   0x%08x\n",
-			   rq->ring->emit);
-		drm_printf(m, "\t\tring->space:  0x%08x\n",
-			   rq->ring->space);
-
-		if (tl) {
-			drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
-				   tl->hwsp_offset);
-			intel_timeline_put(tl);
-		}
-
-		print_request_ring(m, rq);
+	engine_dump_active_requests(engine, m);
 
-		if (rq->context->lrc_reg_state) {
-			drm_printf(m, "Logical Ring Context:\n");
-			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
-		}
-	}
 	drm_printf(m, "\tOn hold?: %lu\n",
 		   list_count(&engine->sched_engine->hold));
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
@@ -1775,13 +1836,6 @@ intel_engine_create_virtual(struct intel_engine_cs **siblings,
 	return siblings[0]->cops->create_virtual(siblings, count);
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
@@ -1825,14 +1879,7 @@ intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 
 	list_for_each_entry(request, &engine->sched_engine->requests,
 			    sched.link) {
-		if (__i915_request_is_complete(request))
-			continue;
-
-		if (!__i915_request_has_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
+		if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a8495364d906..f0768824de6f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -90,6 +90,14 @@ reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		show_heartbeat(rq, engine);
 
+	if (intel_engine_uses_guc(engine))
+		/*
+		 * GuC itself is toast or GuC's hang detection
+		 * is disabled. Either way, need to find the
+		 * hang culprit manually.
+		 */
+		intel_guc_find_hung_context(engine);
+
 	intel_gt_handle_error(engine->gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "stopped heartbeat on %s",
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 2987282dff6d..f3cdbf4ba5c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -156,7 +156,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq))
+		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
 			skip_context(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3897abb59dba..62187c3dcda9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -268,6 +268,8 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len);
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dbb24585483d..c8e1fc80f58e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2272,6 +2272,73 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	unsigned long index;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		list_for_each_entry(rq, &ce->guc_active.requests, sched.link) {
+			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
+				continue;
+
+			intel_engine_set_hung_context(engine, ce);
+
+			/* Can only cope with one hang at a time... */
+			return;
+		}
+	}
+}
+
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	unsigned long index;
+	unsigned long flags;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		spin_lock_irqsave(&ce->guc_active.lock, flags);
+		intel_engine_dump_active_requests(&ce->guc_active.requests,
+						  hung_rq, m);
+		spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	}
+}
+
 void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 08ff77c5c50e..03bc1c83a4d2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,9 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p);
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p);
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4eba848b20e3..eb109f93ebcb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2048,6 +2048,47 @@ void i915_request_show(struct drm_printer *m,
 		   name);
 }
 
+static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	u32 ring = ENGINE_READ(engine, RING_START);
+
+	return ring == i915_ggtt_offset(rq->ring->vma);
+}
+
+static bool match_ring(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+	bool found;
+	int i;
+
+	if (!intel_engine_is_virtual(rq->engine))
+		return engine_match_ring(rq->engine, rq);
+
+	found = false;
+	i = 0;
+	while ((engine = intel_engine_get_sibling(rq->engine, i++))) {
+		found = engine_match_ring(engine, rq);
+		if (found)
+			break;
+	}
+
+	return found;
+}
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq)
+{
+	if (i915_request_completed(rq))
+		return I915_REQUEST_COMPLETE;
+
+	if (!i915_request_started(rq))
+		return I915_REQUEST_PENDING;
+
+	if (match_ring(rq))
+		return I915_REQUEST_ACTIVE;
+
+	return I915_REQUEST_QUEUED;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 128030f43bbf..a3d4728ea06c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,15 @@ i915_request_active_engine(struct i915_request *rq,
 
 void i915_request_notify_execute_cb_imm(struct i915_request *rq);
 
+enum i915_request_state
+{
+	I915_REQUEST_UNKNOWN = 0,
+	I915_REQUEST_COMPLETE,
+	I915_REQUEST_PENDING,
+	I915_REQUEST_QUEUED,
+	I915_REQUEST_ACTIVE,
+};
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 37/51] drm/i915/guc: Fix for error capture after full GPU reset with GuC
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan works differently from
the execlist version because the active request list is tracked per
context (ce->guc_active.requests) rather than per engine.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactored some of the request scanning code to avoid duplicating
it across the multiple code paths that now need it.
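
For reference, the ordering of the new i915_request_state enum matters:
the dump code below skips anything below I915_REQUEST_QUEUED. A minimal
sketch of how the states might be rendered for logging (illustrative
only; state_name() is a hypothetical helper, not part of this patch):

  static const char *state_name(enum i915_request_state state)
  {
  	switch (state) {
  	case I915_REQUEST_COMPLETE: return "complete"; /* seqno signalled */
  	case I915_REQUEST_PENDING:  return "pending";  /* not yet started */
  	case I915_REQUEST_QUEUED:   return "queued";   /* started, not on HW */
  	case I915_REQUEST_ACTIVE:   return "active";   /* executing on an engine */
  	default:                    return "unknown";
  	}
  }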

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine.h        |   3 +
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 139 ++++++++++++------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   8 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |   2 +-
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  67 +++++++++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   3 +
 drivers/gpu/drm/i915/i915_request.c           |  41 ++++++
 drivers/gpu/drm/i915/i915_request.h           |  11 ++
 9 files changed, 229 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
index 8b5425612e8b..2310ccda8058 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine.h
@@ -240,6 +240,9 @@ __printf(3, 4)
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...);
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m);
 
 ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine,
 				   ktime_t *now);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c1f2e57aa789..51a0d860d551 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1625,6 +1625,97 @@ static void print_properties(struct intel_engine_cs *engine,
 			   read_ul(&engine->defaults, p->offset));
 }
 
+static void engine_dump_request(struct i915_request *rq, struct drm_printer *m, const char *msg)
+{
+	struct intel_timeline *tl = get_timeline(rq);
+
+	i915_request_show(m, rq, msg, 0);
+
+	drm_printf(m, "\t\tring->start:  0x%08x\n",
+		   i915_ggtt_offset(rq->ring->vma));
+	drm_printf(m, "\t\tring->head:   0x%08x\n",
+		   rq->ring->head);
+	drm_printf(m, "\t\tring->tail:   0x%08x\n",
+		   rq->ring->tail);
+	drm_printf(m, "\t\tring->emit:   0x%08x\n",
+		   rq->ring->emit);
+	drm_printf(m, "\t\tring->space:  0x%08x\n",
+		   rq->ring->space);
+
+	if (tl) {
+		drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
+			   tl->hwsp_offset);
+		intel_timeline_put(tl);
+	}
+
+	print_request_ring(m, rq);
+
+	if (rq->context->lrc_reg_state) {
+		drm_printf(m, "Logical Ring Context:\n");
+		hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
+	}
+}
+
+void intel_engine_dump_active_requests(struct list_head *requests,
+				       struct i915_request *hung_rq,
+				       struct drm_printer *m)
+{
+	struct i915_request *rq;
+	const char *msg;
+	enum i915_request_state state;
+
+	list_for_each_entry(rq, requests, sched.link) {
+		if (rq == hung_rq)
+			continue;
+
+		state = i915_test_request_state(rq);
+		if (state < I915_REQUEST_QUEUED)
+			continue;
+
+		if (state == I915_REQUEST_ACTIVE)
+			msg = "\t\tactive on engine";
+		else
+			msg = "\t\tactive in queue";
+
+		engine_dump_request(rq, m, msg);
+	}
+}
+
+static void engine_dump_active_requests(struct intel_engine_cs *engine, struct drm_printer *m)
+{
+	struct i915_request *hung_rq = NULL;
+	struct intel_context *ce;
+	bool guc;
+
+	/*
+	 * No need for an engine->irq_seqno_barrier() before the seqno reads.
+	 * The GPU is still running so requests are still executing and any
+	 * hardware reads will be out of date by the time they are reported.
+	 * But the intention here is just to report an instantaneous snapshot
+	 * so that's fine.
+	 */
+	lockdep_assert_held(&engine->sched_engine->lock);
+
+	drm_printf(m, "\tRequests:\n");
+
+	guc = intel_uc_uses_guc_submission(&engine->gt->uc);
+	if (guc) {
+		ce = intel_engine_get_hung_context(engine);
+		if (ce)
+			hung_rq = intel_context_find_active_request(ce);
+	} else
+		hung_rq = intel_engine_execlist_find_hung_request(engine);
+
+	if (hung_rq)
+		engine_dump_request(hung_rq, m, "\t\thung");
+
+	if (guc)
+		intel_guc_dump_active_requests(engine, hung_rq, m);
+	else
+		intel_engine_dump_active_requests(&engine->sched_engine->requests,
+						  hung_rq, m);
+}
+
 void intel_engine_dump(struct intel_engine_cs *engine,
 		       struct drm_printer *m,
 		       const char *header, ...)
@@ -1669,39 +1760,9 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		   i915_reset_count(error));
 	print_properties(engine, m);
 
-	drm_printf(m, "\tRequests:\n");
-
 	spin_lock_irqsave(&engine->sched_engine->lock, flags);
-	rq = intel_engine_execlist_find_hung_request(engine);
-	if (rq) {
-		struct intel_timeline *tl = get_timeline(rq);
-
-		i915_request_show(m, rq, "\t\tactive ", 0);
-
-		drm_printf(m, "\t\tring->start:  0x%08x\n",
-			   i915_ggtt_offset(rq->ring->vma));
-		drm_printf(m, "\t\tring->head:   0x%08x\n",
-			   rq->ring->head);
-		drm_printf(m, "\t\tring->tail:   0x%08x\n",
-			   rq->ring->tail);
-		drm_printf(m, "\t\tring->emit:   0x%08x\n",
-			   rq->ring->emit);
-		drm_printf(m, "\t\tring->space:  0x%08x\n",
-			   rq->ring->space);
-
-		if (tl) {
-			drm_printf(m, "\t\tring->hwsp:   0x%08x\n",
-				   tl->hwsp_offset);
-			intel_timeline_put(tl);
-		}
-
-		print_request_ring(m, rq);
+	engine_dump_active_requests(engine, m);
 
-		if (rq->context->lrc_reg_state) {
-			drm_printf(m, "Logical Ring Context:\n");
-			hexdump(m, rq->context->lrc_reg_state, PAGE_SIZE);
-		}
-	}
 	drm_printf(m, "\tOn hold?: %lu\n",
 		   list_count(&engine->sched_engine->hold));
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
@@ -1775,13 +1836,6 @@ intel_engine_create_virtual(struct intel_engine_cs **siblings,
 	return siblings[0]->cops->create_virtual(siblings, count);
 }
 
-static bool match_ring(struct i915_request *rq)
-{
-	u32 ring = ENGINE_READ(rq->engine, RING_START);
-
-	return ring == i915_ggtt_offset(rq->ring->vma);
-}
-
 struct i915_request *
 intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 {
@@ -1825,14 +1879,7 @@ intel_engine_execlist_find_hung_request(struct intel_engine_cs *engine)
 
 	list_for_each_entry(request, &engine->sched_engine->requests,
 			    sched.link) {
-		if (__i915_request_is_complete(request))
-			continue;
-
-		if (!__i915_request_has_started(request))
-			continue;
-
-		/* More than one preemptible request may match! */
-		if (!match_ring(request))
+		if (i915_test_request_state(request) != I915_REQUEST_ACTIVE)
 			continue;
 
 		active = request;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a8495364d906..f0768824de6f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -90,6 +90,14 @@ reset_engine(struct intel_engine_cs *engine, struct i915_request *rq)
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		show_heartbeat(rq, engine);
 
+	if (intel_engine_uses_guc(engine))
+		/*
+		 * GuC itself is toast or GuC's hang detection
+		 * is disabled. Either way, need to find the
+		 * hang culprit manually.
+		 */
+		intel_guc_find_hung_context(engine);
+
 	intel_gt_handle_error(engine->gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "stopped heartbeat on %s",
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 2987282dff6d..f3cdbf4ba5c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -156,7 +156,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq))
+		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
 			skip_context(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 3897abb59dba..62187c3dcda9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -268,6 +268,8 @@ int intel_guc_context_reset_process_msg(struct intel_guc *guc,
 int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len);
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index dbb24585483d..c8e1fc80f58e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2272,6 +2272,73 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 	return 0;
 }
 
+void intel_guc_find_hung_context(struct intel_engine_cs *engine)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	struct i915_request *rq;
+	unsigned long index;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		list_for_each_entry(rq, &ce->guc_active.requests, sched.link) {
+			if (i915_test_request_state(rq) != I915_REQUEST_ACTIVE)
+				continue;
+
+			intel_engine_set_hung_context(engine, ce);
+
+			/* Can only cope with one hang at a time... */
+			return;
+		}
+	}
+}
+
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m)
+{
+	struct intel_guc *guc = &engine->gt->uc.guc;
+	struct intel_context *ce;
+	unsigned long index;
+	unsigned long flags;
+
+	/* Reset called during driver load? GuC not yet initialised! */
+	if (unlikely(!guc_submission_initialized(guc)))
+		return;
+
+	xa_for_each(&guc->context_lookup, index, ce) {
+		if (!intel_context_is_pinned(ce))
+			continue;
+
+		if (intel_engine_is_virtual(ce->engine)) {
+			if (!(ce->engine->mask & engine->mask))
+				continue;
+		} else {
+			if (ce->engine != engine)
+				continue;
+		}
+
+		spin_lock_irqsave(&ce->guc_active.lock, flags);
+		intel_engine_dump_active_requests(&ce->guc_active.requests,
+						  hung_rq, m);
+		spin_unlock_irqrestore(&ce->guc_active.lock, flags);
+	}
+}
+
 void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p)
 {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 08ff77c5c50e..03bc1c83a4d2 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -25,6 +25,9 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 				     struct drm_printer *p);
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p);
+void intel_guc_dump_active_requests(struct intel_engine_cs *engine,
+				    struct i915_request *hung_rq,
+				    struct drm_printer *m);
 
 bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 4eba848b20e3..eb109f93ebcb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -2048,6 +2048,47 @@ void i915_request_show(struct drm_printer *m,
 		   name);
 }
 
+static bool engine_match_ring(struct intel_engine_cs *engine, struct i915_request *rq)
+{
+	u32 ring = ENGINE_READ(engine, RING_START);
+
+	return ring == i915_ggtt_offset(rq->ring->vma);
+}
+
+static bool match_ring(struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+	bool found;
+	int i;
+
+	if (!intel_engine_is_virtual(rq->engine))
+		return engine_match_ring(rq->engine, rq);
+
+	found = false;
+	i = 0;
+	while ((engine = intel_engine_get_sibling(rq->engine, i++))) {
+		found = engine_match_ring(engine, rq);
+		if (found)
+			break;
+	}
+
+	return found;
+}
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq)
+{
+	if (i915_request_completed(rq))
+		return I915_REQUEST_COMPLETE;
+
+	if (!i915_request_started(rq))
+		return I915_REQUEST_PENDING;
+
+	if (match_ring(rq))
+		return I915_REQUEST_ACTIVE;
+
+	return I915_REQUEST_QUEUED;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 128030f43bbf..a3d4728ea06c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -649,4 +649,15 @@ i915_request_active_engine(struct i915_request *rq,
 
 void i915_request_notify_execute_cb_imm(struct i915_request *rq);
 
+enum i915_request_state
+{
+	I915_REQUEST_UNKNOWN = 0,
+	I915_REQUEST_COMPLETE,
+	I915_REQUEST_PENDING,
+	I915_REQUEST_QUEUED,
+	I915_REQUEST_ACTIVE,
+};
+
+enum i915_request_state i915_test_request_state(struct i915_request *rq);
+
 #endif /* I915_REQUEST_H */
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 38/51] drm/i915/guc: Hook GuC scheduling policies up
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Use the official driver default scheduling policies for configuring
the GuC scheduler rather than a bunch of hardcoded values.

v2:
 (Matthew Brost)
  - Move I915_ENGINE_WANT_FORCED_PREEMPTION to later patch
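
As context for the new values (a worked example only; the numbers shown
are the usual Kconfig defaults and may differ per platform or modparam):

  /*
   * guc_context_policy_init() converts the engine properties from
   * milliseconds to the microseconds GuC expects, e.g.:
   *
   *   timeslice_duration_ms = 1   ->  execution_quantum  = 1000 us
   *   preempt_timeout_ms    = 640 ->  preemption_timeout = 640000 us
   *
   * Zero in either field means the feature is disabled.
   */

For the global policies, i915.reset < 2 means individual engine resets
are not allowed, so GuC-based engine hang detection is turned off via
GLOBAL_POLICY_DISABLE_ENGINE_RESET.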

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++--
 3 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 62187c3dcda9..bc71635c70b9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
+int intel_guc_global_policies_update(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index d3e86ab7508f..2ad5fcd4e1b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
 	       guc_ads_private_data_size(guc);
 }
 
-static void guc_policies_init(struct guc_policies *policies)
+static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
 {
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+
 	policies->global_flags = 0;
+	if (i915->params.reset < 2)
+		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+
 	policies->is_valid = 1;
 }
 
+static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
+		policy_offset
+	};
+
+	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+int intel_guc_global_policies_update(struct intel_guc *guc)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	int ret;
+
+	if (!blob)
+		return -ENOTSUPP;
+
+	GEM_BUG_ON(!blob->ads.scheduler_policies);
+
+	guc_policies_init(guc, &blob->policies);
+
+	if (!intel_guc_is_ready(guc))
+		return 0;
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
+
+	return ret;
+}
+
 static void guc_mapping_table_init(struct intel_gt *gt,
 				   struct guc_gt_system_info *system_info)
 {
@@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	u8 engine_class, guc_class;
 
 	/* GuC scheduling policies */
-	guc_policies_init(&blob->policies);
+	guc_policies_init(guc, &blob->policies);
 
 	/*
 	 * GuC expects a per-engine-class context image and size
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c8e1fc80f58e..6536bd6807a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -870,6 +870,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
+	intel_guc_global_policies_update(guc);
 	enable_submission(guc);
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
@@ -1160,8 +1161,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1937,13 +1939,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->set_default_submission = guc_set_default_submission;
 
 	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 
 	/*
 	 * TODO: GuC supports timeslicing and semaphores as well, but they're
 	 * handled by the firmware so some minor tweaks are required before
 	 * enabling.
 	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	 */
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 38/51] drm/i915/guc: Hook GuC scheduling policies up
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Use the official driver default scheduling policies for configuring
the GuC scheduler rather than a bunch of hardcoded values.

v2:
 (Matthew Brost)
  - Move I915_ENGINE_WANT_FORCED_PREEMPTION to later patch

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 44 ++++++++++++++++++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++--
 3 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index 62187c3dcda9..bc71635c70b9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -270,6 +270,8 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
+int intel_guc_global_policies_update(struct intel_guc *guc);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index d3e86ab7508f..2ad5fcd4e1b7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -77,14 +77,54 @@ static u32 guc_ads_blob_size(struct intel_guc *guc)
 	       guc_ads_private_data_size(guc);
 }
 
-static void guc_policies_init(struct guc_policies *policies)
+static void guc_policies_init(struct intel_guc *guc, struct guc_policies *policies)
 {
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+
 	policies->dpc_promote_time = GLOBAL_POLICY_DEFAULT_DPC_PROMOTE_TIME_US;
 	policies->max_num_work_items = GLOBAL_POLICY_MAX_NUM_WI;
+
 	policies->global_flags = 0;
+	if (i915->params.reset < 2)
+		policies->global_flags |= GLOBAL_POLICY_DISABLE_ENGINE_RESET;
+
 	policies->is_valid = 1;
 }
 
+static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_GLOBAL_SCHED_POLICY_CHANGE,
+		policy_offset
+	};
+
+	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+}
+
+int intel_guc_global_policies_update(struct intel_guc *guc)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	struct intel_gt *gt = guc_to_gt(guc);
+	intel_wakeref_t wakeref;
+	int ret;
+
+	if (!blob)
+		return -ENOTSUPP;
+
+	GEM_BUG_ON(!blob->ads.scheduler_policies);
+
+	guc_policies_init(guc, &blob->policies);
+
+	if (!intel_guc_is_ready(guc))
+		return 0;
+
+	with_intel_runtime_pm(&gt->i915->runtime_pm, wakeref)
+		ret = guc_action_policies_update(guc, blob->ads.scheduler_policies);
+
+	return ret;
+}
+
 static void guc_mapping_table_init(struct intel_gt *gt,
 				   struct guc_gt_system_info *system_info)
 {
@@ -281,7 +321,7 @@ static void __guc_ads_init(struct intel_guc *guc)
 	u8 engine_class, guc_class;
 
 	/* GuC scheduling policies */
-	guc_policies_init(&blob->policies);
+	guc_policies_init(guc, &blob->policies);
 
 	/*
 	 * GuC expects a per-engine-class context image and size
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index c8e1fc80f58e..6536bd6807a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -870,6 +870,7 @@ void intel_guc_submission_reset_finish(struct intel_guc *guc)
 	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
 	atomic_set(&guc->outstanding_submission_g2h, 0);
 
+	intel_guc_global_policies_update(guc);
 	enable_submission(guc);
 	intel_gt_unpark_heartbeats(guc_to_gt(guc));
 }
@@ -1160,8 +1161,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
-	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
-	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
+	/* NB: For both of these, zero means disabled. */
+	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
+	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
@@ -1937,13 +1939,13 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
 	engine->set_default_submission = guc_set_default_submission;
 
 	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
+	engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 
 	/*
 	 * TODO: GuC supports timeslicing and semaphores as well, but they're
 	 * handled by the firmware so some minor tweaks are required before
 	 * enabling.
 	 *
-	 * engine->flags |= I915_ENGINE_HAS_TIMESLICES;
 	 * engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	 */
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 39/51] drm/i915/guc: Connect reset modparam updates to GuC policy flags
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Changing the reset module parameter has no effect on a running GuC.
The corresponding entry in the ADS must be updated and then the GuC
informed via a Host2GuC message.

The new debugfs interface to module parameters allows this to happen.
However, connecting the parameter data address back to anything useful
is messy. One option would be to pass a new private data structure
address through instead of just the parameter pointer. However, that
means having a new (and different) data structure for each parameter
and a new (and different) write function for each parameter. This
method keeps everything generic by instead using a string lookup on
the directory entry name.
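
Spelled out long-hand for the 'reset' parameter, the GET_I915() macro
below is just two container_of() hops from the parameter pointer back
to the device (an illustrative expansion, not additional code):

  /* 'value' points at i915->params.reset ... */
  struct i915_params *params =
  	container_of((void *)value, struct i915_params, reset);
  /* ... and the params block is embedded in the device structure. */
  struct drm_i915_private *i915 =
  	container_of(params, struct drm_i915_private, params);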

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 2ad5fcd4e1b7..c6d0b762d82c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 		policy_offset
 	};
 
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 int intel_guc_global_policies_update(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
index 4e2b077692cb..8ecd8b42f048 100644
--- a/drivers/gpu/drm/i915/i915_debugfs_params.c
+++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
@@ -6,9 +6,20 @@
 #include <linux/kernel.h>
 
 #include "i915_debugfs_params.h"
+#include "gt/intel_gt.h"
+#include "gt/uc/intel_guc.h"
 #include "i915_drv.h"
 #include "i915_params.h"
 
+#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
+
+#define GET_I915(i915, name, ptr)	\
+	do {	\
+		struct i915_params *params;	\
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\
+		(i915) = container_of(params, typeof(*(i915)), params);	\
+	} while (0)
+
 /* int param */
 static int i915_param_int_show(struct seq_file *m, void *data)
 {
@@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
 	return single_open(file, i915_param_int_show, inode->i_private);
 }
 
+static int notify_guc(struct drm_i915_private *i915)
+{
+	int ret = 0;
+
+	if (intel_uc_uses_guc_submission(&i915->gt.uc))
+		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
+
+	return ret;
+}
+
 static ssize_t i915_param_int_write(struct file *file,
 				    const char __user *ubuf, size_t len,
 				    loff_t *offp)
@@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
 				     const char __user *ubuf, size_t len,
 				     loff_t *offp)
 {
+	struct drm_i915_private *i915;
 	struct seq_file *m = file->private_data;
 	unsigned int *value = m->private;
+	unsigned int old = *value;
 	int ret;
 
 	ret = kstrtouint_from_user(ubuf, len, 0, value);
@@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
 			*value = b;
 	}
 
+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
+
+		ret = notify_guc(i915);
+		if (ret)
+			*value = old;
+	}
+
 	return ret ?: len;
 }
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 39/51] drm/i915/guc: Connect reset modparam updates to GuC policy flags
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Changing the reset module parameter has no effect on a running GuC.
The corresponding entry in the ADS must be updated and then the GuC
informed via a Host2GuC message.

The new debugfs interface to module parameters allows this to happen.
However, connecting the parameter data address back to anything useful
is messy. One option would be to pass a new private data structure
address through instead of just the parameter pointer. However, that
means having a new (and different) data structure for each parameter
and a new (and different) write function for each parameter. This
method keeps everything generic by instead using a string lookup on
the directory entry name.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c |  2 +-
 drivers/gpu/drm/i915/i915_debugfs_params.c | 31 ++++++++++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 2ad5fcd4e1b7..c6d0b762d82c 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -99,7 +99,7 @@ static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 		policy_offset
 	};
 
-	return intel_guc_send(guc, action, ARRAY_SIZE(action));
+	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
 }
 
 int intel_guc_global_policies_update(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/i915_debugfs_params.c b/drivers/gpu/drm/i915/i915_debugfs_params.c
index 4e2b077692cb..8ecd8b42f048 100644
--- a/drivers/gpu/drm/i915/i915_debugfs_params.c
+++ b/drivers/gpu/drm/i915/i915_debugfs_params.c
@@ -6,9 +6,20 @@
 #include <linux/kernel.h>
 
 #include "i915_debugfs_params.h"
+#include "gt/intel_gt.h"
+#include "gt/uc/intel_guc.h"
 #include "i915_drv.h"
 #include "i915_params.h"
 
+#define MATCH_DEBUGFS_NODE_NAME(_file, _name)	(strcmp((_file)->f_path.dentry->d_name.name, (_name)) == 0)
+
+#define GET_I915(i915, name, ptr)	\
+	do {	\
+		struct i915_params *params;	\
+		params = container_of(((void *) (ptr)), typeof(*params), name);	\
+		(i915) = container_of(params, typeof(*(i915)), params);	\
+	} while (0)
+
 /* int param */
 static int i915_param_int_show(struct seq_file *m, void *data)
 {
@@ -24,6 +35,16 @@ static int i915_param_int_open(struct inode *inode, struct file *file)
 	return single_open(file, i915_param_int_show, inode->i_private);
 }
 
+static int notify_guc(struct drm_i915_private *i915)
+{
+	int ret = 0;
+
+	if (intel_uc_uses_guc_submission(&i915->gt.uc))
+		ret = intel_guc_global_policies_update(&i915->gt.uc.guc);
+
+	return ret;
+}
+
 static ssize_t i915_param_int_write(struct file *file,
 				    const char __user *ubuf, size_t len,
 				    loff_t *offp)
@@ -81,8 +102,10 @@ static ssize_t i915_param_uint_write(struct file *file,
 				     const char __user *ubuf, size_t len,
 				     loff_t *offp)
 {
+	struct drm_i915_private *i915;
 	struct seq_file *m = file->private_data;
 	unsigned int *value = m->private;
+	unsigned int old = *value;
 	int ret;
 
 	ret = kstrtouint_from_user(ubuf, len, 0, value);
@@ -95,6 +118,14 @@ static ssize_t i915_param_uint_write(struct file *file,
 			*value = b;
 	}
 
+	if (!ret && MATCH_DEBUGFS_NODE_NAME(file, "reset")) {
+		GET_I915(i915, reset, value);
+
+		ret = notify_guc(i915);
+		if (ret)
+			*value = old;
+	}
+
 	return ret ?: len;
 }
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 40/51] drm/i915/guc: Include scheduling policies in the debugfs state dump
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Add the scheduling policy parameters to the 'guc_info' debugfs state
dump.
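
With this applied, 'guc_info' gains a section along these lines (the
values shown are illustrative, not guaranteed defaults):

  Global scheduling policies:
    DPC promote time   = 500000
    Max num work items = 15
    Flags              = 0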

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     | 14 ++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h     |  3 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c |  2 ++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index c6d0b762d82c..93b0ac35a508 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -92,6 +92,20 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
 	policies->is_valid = 1;
 }
 
+void intel_guc_ads_print_policy_info(struct intel_guc *guc,
+				     struct drm_printer *dp)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+
+	if (unlikely(!blob))
+		return;
+
+	drm_printf(dp, "Global scheduling policies:\n");
+	drm_printf(dp, "  DPC promote time   = %u\n", blob->policies.dpc_promote_time);
+	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
+	drm_printf(dp, "  Flags              = %u\n", blob->policies.global_flags);
+}
+
 static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 {
 	u32 action[] = {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index b00d3ae1113a..bdcb339a5321 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -7,9 +7,12 @@
 #define _INTEL_GUC_ADS_H_
 
 struct intel_guc;
+struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
+void intel_guc_ads_print_policy_info(struct intel_guc *guc,
+				     struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 7a454c91a736..72ddfff42f7d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -10,6 +10,7 @@
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
 #include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
 #include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
@@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)
 
 	intel_guc_ct_print_info(&guc->ct, &p);
 	intel_guc_submission_print_info(guc, &p);
+	intel_guc_ads_print_policy_info(guc, &p);
 
 	return 0;
 }
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 40/51] drm/i915/guc: Include scheduling policies in the debugfs state dump
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

Add the scheduling policy parameters to the 'guc_info' debugfs state
dump.

Signed-off-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c     | 14 ++++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h     |  3 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c |  2 ++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index c6d0b762d82c..93b0ac35a508 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -92,6 +92,20 @@ static void guc_policies_init(struct intel_guc *guc, struct guc_policies *polici
 	policies->is_valid = 1;
 }
 
+void intel_guc_ads_print_policy_info(struct intel_guc *guc,
+				     struct drm_printer *dp)
+{
+	struct __guc_ads_blob *blob = guc->ads_blob;
+
+	if (unlikely(!blob))
+		return;
+
+	drm_printf(dp, "Global scheduling policies:\n");
+	drm_printf(dp, "  DPC promote time   = %u\n", blob->policies.dpc_promote_time);
+	drm_printf(dp, "  Max num work items = %u\n", blob->policies.max_num_work_items);
+	drm_printf(dp, "  Flags              = %u\n", blob->policies.global_flags);
+}
+
 static int guc_action_policies_update(struct intel_guc *guc, u32 policy_offset)
 {
 	u32 action[] = {
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index b00d3ae1113a..bdcb339a5321 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -7,9 +7,12 @@
 #define _INTEL_GUC_ADS_H_
 
 struct intel_guc;
+struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
+void intel_guc_ads_print_policy_info(struct intel_guc *guc,
+				     struct drm_printer *p);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
index 7a454c91a736..72ddfff42f7d 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
@@ -10,6 +10,7 @@
 #include "intel_guc_debugfs.h"
 #include "intel_guc_log_debugfs.h"
 #include "gt/uc/intel_guc_ct.h"
+#include "gt/uc/intel_guc_ads.h"
 #include "gt/uc/intel_guc_submission.h"
 
 static int guc_info_show(struct seq_file *m, void *data)
@@ -29,6 +30,7 @@ static int guc_info_show(struct seq_file *m, void *data)
 
 	intel_guc_ct_print_info(&guc->ct, &p);
 	intel_guc_submission_print_info(guc, &p);
+	intel_guc_ads_print_policy_info(guc, &p);
 
 	return 0;
 }
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

The media watchdog mechanism involves GuC performing a silent
reset-and-continue of the hung context. This requires the i915 driver to
provide a golden context to GuC in the ADS.
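
Note the two-pass construction below (a summary of the flow added by
this patch, not extra API): guc_prep_golden_context() is first called
without a blob purely to size the reservation, then again once the ADS
has been allocated to record the per-class sizes and GGTT addresses;
the context images themselves are copied in much later, once the
default engine state exists:

  /* intel_guc_ads_create(): sizing pass, no blob allocated yet. */
  guc->ads_golden_ctxt_size = guc_prep_golden_context(guc, NULL);

  /* __guc_ads_init(): record eng_state_size[] / golden_context_lrca[]. */
  guc_prep_golden_context(guc, blob);

  /* intel_gt_init() -> intel_uc_init_late() -> intel_guc_ads_init_late():
   * copy each engine class's default state in via shmem_read().
   */
  guc_init_golden_context(guc);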

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
 7 files changed, 199 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index acfdd53b2678..ceeb517ba259 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
 	if (err)
 		goto err_gt;
 
+	intel_uc_init_late(&gt->uc);
+
 	err = i915_inject_probe_error(gt->i915, -EIO);
 	if (err)
 		goto err_gt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 68266cbffd1f..979128e28372 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
 	}
 }
 
+void intel_guc_init_late(struct intel_guc *guc)
+{
+	intel_guc_ads_init_late(guc);
+}
+
 static u32 guc_ctl_debug_flags(struct intel_guc *guc)
 {
 	u32 level = intel_guc_log_get_level(&guc->log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index bc71635c70b9..dc18ac510ac8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -60,6 +60,7 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 	u32 ads_regset_size;
+	u32 ads_golden_ctxt_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
@@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
 }
 
 void intel_guc_init_early(struct intel_guc *guc);
+void intel_guc_init_late(struct intel_guc *guc);
 void intel_guc_init_send_regs(struct intel_guc *guc);
 void intel_guc_write_params(struct intel_guc *guc);
 int intel_guc_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 93b0ac35a508..241b3089b658 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
+#include "gt/shmem_utils.h"
 #include "intel_guc_ads.h"
 #include "intel_guc_fwif.h"
 #include "intel_uc.h"
@@ -33,6 +34,10 @@
  *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
+ *      | golden contexts                       |
+ *      +---------------------------------------+
+ *      | padding                               |
+ *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
  *      +---------------------------------------+
  *      | padding                               |
@@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
 	return guc->ads_regset_size;
 }
 
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
+{
+	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
@@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
 	return offsetof(struct __guc_ads_blob, regset);
 }
 
-static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
 {
 	u32 offset;
 
 	offset = guc_ads_regset_offset(guc) +
 		 guc_ads_regset_size(guc);
+
+	return PAGE_ALIGN(offset);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+	u32 offset;
+
+	offset = guc_ads_golden_ctxt_offset(guc) +
+		 guc_ads_golden_ctxt_size(guc);
+
 	return PAGE_ALIGN(offset);
 }
 
@@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
 	GEM_BUG_ON(temp_set.size);
 }
 
-/*
- * The first 80 dwords of the register state context, containing the
- * execlists and ppgtt registers.
- */
-#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+static void fill_engine_enable_masks(struct intel_gt *gt,
+				     struct guc_gt_system_info *info)
+{
+	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+}
 
-static void __guc_ads_init(struct intel_guc *guc)
+/* Skip execlist and PPGTT registers */
+#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
+#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
+
+static int guc_prep_golden_context(struct intel_guc *guc,
+				   struct __guc_ads_blob *blob)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-	struct drm_i915_private *i915 = gt->i915;
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
+	u8 engine_class, guc_class;
+	struct guc_gt_system_info *info, local_info;
+
+	/*
+	 * Reserve the memory for the golden contexts and point GuC at it but
+	 * leave it empty for now. The context data will be filled in later
+	 * once there is something available to put there.
+	 *
+	 * Note that the HWSP and ring context are not included.
+	 *
+	 * Note also that the storage must be pinned in the GGTT, so that the
+	 * address won't change after GuC has been told where to find it. The
+	 * GuC will also validate that the LRC base + size fall within the
+	 * allowed GGTT range.
+	 */
+	if (blob) {
+		offset = guc_ads_golden_ctxt_offset(guc);
+		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+		info = &blob->system_info;
+	} else {
+		memset(&local_info, 0, sizeof(local_info));
+		info = &local_info;
+		fill_engine_enable_masks(gt, info);
+	}
+
+	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
+		if (engine_class == OTHER_CLASS)
+			continue;
+
+		guc_class = engine_class_to_guc_class(engine_class);
+
+		if (!info->engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		if (!blob)
+			continue;
+
+		blob->ads.eng_state_size[guc_class] = real_size;
+		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
+		addr_ggtt += alloc_size;
+	}
+
+	if (!blob)
+		return total_size;
+
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+	return total_size;
+}
+
+static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id) {
+		if (engine->class != engine_class)
+			continue;
+
+		if (!engine->default_state)
+			continue;
+
+		return engine;
+	}
+
+	return NULL;
+}
+
+static void guc_init_golden_context(struct intel_guc *guc)
+{
 	struct __guc_ads_blob *blob = guc->ads_blob;
-	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
-	u32 base;
+	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
 	u8 engine_class, guc_class;
+	u8 *ptr;
 
-	/* GuC scheduling policies */
-	guc_policies_init(guc, &blob->policies);
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return;
+
+	GEM_BUG_ON(!blob);
 
 	/*
-	 * GuC expects a per-engine-class context image and size
-	 * (minus hwsp and ring context). The context image will be
-	 * used to reinitialize engines after a reset. It must exist
-	 * and be pinned in the GGTT, so that the address won't change after
-	 * we have told GuC where to find it. The context size will be used
-	 * to validate that the LRC base + size fall within allowed GGTT.
+	 * Go back and fill in the golden context data now that it is
+	 * available.
 	 */
+	offset = guc_ads_golden_ctxt_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	ptr = ((u8 *) blob) + offset;
+
 	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
 		if (engine_class == OTHER_CLASS)
 			continue;
 
 		guc_class = engine_class_to_guc_class(engine_class);
 
-		/*
-		 * TODO: Set context pointer to default state to allow
-		 * GuC to re-init guilty contexts after internal reset.
-		 */
-		blob->ads.golden_context_lrca[guc_class] = 0;
-		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(gt, engine_class) -
-			skipped_size;
+		if (!blob->system_info.engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		engine = find_engine_state(gt, engine_class);
+		if (!engine) {
+			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
+			blob->ads.eng_state_size[guc_class] = 0;
+			blob->ads.golden_context_lrca[guc_class] = 0;
+			continue;
+		}
+
+		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
+		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
+		addr_ggtt += alloc_size;
+
+		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
+		ptr += alloc_size;
 	}
 
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+}
+
+static void __guc_ads_init(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	u32 base;
+
+	/* GuC scheduling policies */
+	guc_policies_init(guc, &blob->policies);
+
 	/* System info */
-	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
-	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+	fill_engine_enable_masks(gt, &blob->system_info);
 
 	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
 		hweight8(gt->info.sseu.slice_mask);
@@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 			 GEN12_DOORBELLS_PER_SQIDI) + 1;
 	}
 
+	/* Golden contexts for re-initialising after a watchdog reset */
+	guc_prep_golden_context(guc, blob);
+
 	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
 
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
 		return ret;
 	guc->ads_regset_size = ret;
 
+	/* Likewise the golden contexts: */
+	ret = guc_prep_golden_context(guc, NULL);
+	if (ret < 0)
+		return ret;
+	guc->ads_golden_ctxt_size = ret;
+
+	/* Now the total size can be determined: */
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
@@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	return 0;
 }
 
+void intel_guc_ads_init_late(struct intel_guc *guc)
+{
+	/*
+	 * The golden context setup requires the saved engine state from
+	 * __engines_record_defaults(). However, that requires engines to be
+	 * operational which means the ADS must already have been configured.
+	 * Fortunately, the golden context state is not needed until a hang
+	 * occurs, so it can be filled in during this late init phase.
+	 */
+	guc_init_golden_context(guc);
+}
+
 void intel_guc_ads_destroy(struct intel_guc *guc)
 {
 	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index bdcb339a5321..3d85051d57e4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -11,6 +11,7 @@ struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
+void intel_guc_ads_init_late(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
 void intel_guc_ads_print_policy_info(struct intel_guc *guc,
 				     struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 77c1fe2ed883..7a69c3c027e9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
 		uc->ops = &uc_ops_off;
 }
 
+void intel_uc_init_late(struct intel_uc *uc)
+{
+	intel_guc_init_late(&uc->guc);
+}
+
 void intel_uc_driver_late_release(struct intel_uc *uc)
 {
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 91315e3f1c58..e2da2b6e76e1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -35,6 +35,7 @@ struct intel_uc {
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
+void intel_uc_init_late(struct intel_uc *uc);
 void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

The media watchdog mechanism involves GuC performing a silent
reset-and-continue of the hung context. This requires the i915 driver to
provide a golden context to GuC in the ADS.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
 drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
 drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
 7 files changed, 199 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index acfdd53b2678..ceeb517ba259 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
 	if (err)
 		goto err_gt;
 
+	intel_uc_init_late(&gt->uc);
+
 	err = i915_inject_probe_error(gt->i915, -EIO);
 	if (err)
 		goto err_gt;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index 68266cbffd1f..979128e28372 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
 	}
 }
 
+void intel_guc_init_late(struct intel_guc *guc)
+{
+	intel_guc_ads_init_late(guc);
+}
+
 static u32 guc_ctl_debug_flags(struct intel_guc *guc)
 {
 	u32 level = intel_guc_log_get_level(&guc->log);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index bc71635c70b9..dc18ac510ac8 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -60,6 +60,7 @@ struct intel_guc {
 	struct i915_vma *ads_vma;
 	struct __guc_ads_blob *ads_blob;
 	u32 ads_regset_size;
+	u32 ads_golden_ctxt_size;
 
 	struct i915_vma *lrc_desc_pool;
 	void *lrc_desc_pool_vaddr;
@@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
 }
 
 void intel_guc_init_early(struct intel_guc *guc);
+void intel_guc_init_late(struct intel_guc *guc);
 void intel_guc_init_send_regs(struct intel_guc *guc);
 void intel_guc_write_params(struct intel_guc *guc);
 int intel_guc_init(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
index 93b0ac35a508..241b3089b658 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
@@ -7,6 +7,7 @@
 
 #include "gt/intel_gt.h"
 #include "gt/intel_lrc.h"
+#include "gt/shmem_utils.h"
 #include "intel_guc_ads.h"
 #include "intel_guc_fwif.h"
 #include "intel_uc.h"
@@ -33,6 +34,10 @@
  *      +---------------------------------------+ <== dynamic
  *      | padding                               |
  *      +---------------------------------------+ <== 4K aligned
+ *      | golden contexts                       |
+ *      +---------------------------------------+
+ *      | padding                               |
+ *      +---------------------------------------+ <== 4K aligned
  *      | private data                          |
  *      +---------------------------------------+
  *      | padding                               |
@@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
 	return guc->ads_regset_size;
 }
 
+static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
+{
+	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
+}
+
 static u32 guc_ads_private_data_size(struct intel_guc *guc)
 {
 	return PAGE_ALIGN(guc->fw.private_data_size);
@@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
 	return offsetof(struct __guc_ads_blob, regset);
 }
 
-static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
 {
 	u32 offset;
 
 	offset = guc_ads_regset_offset(guc) +
 		 guc_ads_regset_size(guc);
+
+	return PAGE_ALIGN(offset);
+}
+
+static u32 guc_ads_private_data_offset(struct intel_guc *guc)
+{
+	u32 offset;
+
+	offset = guc_ads_golden_ctxt_offset(guc) +
+		 guc_ads_golden_ctxt_size(guc);
+
 	return PAGE_ALIGN(offset);
 }
 
@@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
 	GEM_BUG_ON(temp_set.size);
 }
 
-/*
- * The first 80 dwords of the register state context, containing the
- * execlists and ppgtt registers.
- */
-#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
+static void fill_engine_enable_masks(struct intel_gt *gt,
+				     struct guc_gt_system_info *info)
+{
+	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
+	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
+	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+}
 
-static void __guc_ads_init(struct intel_guc *guc)
+/* Skip execlist and PPGTT registers */
+#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
+#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
+
+static int guc_prep_golden_context(struct intel_guc *guc,
+				   struct __guc_ads_blob *blob)
 {
 	struct intel_gt *gt = guc_to_gt(guc);
-	struct drm_i915_private *i915 = gt->i915;
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
+	u8 engine_class, guc_class;
+	struct guc_gt_system_info *info, local_info;
+
+	/*
+	 * Reserve the memory for the golden contexts and point GuC at it but
+	 * leave it empty for now. The context data will be filled in later
+	 * once there is something available to put there.
+	 *
+	 * Note that the HWSP and ring context are not included.
+	 *
+	 * Note also that the storage must be pinned in the GGTT, so that the
+	 * address won't change after GuC has been told where to find it. The
+	 * GuC will also validate that the LRC base + size fall within the
+	 * allowed GGTT range.
+	 */
+	if (blob) {
+		offset = guc_ads_golden_ctxt_offset(guc);
+		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+		info = &blob->system_info;
+	} else {
+		memset(&local_info, 0, sizeof(local_info));
+		info = &local_info;
+		fill_engine_enable_masks(gt, info);
+	}
+
+	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
+		if (engine_class == OTHER_CLASS)
+			continue;
+
+		guc_class = engine_class_to_guc_class(engine_class);
+
+		if (!info->engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		if (!blob)
+			continue;
+
+		blob->ads.eng_state_size[guc_class] = real_size;
+		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
+		addr_ggtt += alloc_size;
+	}
+
+	if (!blob)
+		return total_size;
+
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+	return total_size;
+}
+
+static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
+{
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, gt, id) {
+		if (engine->class != engine_class)
+			continue;
+
+		if (!engine->default_state)
+			continue;
+
+		return engine;
+	}
+
+	return NULL;
+}
+
+static void guc_init_golden_context(struct intel_guc *guc)
+{
 	struct __guc_ads_blob *blob = guc->ads_blob;
-	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
-	u32 base;
+	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
+	u32 addr_ggtt, offset;
+	u32 total_size = 0, alloc_size, real_size;
 	u8 engine_class, guc_class;
+	u8 *ptr;
 
-	/* GuC scheduling policies */
-	guc_policies_init(guc, &blob->policies);
+	if (!intel_uc_uses_guc_submission(&gt->uc))
+		return;
+
+	GEM_BUG_ON(!blob);
 
 	/*
-	 * GuC expects a per-engine-class context image and size
-	 * (minus hwsp and ring context). The context image will be
-	 * used to reinitialize engines after a reset. It must exist
-	 * and be pinned in the GGTT, so that the address won't change after
-	 * we have told GuC where to find it. The context size will be used
-	 * to validate that the LRC base + size fall within allowed GGTT.
+	 * Go back and fill in the golden context data now that it is
+	 * available.
 	 */
+	offset = guc_ads_golden_ctxt_offset(guc);
+	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
+	ptr = ((u8 *) blob) + offset;
+
 	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
 		if (engine_class == OTHER_CLASS)
 			continue;
 
 		guc_class = engine_class_to_guc_class(engine_class);
 
-		/*
-		 * TODO: Set context pointer to default state to allow
-		 * GuC to re-init guilty contexts after internal reset.
-		 */
-		blob->ads.golden_context_lrca[guc_class] = 0;
-		blob->ads.eng_state_size[guc_class] =
-			intel_engine_context_size(gt, engine_class) -
-			skipped_size;
+		if (!blob->system_info.engine_enabled_masks[guc_class])
+			continue;
+
+		real_size = intel_engine_context_size(gt, engine_class);
+		alloc_size = PAGE_ALIGN(real_size);
+		total_size += alloc_size;
+
+		engine = find_engine_state(gt, engine_class);
+		if (!engine) {
+			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
+			blob->ads.eng_state_size[guc_class] = 0;
+			blob->ads.golden_context_lrca[guc_class] = 0;
+			continue;
+		}
+
+		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
+		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
+		addr_ggtt += alloc_size;
+
+		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
+		ptr += alloc_size;
 	}
 
+	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
+}
+
+static void __guc_ads_init(struct intel_guc *guc)
+{
+	struct intel_gt *gt = guc_to_gt(guc);
+	struct drm_i915_private *i915 = gt->i915;
+	struct __guc_ads_blob *blob = guc->ads_blob;
+	u32 base;
+
+	/* GuC scheduling policies */
+	guc_policies_init(guc, &blob->policies);
+
 	/* System info */
-	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
-	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
-	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
+	fill_engine_enable_masks(gt, &blob->system_info);
 
 	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
 		hweight8(gt->info.sseu.slice_mask);
@@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
 			 GEN12_DOORBELLS_PER_SQIDI) + 1;
 	}
 
+	/* Golden contexts for re-initialising after a watchdog reset */
+	guc_prep_golden_context(guc, blob);
+
 	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
 
 	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
@@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
 		return ret;
 	guc->ads_regset_size = ret;
 
+	/* Likewise the golden contexts: */
+	ret = guc_prep_golden_context(guc, NULL);
+	if (ret < 0)
+		return ret;
+	guc->ads_golden_ctxt_size = ret;
+
+	/* Now the total size can be determined: */
 	size = guc_ads_blob_size(guc);
 
 	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
@@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	return 0;
 }
 
+void intel_guc_ads_init_late(struct intel_guc *guc)
+{
+	/*
+	 * The golden context setup requires the saved engine state from
+	 * __engines_record_defaults(). However, that requires engines to be
+	 * operational which means the ADS must already have been configured.
+	 * Fortunately, the golden context state is not needed until a hang
+	 * occurs, so it can be filled in during this late init phase.
+	 */
+	guc_init_golden_context(guc);
+}
+
 void intel_guc_ads_destroy(struct intel_guc *guc)
 {
 	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
index bdcb339a5321..3d85051d57e4 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
@@ -11,6 +11,7 @@ struct drm_printer;
 
 int intel_guc_ads_create(struct intel_guc *guc);
 void intel_guc_ads_destroy(struct intel_guc *guc);
+void intel_guc_ads_init_late(struct intel_guc *guc);
 void intel_guc_ads_reset(struct intel_guc *guc);
 void intel_guc_ads_print_policy_info(struct intel_guc *guc,
 				     struct drm_printer *p);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 77c1fe2ed883..7a69c3c027e9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
 		uc->ops = &uc_ops_off;
 }
 
+void intel_uc_init_late(struct intel_uc *uc)
+{
+	intel_guc_init_late(&uc->guc);
+}
+
 void intel_uc_driver_late_release(struct intel_uc *uc)
 {
 }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
index 91315e3f1c58..e2da2b6e76e1 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
@@ -35,6 +35,7 @@ struct intel_uc {
 };
 
 void intel_uc_init_early(struct intel_uc *uc);
+void intel_uc_init_late(struct intel_uc *uc);
 void intel_uc_driver_late_release(struct intel_uc *uc);
 void intel_uc_driver_remove(struct intel_uc *uc);
 void intel_uc_init_mmio(struct intel_uc *uc);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

When using GuC submission, if a context gets banned, disable scheduling
and mark all in-flight requests as complete.

Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  13 ++
 drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
 drivers/gpu/drm/i915/gt/intel_reset.c         |  32 +---
 .../gpu/drm/i915/gt/intel_ring_submission.c   |  20 +++
 drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 151 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_trace.h             |  10 ++
 8 files changed, 195 insertions(+), 37 deletions(-)
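
The structural change here is the new ban() hook in intel_context_ops:
intel_context_ban() records the banned state once and then defers the
actual eviction to the submission backend (timeline skipping for the
ring backend, schedule disable plus a minimal preemption timeout for
GuC). A condensed sketch of that dispatch, using simplified stand-in
types rather than the real i915 structures, looks like this:

/*
 * Simplified sketch of the ban vfunc dispatch -- stand-in types, not
 * the real i915 definitions. test_and_set_bit() is the regular kernel
 * bitop, so only the first caller to ban a context sees false here.
 */
#include <linux/bitops.h>
#include <linux/types.h>

struct context;
struct request;

struct context_ops {
	/* Backend decides how to get the context off the hardware. */
	void (*ban)(struct context *ce, struct request *rq);
};

struct context {
	const struct context_ops *ops;
	unsigned long flags;
#define CONTEXT_BANNED_BIT 0
};

static bool context_ban(struct context *ce, struct request *rq)
{
	bool already = test_and_set_bit(CONTEXT_BANNED_BIT, &ce->flags);

	if (ce->ops->ban)
		ce->ops->ban(ce, rq);	/* e.g. guc_context_ban() below */

	return already;
}

With that in place, kill_engines() can call intel_context_ban(ce, NULL)
without caring which backend is active, which is exactly the switch the
i915_gem_context.c hunk below makes.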

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 28c62f7ccfc7..d87a4c6da5bc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1084,7 +1084,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
 	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
 
-		if (ban && intel_context_set_banned(ce))
+		if (ban && intel_context_ban(ce, NULL))
 			continue;
 
 		/*
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 2ed9bf5f91a5..814d9277096a 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -16,6 +16,7 @@
 #include "intel_engine_types.h"
 #include "intel_ring_types.h"
 #include "intel_timeline_types.h"
+#include "i915_trace.h"
 
 #define CE_TRACE(ce, fmt, ...) do {					\
 	const struct intel_context *ce__ = (ce);			\
@@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
 	return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
 }
 
+static inline bool intel_context_ban(struct intel_context *ce,
+				     struct i915_request *rq)
+{
+	bool ret = intel_context_set_banned(ce);
+
+	trace_intel_context_ban(ce);
+	if (ce->ops->ban)
+		ce->ops->ban(ce, rq);
+
+	return ret;
+}
+
 static inline bool
 intel_context_force_single_submission(const struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 035108c10b2c..57c19ee3e313 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -35,6 +35,8 @@ struct intel_context_ops {
 
 	int (*alloc)(struct intel_context *ce);
 
+	void (*ban)(struct intel_context *ce, struct i915_request *rq);
+
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
 	int (*pin)(struct intel_context *ce, void *vaddr);
 	void (*unpin)(struct intel_context *ce);
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index f3cdbf4ba5c8..3ed694cab5af 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -22,7 +22,6 @@
 #include "intel_reset.h"
 
 #include "uc/intel_guc.h"
-#include "uc/intel_guc_submission.h"
 
 #define RESET_MAX_RETRIES 3
 
@@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
 	intel_uncore_rmw_fw(uncore, reg, clr, 0);
 }
 
-static void skip_context(struct i915_request *rq)
-{
-	struct intel_context *hung_ctx = rq->context;
-
-	list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
-		if (!i915_request_is_active(rq))
-			return;
-
-		if (rq->context == hung_ctx) {
-			i915_request_set_error_once(rq, -EIO);
-			__i915_request_skip(rq);
-		}
-	}
-}
-
 static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
 {
 	struct drm_i915_file_private *file_priv = ctx->file_priv;
@@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
 	bool banned;
 	int i;
 
-	if (intel_context_is_closed(rq->context)) {
-		intel_context_set_banned(rq->context);
+	if (intel_context_is_closed(rq->context))
 		return true;
-	}
 
 	rcu_read_lock();
 	ctx = rcu_dereference(rq->context->gem_context);
@@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
 	banned = !i915_gem_context_is_recoverable(ctx);
 	if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
 		banned = true;
-	if (banned) {
+	if (banned)
 		drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
 			ctx->name, atomic_read(&ctx->guilty_count));
-		intel_context_set_banned(rq->context);
-	}
 
 	client_mark_guilty(ctx, banned);
 
@@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
 
 void __i915_request_reset(struct i915_request *rq, bool guilty)
 {
+	bool banned = false;
+
 	RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
 	GEM_BUG_ON(__i915_request_is_complete(rq));
 
@@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 	if (guilty) {
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
-		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
-			skip_context(rq);
+		banned = mark_guilty(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
 		mark_innocent(rq);
 	}
 	rcu_read_unlock();
+
+	if (banned)
+		intel_context_ban(rq->context, rq);
 }
 
 static bool i915_in_reset(struct pci_dev *pdev)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
index 3b1471c50d40..03939be4297e 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
@@ -586,9 +586,29 @@ static void ring_context_reset(struct intel_context *ce)
 	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
 }
 
+static void ring_context_ban(struct intel_context *ce,
+			     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine;
+
+	if (!rq || !i915_request_is_active(rq))
+		return;
+
+	engine = rq->engine;
+	lockdep_assert_held(&engine->sched_engine->lock);
+	list_for_each_entry_continue(rq, &engine->sched_engine->requests,
+				     sched.link)
+		if (rq->context == ce) {
+			i915_request_set_error_once(rq, -EIO);
+			__i915_request_skip(rq);
+		}
+}
+
 static const struct intel_context_ops ring_context_ops = {
 	.alloc = ring_context_alloc,
 
+	.ban = ring_context_ban,
+
 	.pre_pin = ring_context_pre_pin,
 	.pin = ring_context_pin,
 	.unpin = ring_context_unpin,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index dc18ac510ac8..eb6062f95d3b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -274,6 +274,8 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine);
 
 int intel_guc_global_policies_update(struct intel_guc *guc);
 
+void intel_guc_context_ban(struct intel_context *ce, struct i915_request *rq);
+
 void intel_guc_submission_reset_prepare(struct intel_guc *guc);
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
 void intel_guc_submission_reset_finish(struct intel_guc *guc);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 6536bd6807a0..149990196e3a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -125,6 +125,7 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
 #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
 #define SCHED_STATE_DESTROYED				BIT(1)
 #define SCHED_STATE_PENDING_DISABLE			BIT(2)
+#define SCHED_STATE_BANNED				BIT(3)
 static inline void init_sched_state(struct intel_context *ce)
 {
 	/* Only should be called from guc_lrc_desc_pin() */
@@ -187,6 +188,23 @@ static inline void clr_context_pending_disable(struct intel_context *ce)
 		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
 }
 
+static inline bool context_banned(struct intel_context *ce)
+{
+	return (ce->guc_state.sched_state & SCHED_STATE_BANNED);
+}
+
+static inline void set_context_banned(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state |= SCHED_STATE_BANNED;
+}
+
+static inline void clr_context_banned(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	ce->guc_state.sched_state &= ~SCHED_STATE_BANNED;
+}
+
 static inline bool context_guc_id_invalid(struct intel_context *ce)
 {
 	return (ce->guc_id == GUC_INVALID_LRC_ID);
@@ -356,13 +374,23 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
 
 static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 {
-	int err;
+	int err = 0;
 	struct intel_context *ce = rq->context;
 	u32 action[3];
 	int len = 0;
 	u32 g2h_len_dw = 0;
 	bool enabled;
 
+	/*
+	 * Corner case where requests were sitting in the priority list, or a
+	 * request was resubmitted after the context was banned.
+	 */
+	if (unlikely(intel_context_is_banned(ce))) {
+		i915_request_put(i915_request_mark_eio(rq));
+		intel_engine_signal_breadcrumbs(ce->engine);
+		goto out;
+	}
+
 	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
 	GEM_BUG_ON(context_guc_id_invalid(ce));
 
@@ -398,6 +426,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		clr_context_pending_enable(ce);
 		intel_context_put(ce);
 	}
+	if (likely(!err))
+		trace_i915_request_guc_submit(rq);
 
 out:
 	return err;
@@ -462,7 +492,6 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
 			guc->stalled_request = last;
 			return false;
 		}
-		trace_i915_request_guc_submit(last);
 	}
 
 	guc->stalled_request = NULL;
@@ -501,12 +530,13 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
 static void __guc_context_destroy(struct intel_context *ce);
 static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
+static void guc_cancel_context_requests(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
 	struct intel_context *ce;
 	unsigned long index, flags;
-	bool pending_disable, pending_enable, deregister, destroyed;
+	bool pending_disable, pending_enable, deregister, destroyed, banned;
 
 	xa_for_each(&guc->context_lookup, index, ce) {
 		/* Flush context */
@@ -524,6 +554,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 		pending_enable = context_pending_enable(ce);
 		pending_disable = context_pending_disable(ce);
 		deregister = context_wait_for_deregister_to_register(ce);
+		banned = context_banned(ce);
 		init_sched_state(ce);
 
 		if (pending_enable || destroyed || deregister) {
@@ -541,6 +572,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 		/* Not mutually exclusive with the above if statement. */
 		if (pending_disable) {
 			guc_signal_context_fence(ce);
+			if (banned) {
+				guc_cancel_context_requests(ce);
+				intel_engine_signal_breadcrumbs(ce->engine);
+			}
 			intel_context_sched_disable_unpin(ce);
 			atomic_dec(&guc->outstanding_submission_g2h);
 			intel_context_put(ce);
@@ -659,6 +694,9 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
 {
 	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
 
+	if (intel_context_is_banned(ce))
+		return;
+
 	GEM_BUG_ON(!intel_context_is_pinned(ce));
 
 	/*
@@ -729,6 +767,8 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 	struct i915_request *rq;
 	u32 head;
 
+	intel_context_get(ce);
+
 	/*
 	 * GuC will implicitly mark the context as non-schedulable
 	 * when it sends the reset notification. Make sure our state
@@ -754,6 +794,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
 out_replay:
 	guc_reset_state(ce, head, stalled);
 	__unwind_incomplete_requests(ce);
+	intel_context_put(ce);
 }
 
 void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
@@ -936,8 +977,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
 	ret = guc_add_request(guc, rq);
 	if (ret == -EBUSY)
 		guc->stalled_request = rq;
-	else
-		trace_i915_request_guc_submit(rq);
 
 	if (unlikely(ret == -EPIPE))
 		disable_submission(guc);
@@ -1329,13 +1368,77 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
 	return ce->guc_id;
 }
 
+static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
+						 u16 guc_id,
+						 u32 preemption_timeout)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
+		guc_id,
+		preemption_timeout
+	};
+
+	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+}
+
+static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct intel_runtime_pm *runtime_pm =
+		&ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+	unsigned long flags;
+
+	guc_flush_submissions(guc);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	set_context_banned(ce);
+
+	if (submission_disabled(guc) || (!context_enabled(ce) &&
+	    !context_pending_disable(ce))) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		guc_cancel_context_requests(ce);
+		intel_engine_signal_breadcrumbs(ce->engine);
+	} else if (!context_pending_disable(ce)) {
+		u16 guc_id;
+
+		/*
+		 * We add +2 here as the schedule disable complete CTB handler
+		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
+		 */
+		atomic_add(2, &ce->pin_count);
+
+		guc_id = prep_context_pending_disable(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		/*
+		 * In addition to disabling scheduling, set the preemption
+		 * timeout to the minimum value (1 us) so the banned context
+		 * gets kicked off the HW ASAP.
+		 */
+		with_intel_runtime_pm(runtime_pm, wakeref) {
+			__guc_context_set_preemption_timeout(guc, guc_id, 1);
+			__guc_context_sched_disable(guc, ce, guc_id);
+		}
+	} else {
+		if (!context_guc_id_invalid(ce))
+			with_intel_runtime_pm(runtime_pm, wakeref)
+				__guc_context_set_preemption_timeout(guc,
+								     ce->guc_id,
+								     1);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+	}
+}
+
 static void guc_context_sched_disable(struct intel_context *ce)
 {
 	struct intel_guc *guc = ce_to_guc(ce);
-	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	unsigned long flags;
-	u16 guc_id;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
 	intel_wakeref_t wakeref;
+	u16 guc_id;
+	bool enabled;
 
 	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
 	    !lrc_desc_registered(guc, ce->guc_id)) {
@@ -1349,14 +1452,22 @@ static void guc_context_sched_disable(struct intel_context *ce)
 	spin_lock_irqsave(&ce->guc_state.lock, flags);
 
 	/*
-	 * We have to check if the context has been pinned again as another pin
-	 * operation is allowed to pass this function. Checking the pin count,
-	 * within ce->guc_state.lock, synchronizes this function with
+	 * We have to check if the context has been disabled by another thread.
+	 * We also have to check if the context has been pinned again as another
+	 * pin operation is allowed to pass this function. Checking the pin
+	 * count, within ce->guc_state.lock, synchronizes this function with
 	 * guc_request_alloc ensuring a request doesn't slip through the
 	 * 'context_pending_disable' fence. Checking within the spin lock (can't
 	 * sleep) ensures another process doesn't pin this context and generate
 	 * a request before we set the 'context_pending_disable' flag here.
 	 */
+	enabled = context_enabled(ce);
+	if (unlikely(!enabled || submission_disabled(guc))) {
+		if (enabled)
+			clr_context_enabled(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		goto unpin;
+	}
 	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		return;
@@ -1513,6 +1624,8 @@ static const struct intel_context_ops guc_context_ops = {
 	.unpin = guc_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
+	.ban = guc_context_ban,
+
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
@@ -1706,6 +1819,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 	.unpin = guc_context_unpin,
 	.post_unpin = guc_context_post_unpin,
 
+	.ban = guc_context_ban,
+
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
 
@@ -2159,6 +2274,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 	if (context_pending_enable(ce)) {
 		clr_context_pending_enable(ce);
 	} else if (context_pending_disable(ce)) {
+		bool banned;
+
 		/*
 		 * Unpin must be done before __guc_signal_context_fence,
 		 * otherwise a race exists between the requests getting
@@ -2169,9 +2286,16 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		intel_context_sched_disable_unpin(ce);
 
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
+		banned = context_banned(ce);
+		clr_context_banned(ce);
 		clr_context_pending_disable(ce);
 		__guc_signal_context_fence(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+		if (banned) {
+			guc_cancel_context_requests(ce);
+			intel_engine_signal_breadcrumbs(ce->engine);
+		}
 	}
 
 	decr_outstanding_submission_g2h(guc);
@@ -2206,8 +2330,11 @@ static void guc_handle_context_reset(struct intel_guc *guc,
 				     struct intel_context *ce)
 {
 	trace_intel_context_reset(ce);
-	capture_error_state(guc, ce);
-	guc_context_replay(ce);
+
+	if (likely(!intel_context_is_banned(ce))) {
+		capture_error_state(guc, ce);
+		guc_context_replay(ce);
+	}
 }
 
 int intel_guc_context_reset_process_msg(struct intel_guc *guc,
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index c095c4d39456..937d3706af9b 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -934,6 +934,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
 	     TP_ARGS(ce)
 );
 
+DEFINE_EVENT(intel_context, intel_context_ban,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
+);
+
 DEFINE_EVENT(intel_context, intel_context_register,
 	     TP_PROTO(struct intel_context *ce),
 	     TP_ARGS(ce)
@@ -1036,6 +1041,11 @@ trace_intel_context_reset(struct intel_context *ce)
 {
 }
 
+static inline void
+trace_intel_context_ban(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_register(struct intel_context *ce)
 {
-- 
2.28.0
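
One detail in guc_context_ban() above is worth unpacking: the
atomic_add(2, &ce->pin_count) pairs with the -2 that
intel_context_sched_disable_unpin() applies when the schedule disable
completion (G2H) arrives, so the context stays pinned for the whole
firmware round trip. A toy userspace model of that handshake -- not
driver code -- behaves as follows:

/*
 * Toy model of the pin-count handshake around a schedule-disable
 * request -- illustrative only. The ban path takes +2 up front so the
 * completion handler's unconditional -2 cannot unpin the context while
 * the H2G/G2H round trip is still in flight.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pin_count;

static void ban_path(void)
{
	atomic_fetch_add(&pin_count, 2);	/* hold across the round trip */
	/* ... schedule-disable sent to the firmware here ... */
}

static void sched_disable_done(void)
{
	/* mirrors intel_context_sched_disable_unpin(): always -2 */
	atomic_fetch_sub(&pin_count, 2);
}

int main(void)
{
	atomic_store(&pin_count, 1);		/* the caller's own pin */
	ban_path();
	printf("while in flight: %d\n", atomic_load(&pin_count)); /* 3 */
	sched_disable_done();
	printf("after the G2H:   %d\n", atomic_load(&pin_count)); /* 1 */
	return 0;
}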


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 43/51] drm/i915/guc: Support request cancellation
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

This adds GuC backend support for i915_request_cancel(), which in turn
makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   9 +
 drivers/gpu/drm/i915/gt/intel_context.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 .../drm/i915/gt/intel_execlists_submission.c  |  18 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 169 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c           |  14 +-
 6 files changed, 211 insertions(+), 13 deletions(-)
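
Pieced together from the hunks below, the new per-context guc_blocked
fence follows a deliberate lifecycle: it is created already signalled
at context init (init plus an immediate commit with a dummy notify
callback), re-armed when submission on the context must be blocked,
and signalled again once the unblock -- or a reset scrub -- completes.
The fragment below simply collects those steps in one place; it uses
only the i915_sw_fence calls that appear in the patch itself:

/* 1) intel_context_init(): start signalled, so waiters that show up
 *    before any block/unblock cycle proceed immediately. */
i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
i915_sw_fence_commit(&ce->guc_blocked);

/* 2) guc_blocked_fence_reinit(): re-arm when the context is blocked
 *    (called with ce->guc_state.lock held). */
GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
i915_sw_fence_fini(&ce->guc_blocked);
i915_sw_fence_reinit(&ce->guc_blocked);

/* 3) guc_blocked_fence_complete(): signal once the unblock is done. */
if (!i915_sw_fence_done(&ce->guc_blocked))
	i915_sw_fence_complete(&ce->guc_blocked);

The intent is that submission paths can wait on this fence while the
context is blocked, which is what lets cancellation quiesce a context
without touching the hardware directly.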

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index dd078a80c3a3..b1e3d00fb1f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -366,6 +366,12 @@ static int __intel_context_active(struct i915_active *active)
 	return 0;
 }
 
+static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
+				 enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
 void
 intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 {
@@ -399,6 +405,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
+	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
+	i915_sw_fence_commit(&ce->guc_blocked);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 814d9277096a..876bdb08303c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -70,6 +70,13 @@ intel_context_is_pinned(struct intel_context *ce)
 	return atomic_read(&ce->pin_count);
 }
 
+static inline void intel_context_cancel_request(struct intel_context *ce,
+						struct i915_request *rq)
+{
+	GEM_BUG_ON(!ce->ops->cancel_request);
+	return ce->ops->cancel_request(ce, rq);
+}
+
 /**
  * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
  * @ce - the context
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 57c19ee3e313..005a64f2afa7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -13,6 +13,7 @@
 #include <linux/types.h>
 
 #include "i915_active_types.h"
+#include "i915_sw_fence.h"
 #include "i915_utils.h"
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
@@ -42,6 +43,9 @@ struct intel_context_ops {
 	void (*unpin)(struct intel_context *ce);
 	void (*post_unpin)(struct intel_context *ce);
 
+	void (*cancel_request)(struct intel_context *ce,
+			       struct i915_request *rq);
+
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
@@ -184,6 +188,9 @@ struct intel_context {
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
 	struct list_head guc_id_link;
+
+	/* GuC context blocked fence */
+	struct i915_sw_fence guc_blocked;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index f9b5f54a5abe..8f6dc0fb49a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -114,6 +114,7 @@
 #include "gen8_engine_cs.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
+#include "intel_engine_heartbeat.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_stats.h"
 #include "intel_execlists_submission.h"
@@ -2536,11 +2537,26 @@ static int execlists_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void execlists_context_cancel_request(struct intel_context *ce,
+					     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = NULL;
+
+	i915_request_active_engine(rq, &engine);
+
+	if (engine && intel_engine_pulse(engine))
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "request cancellation by %s",
+				      current->comm);
+}
+
 static const struct intel_context_ops execlists_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
 	.alloc = execlists_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = execlists_context_pre_pin,
 	.pin = execlists_context_pin,
 	.unpin = lrc_unpin,
@@ -3558,6 +3574,8 @@ static const struct intel_context_ops virtual_context_ops = {
 
 	.alloc = virtual_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = virtual_context_pre_pin,
 	.pin = virtual_context_pin,
 	.unpin = lrc_unpin,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 149990196e3a..1c30d04733ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -81,6 +81,11 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
 #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
+#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
+#define SCHED_STATE_NO_LOCK_BLOCKED \
+	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
+#define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
+	(0xffff << SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -116,6 +121,27 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline u32 context_blocked(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_BLOCKED_MASK) >>
+		SCHED_STATE_NO_LOCK_BLOCKED_SHIFT;
+}
+
+static inline void incr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->engine->sched_engine->lock);
+	atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline void decr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->engine->sched_engine->lock);
+	atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -403,6 +429,10 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		if (unlikely(err))
 			goto out;
 	}
+
+	if (unlikely(context_blocked(ce)))
+		goto out;
+
 	enabled = context_enabled(ce);
 
 	if (!enabled) {
@@ -531,6 +561,7 @@ static void __guc_context_destroy(struct intel_context *ce);
 static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
 static void guc_cancel_context_requests(struct intel_context *ce);
+static void guc_blocked_fence_complete(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
@@ -578,6 +609,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 			}
 			intel_context_sched_disable_unpin(ce);
 			atomic_dec(&guc->outstanding_submission_g2h);
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			guc_blocked_fence_complete(ce);
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 			intel_context_put(ce);
 		}
 	}
@@ -1339,6 +1374,21 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_enable(struct intel_guc *guc,
+				       struct intel_context *ce)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		ce->guc_id,
+		GUC_CONTEXT_ENABLE
+	};
+
+	trace_intel_context_sched_enable(ce);
+
+	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
+}
+
 static void __guc_context_sched_disable(struct intel_guc *guc,
 					struct intel_context *ce,
 					u16 guc_id)
@@ -1357,17 +1407,131 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
+static void guc_blocked_fence_complete(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	if (!i915_sw_fence_done(&ce->guc_blocked))
+		i915_sw_fence_complete(&ce->guc_blocked);
+}
+
+static void guc_blocked_fence_reinit(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
+	i915_sw_fence_fini(&ce->guc_blocked);
+	i915_sw_fence_reinit(&ce->guc_blocked);
+	i915_sw_fence_await(&ce->guc_blocked);
+	i915_sw_fence_commit(&ce->guc_blocked);
+}
+
 static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	lockdep_assert_held(&ce->guc_state.lock);
 
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	guc_blocked_fence_reinit(ce);
 	intel_context_get(ce);
 
 	return ce->guc_id;
 }
 
+static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+	u16 guc_id;
+	bool enabled;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	incr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	enabled = context_enabled(ce);
+	if (unlikely(!enabled || submission_disabled(guc))) {
+		if (enabled)
+			clr_context_enabled(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return &ce->guc_blocked;
+	}
+
+	/*
+	 * We add +2 here as the schedule disable complete CTB handler calls
+	 * intel_context_sched_disable_unpin (-2 to pin_count).
+	 */
+	atomic_add(2, &ce->pin_count);
+
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return &ce->guc_blocked;
+}
+
+static void guc_context_unblock(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+
+	GEM_BUG_ON(context_enabled(ce));
+
+	if (unlikely(context_blocked(ce) > 1)) {
+		spin_lock_irqsave(&sched_engine->lock, flags);
+		if (likely(context_blocked(ce) > 1))
+			goto decrement;
+		spin_unlock_irqrestore(&sched_engine->lock, flags);
+	}
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (unlikely(submission_disabled(guc) ||
+		     !intel_context_is_pinned(ce) ||
+		     context_pending_disable(ce) ||
+		     context_blocked(ce) > 1)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		goto out;
+	}
+
+	set_context_pending_enable(ce);
+	set_context_enabled(ce);
+	intel_context_get(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_enable(guc, ce);
+
+out:
+	spin_lock_irqsave(&sched_engine->lock, flags);
+decrement:
+	decr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static void guc_context_cancel_request(struct intel_context *ce,
+				       struct i915_request *rq)
+{
+	if (i915_sw_fence_signaled(&rq->submit)) {
+		struct i915_sw_fence *fence = guc_context_block(ce);
+
+		i915_sw_fence_wait(fence);
+		if (!i915_request_completed(rq)) {
+			__i915_request_skip(rq);
+			guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
+					true);
+		}
+		guc_context_unblock(ce);
+	}
+}
+
 static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
 						 u16 guc_id,
 						 u32 preemption_timeout)
@@ -1626,6 +1790,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
@@ -1821,6 +1987,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
 
@@ -2290,6 +2458,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		clr_context_banned(ce);
 		clr_context_pending_disable(ce);
 		__guc_signal_context_fence(ce);
+		guc_blocked_fence_complete(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		if (banned) {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index eb109f93ebcb..f3552642b8a1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -708,18 +708,6 @@ void i915_request_unsubmit(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
-static void __cancel_request(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = NULL;
-
-	i915_request_active_engine(rq, &engine);
-
-	if (engine && intel_engine_pulse(engine))
-		intel_gt_handle_error(engine->gt, engine->mask, 0,
-				      "request cancellation by %s",
-				      current->comm);
-}
-
 void i915_request_cancel(struct i915_request *rq, int error)
 {
 	if (!i915_request_set_error_once(rq, error))
@@ -727,7 +715,7 @@ void i915_request_cancel(struct i915_request *rq, int error)
 
 	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
 
-	__cancel_request(rq);
+	intel_context_cancel_request(rq->context, rq);
 }
 
 static int __i915_sw_fence_call
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 43/51] drm/i915/guc: Support request cancellation
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

This adds GuC backend support for i915_request_cancel(), which in turn
makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |   9 +
 drivers/gpu/drm/i915/gt/intel_context.h       |   7 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
 .../drm/i915/gt/intel_execlists_submission.c  |  18 ++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 169 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.c           |  14 +-
 6 files changed, 211 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index dd078a80c3a3..b1e3d00fb1f2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -366,6 +366,12 @@ static int __intel_context_active(struct i915_active *active)
 	return 0;
 }
 
+static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
+				 enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
 void
 intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 {
@@ -399,6 +405,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->guc_id = GUC_INVALID_LRC_ID;
 	INIT_LIST_HEAD(&ce->guc_id_link);
 
+	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
+	i915_sw_fence_commit(&ce->guc_blocked);
+
 	i915_active_init(&ce->active,
 			 __intel_context_active, __intel_context_retire, 0);
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index 814d9277096a..876bdb08303c 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -70,6 +70,13 @@ intel_context_is_pinned(struct intel_context *ce)
 	return atomic_read(&ce->pin_count);
 }
 
+static inline void intel_context_cancel_request(struct intel_context *ce,
+						struct i915_request *rq)
+{
+	GEM_BUG_ON(!ce->ops->cancel_request);
+	ce->ops->cancel_request(ce, rq);
+}
+
 /**
  * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
  * @ce - the context
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 57c19ee3e313..005a64f2afa7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -13,6 +13,7 @@
 #include <linux/types.h>
 
 #include "i915_active_types.h"
+#include "i915_sw_fence.h"
 #include "i915_utils.h"
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
@@ -42,6 +43,9 @@ struct intel_context_ops {
 	void (*unpin)(struct intel_context *ce);
 	void (*post_unpin)(struct intel_context *ce);
 
+	void (*cancel_request)(struct intel_context *ce,
+			       struct i915_request *rq);
+
 	void (*enter)(struct intel_context *ce);
 	void (*exit)(struct intel_context *ce);
 
@@ -184,6 +188,9 @@ struct intel_context {
 	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
 	 */
 	struct list_head guc_id_link;
+
+	/* GuC context blocked fence */
+	struct i915_sw_fence guc_blocked;
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index f9b5f54a5abe..8f6dc0fb49a6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -114,6 +114,7 @@
 #include "gen8_engine_cs.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
+#include "intel_engine_heartbeat.h"
 #include "intel_engine_pm.h"
 #include "intel_engine_stats.h"
 #include "intel_execlists_submission.h"
@@ -2536,11 +2537,26 @@ static int execlists_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void execlists_context_cancel_request(struct intel_context *ce,
+					     struct i915_request *rq)
+{
+	struct intel_engine_cs *engine = NULL;
+
+	i915_request_active_engine(rq, &engine);
+
+	if (engine && intel_engine_pulse(engine))
+		intel_gt_handle_error(engine->gt, engine->mask, 0,
+				      "request cancellation by %s",
+				      current->comm);
+}
+
 static const struct intel_context_ops execlists_context_ops = {
 	.flags = COPS_HAS_INFLIGHT,
 
 	.alloc = execlists_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = execlists_context_pre_pin,
 	.pin = execlists_context_pin,
 	.unpin = lrc_unpin,
@@ -3558,6 +3574,8 @@ static const struct intel_context_ops virtual_context_ops = {
 
 	.alloc = virtual_context_alloc,
 
+	.cancel_request = execlists_context_cancel_request,
+
 	.pre_pin = virtual_context_pre_pin,
 	.pin = virtual_context_pin,
 	.unpin = lrc_unpin,
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 149990196e3a..1c30d04733ff 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -81,6 +81,11 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
 #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
+#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
+#define SCHED_STATE_NO_LOCK_BLOCKED \
+	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
+#define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
+	(0xffff << SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
 static inline bool context_enabled(struct intel_context *ce)
 {
 	return (atomic_read(&ce->guc_sched_state_no_lock) &
@@ -116,6 +121,27 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline u32 context_blocked(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_BLOCKED_MASK) >>
+		SCHED_STATE_NO_LOCK_BLOCKED_SHIFT;
+}
+
+static inline void incr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->engine->sched_engine->lock);
+	atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
+static inline void decr_context_blocked(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->engine->sched_engine->lock);
+	atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -403,6 +429,10 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 		if (unlikely(err))
 			goto out;
 	}
+
+	if (unlikely(context_blocked(ce)))
+		goto out;
+
 	enabled = context_enabled(ce);
 
 	if (!enabled) {
@@ -531,6 +561,7 @@ static void __guc_context_destroy(struct intel_context *ce);
 static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
 static void guc_signal_context_fence(struct intel_context *ce);
 static void guc_cancel_context_requests(struct intel_context *ce);
+static void guc_blocked_fence_complete(struct intel_context *ce);
 
 static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 {
@@ -578,6 +609,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
 			}
 			intel_context_sched_disable_unpin(ce);
 			atomic_dec(&guc->outstanding_submission_g2h);
+			spin_lock_irqsave(&ce->guc_state.lock, flags);
+			guc_blocked_fence_complete(ce);
+			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
 			intel_context_put(ce);
 		}
 	}
@@ -1339,6 +1374,21 @@ static void guc_context_post_unpin(struct intel_context *ce)
 	lrc_post_unpin(ce);
 }
 
+static void __guc_context_sched_enable(struct intel_guc *guc,
+				       struct intel_context *ce)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
+		ce->guc_id,
+		GUC_CONTEXT_ENABLE
+	};
+
+	trace_intel_context_sched_enable(ce);
+
+	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
+				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
+}
+
 static void __guc_context_sched_disable(struct intel_guc *guc,
 					struct intel_context *ce,
 					u16 guc_id)
@@ -1357,17 +1407,131 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
 				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
 }
 
+static void guc_blocked_fence_complete(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+
+	if (!i915_sw_fence_done(&ce->guc_blocked))
+		i915_sw_fence_complete(&ce->guc_blocked);
+}
+
+static void guc_blocked_fence_reinit(struct intel_context *ce)
+{
+	lockdep_assert_held(&ce->guc_state.lock);
+	GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
+	i915_sw_fence_fini(&ce->guc_blocked);
+	i915_sw_fence_reinit(&ce->guc_blocked);
+	i915_sw_fence_await(&ce->guc_blocked);
+	i915_sw_fence_commit(&ce->guc_blocked);
+}
+
 static u16 prep_context_pending_disable(struct intel_context *ce)
 {
 	lockdep_assert_held(&ce->guc_state.lock);
 
 	set_context_pending_disable(ce);
 	clr_context_enabled(ce);
+	guc_blocked_fence_reinit(ce);
 	intel_context_get(ce);
 
 	return ce->guc_id;
 }
 
+static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+	u16 guc_id;
+	bool enabled;
+
+	spin_lock_irqsave(&sched_engine->lock, flags);
+	incr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	enabled = context_enabled(ce);
+	if (unlikely(!enabled || submission_disabled(guc))) {
+		if (enabled)
+			clr_context_enabled(ce);
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		return &ce->guc_blocked;
+	}
+
+	/*
+	 * We add +2 here as the schedule disable complete CTB handler calls
+	 * intel_context_sched_disable_unpin (-2 to pin_count).
+	 */
+	atomic_add(2, &ce->pin_count);
+
+	guc_id = prep_context_pending_disable(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_disable(guc, ce, guc_id);
+
+	return &ce->guc_blocked;
+}
+
+static void guc_context_unblock(struct intel_context *ce)
+{
+	struct intel_guc *guc = ce_to_guc(ce);
+	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
+	unsigned long flags;
+	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
+	intel_wakeref_t wakeref;
+
+	GEM_BUG_ON(context_enabled(ce));
+
+	if (unlikely(context_blocked(ce) > 1)) {
+		spin_lock_irqsave(&sched_engine->lock, flags);
+		if (likely(context_blocked(ce) > 1))
+			goto decrement;
+		spin_unlock_irqrestore(&sched_engine->lock, flags);
+	}
+
+	spin_lock_irqsave(&ce->guc_state.lock, flags);
+	if (unlikely(submission_disabled(guc) ||
+		     !intel_context_is_pinned(ce) ||
+		     context_pending_disable(ce) ||
+		     context_blocked(ce) > 1)) {
+		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+		goto out;
+	}
+
+	set_context_pending_enable(ce);
+	set_context_enabled(ce);
+	intel_context_get(ce);
+	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
+
+	with_intel_runtime_pm(runtime_pm, wakeref)
+		__guc_context_sched_enable(guc, ce);
+
+out:
+	spin_lock_irqsave(&sched_engine->lock, flags);
+decrement:
+	decr_context_blocked(ce);
+	spin_unlock_irqrestore(&sched_engine->lock, flags);
+}
+
+static void guc_context_cancel_request(struct intel_context *ce,
+				       struct i915_request *rq)
+{
+	if (i915_sw_fence_signaled(&rq->submit)) {
+		struct i915_sw_fence *fence = guc_context_block(ce);
+
+		i915_sw_fence_wait(fence);
+		if (!i915_request_completed(rq)) {
+			__i915_request_skip(rq);
+			guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
+					true);
+		}
+		guc_context_unblock(ce);
+	}
+}
+
 static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
 						 u16 guc_id,
 						 u32 preemption_timeout)
@@ -1626,6 +1790,8 @@ static const struct intel_context_ops guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = intel_context_enter_engine,
 	.exit = intel_context_exit_engine,
 
@@ -1821,6 +1987,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
 
 	.ban = guc_context_ban,
 
+	.cancel_request = guc_context_cancel_request,
+
 	.enter = guc_virtual_context_enter,
 	.exit = guc_virtual_context_exit,
 
@@ -2290,6 +2458,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
 		clr_context_banned(ce);
 		clr_context_pending_disable(ce);
 		__guc_signal_context_fence(ce);
+		guc_blocked_fence_complete(ce);
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 
 		if (banned) {
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index eb109f93ebcb..f3552642b8a1 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -708,18 +708,6 @@ void i915_request_unsubmit(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
 }
 
-static void __cancel_request(struct i915_request *rq)
-{
-	struct intel_engine_cs *engine = NULL;
-
-	i915_request_active_engine(rq, &engine);
-
-	if (engine && intel_engine_pulse(engine))
-		intel_gt_handle_error(engine->gt, engine->mask, 0,
-				      "request cancellation by %s",
-				      current->comm);
-}
-
 void i915_request_cancel(struct i915_request *rq, int error)
 {
 	if (!i915_request_set_error_once(rq, error))
@@ -727,7 +715,7 @@ void i915_request_cancel(struct i915_request *rq, int error)
 
 	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
 
-	__cancel_request(rq);
+	intel_context_cancel_request(rq->context, rq);
 }
 
 static int __i915_sw_fence_call
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 44/51] drm/i915/selftest: Better error reporting from hangcheck selftest
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

There are many ways in which the hangcheck selftest can fail. Very few
of them actually printed an error message to say what happened. So,
fill in the missing messages.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 89 ++++++++++++++++----
 1 file changed, 72 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7aea10aa1fb4..0ed87cc4d063 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -378,6 +378,7 @@ static int igt_reset_nop(void *arg)
 			ce = intel_context_create(engine);
 			if (IS_ERR(ce)) {
 				err = PTR_ERR(ce);
+				pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 				break;
 			}
 
@@ -387,6 +388,7 @@ static int igt_reset_nop(void *arg)
 				rq = intel_context_create_request(ce);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -401,24 +403,31 @@ static int igt_reset_nop(void *arg)
 		igt_global_reset_unlock(gt);
 
 		if (intel_gt_is_wedged(gt)) {
+			pr_err("[%s] GT is wedged!\n", engine->name);
 			err = -EIO;
 			break;
 		}
 
 		if (i915_reset_count(global) != reset_count + ++count) {
-			pr_err("Full GPU reset not recorded!\n");
+			pr_err("[%s] Reset not recorded: %d vs %d + %d!\n",
+			       engine->name, i915_reset_count(global), reset_count, count);
 			err = -EINVAL;
 			break;
 		}
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	} while (time_before(jiffies, end_time));
 	pr_info("%s: %d resets\n", __func__, count);
 
-	if (igt_flush_test(gt->i915))
+	if (igt_flush_test(gt->i915)) {
+		pr_err("Post flush failed: %d!\n", err);
 		err = -EIO;
+	}
+
 	return err;
 }
 
@@ -441,8 +450,10 @@ static int igt_reset_nop_engine(void *arg)
 		int err;
 
 		ce = intel_context_create(engine);
-		if (IS_ERR(ce))
+		if (IS_ERR(ce)) {
+			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 			return PTR_ERR(ce);
+		}
 
 		reset_count = i915_reset_count(global);
 		reset_engine_count = i915_reset_engine_count(global, engine);
@@ -550,8 +561,10 @@ static int igt_reset_fail_engine(void *arg)
 		int err;
 
 		ce = intel_context_create(engine);
-		if (IS_ERR(ce))
+		if (IS_ERR(ce)) {
+			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 			return PTR_ERR(ce);
+		}
 
 		st_engine_heartbeat_disable(engine);
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
@@ -711,6 +724,7 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -765,12 +779,16 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 			break;
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
-	if (intel_gt_is_wedged(gt))
+	if (intel_gt_is_wedged(gt)) {
+		pr_err("GT is wedged!\n");
 		err = -EIO;
+	}
 
 	if (active)
 		hang_fini(&h);
@@ -837,6 +855,7 @@ static int active_engine(void *data)
 		ce[count] = intel_context_create(engine);
 		if (IS_ERR(ce[count])) {
 			err = PTR_ERR(ce[count]);
+			pr_err("[%s] Create context #%ld failed: %d!\n", engine->name, count, err);
 			while (--count)
 				intel_context_put(ce[count]);
 			return err;
@@ -852,6 +871,7 @@ static int active_engine(void *data)
 		new = intel_context_create_request(ce[idx]);
 		if (IS_ERR(new)) {
 			err = PTR_ERR(new);
+			pr_err("[%s] Create request #%d failed: %d!\n", engine->name, idx, err);
 			break;
 		}
 
@@ -867,8 +887,10 @@ static int active_engine(void *data)
 		}
 
 		err = active_request_put(old);
-		if (err)
+		if (err) {
+			pr_err("[%s] Request put failed: %d!\n", engine->name, err);
 			break;
+		}
 
 		cond_resched();
 	}
@@ -876,6 +898,9 @@ static int active_engine(void *data)
 	for (count = 0; count < ARRAY_SIZE(rq); count++) {
 		int err__ = active_request_put(rq[count]);
 
+		if (err__)
+			pr_err("[%s] Request put #%ld failed: %d!\n", engine->name, count, err__);
+
 		/* Keep the first error */
 		if (!err)
 			err = err__;
@@ -949,6 +974,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					  "igt/%s", other->name);
 			if (IS_ERR(tsk)) {
 				err = PTR_ERR(tsk);
+				pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
 				goto unwind;
 			}
 
@@ -967,6 +993,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -999,10 +1026,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			if (rq) {
 				if (rq->fence.error != -EIO) {
 					pr_err("i915_reset_engine(%s:%s):"
-					       " failed to reset request %llx:%lld\n",
+					       " failed to reset request %lld:%lld [0x%04X]\n",
 					       engine->name, test_name,
 					       rq->fence.context,
-					       rq->fence.seqno);
+					       rq->fence.seqno, rq->context->guc_id);
 					i915_request_put(rq);
 
 					GEM_TRACE_DUMP();
@@ -1101,8 +1128,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			break;
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
 	if (intel_gt_is_wedged(gt))
@@ -1180,12 +1209,15 @@ static int igt_reset_wait(void *arg)
 	igt_global_reset_lock(gt);
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		goto unlock;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto fini;
 	}
 
@@ -1310,12 +1342,15 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	/* Check that we can recover an unbind stuck on a hanging request */
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	obj = i915_gem_object_create_internal(gt->i915, SZ_1M);
 	if (IS_ERR(obj)) {
 		err = PTR_ERR(obj);
+		pr_err("[%s] Create object failed: %d!\n", engine->name, err);
 		goto fini;
 	}
 
@@ -1330,12 +1365,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	arg.vma = i915_vma_instance(obj, vm, NULL);
 	if (IS_ERR(arg.vma)) {
 		err = PTR_ERR(arg.vma);
+		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
@@ -1347,6 +1384,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	err = i915_vma_pin(arg.vma, 0, 0, pin_flags);
 	if (err) {
 		i915_request_add(rq);
+		pr_err("[%s] VMA pin failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
@@ -1363,8 +1401,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	i915_vma_lock(arg.vma);
 	err = i915_request_await_object(rq, arg.vma->obj,
 					flags & EXEC_OBJECT_WRITE);
-	if (err == 0)
+	if (err == 0) {
 		err = i915_vma_move_to_active(arg.vma, rq, flags);
+		if (err)
+			pr_err("[%s] Move to active failed: %d!\n", engine->name, err);
+	} else {
+		pr_err("[%s] Request await failed: %d!\n", engine->name, err);
+	}
+
 	i915_vma_unlock(arg.vma);
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
@@ -1392,6 +1436,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	tsk = kthread_run(fn, &arg, "igt/evict_vma");
 	if (IS_ERR(tsk)) {
 		err = PTR_ERR(tsk);
+		pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
 		tsk = NULL;
 		goto out_reset;
 	}
@@ -1518,6 +1563,7 @@ static int igt_reset_queue(void *arg)
 		prev = hang_create_request(&h, engine);
 		if (IS_ERR(prev)) {
 			err = PTR_ERR(prev);
+			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
 			goto fini;
 		}
 
@@ -1532,6 +1578,7 @@ static int igt_reset_queue(void *arg)
 			rq = hang_create_request(&h, engine);
 			if (IS_ERR(rq)) {
 				err = PTR_ERR(rq);
+				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 				goto fini;
 			}
 
@@ -1619,8 +1666,10 @@ static int igt_reset_queue(void *arg)
 		i915_request_put(prev);
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
 fini:
@@ -1653,12 +1702,15 @@ static int igt_handle_error(void *arg)
 		return 0;
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto err_fini;
 	}
 
@@ -1743,12 +1795,15 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 		return err;
 
 	err = hang_init(&h, engine->gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto out;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 44/51] drm/i915/selftest: Better error reporting from hangcheck selftest
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

There are many ways in which the hangcheck selftest can fail. Very few
of them actually printed an error message to say what happened. So,
fill in the missing messages.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 89 ++++++++++++++++----
 1 file changed, 72 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 7aea10aa1fb4..0ed87cc4d063 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -378,6 +378,7 @@ static int igt_reset_nop(void *arg)
 			ce = intel_context_create(engine);
 			if (IS_ERR(ce)) {
 				err = PTR_ERR(ce);
+				pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 				break;
 			}
 
@@ -387,6 +388,7 @@ static int igt_reset_nop(void *arg)
 				rq = intel_context_create_request(ce);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -401,24 +403,31 @@ static int igt_reset_nop(void *arg)
 		igt_global_reset_unlock(gt);
 
 		if (intel_gt_is_wedged(gt)) {
+			pr_err("[%s] GT is wedged!\n", engine->name);
 			err = -EIO;
 			break;
 		}
 
 		if (i915_reset_count(global) != reset_count + ++count) {
-			pr_err("Full GPU reset not recorded!\n");
+			pr_err("[%s] Reset not recorded: %d vs %d + %d!\n",
+			       engine->name, i915_reset_count(global), reset_count, count);
 			err = -EINVAL;
 			break;
 		}
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	} while (time_before(jiffies, end_time));
 	pr_info("%s: %d resets\n", __func__, count);
 
-	if (igt_flush_test(gt->i915))
+	if (igt_flush_test(gt->i915)) {
+		pr_err("Post flush failed: %d!\n", err);
 		err = -EIO;
+	}
+
 	return err;
 }
 
@@ -441,8 +450,10 @@ static int igt_reset_nop_engine(void *arg)
 		int err;
 
 		ce = intel_context_create(engine);
-		if (IS_ERR(ce))
+		if (IS_ERR(ce)) {
+			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 			return PTR_ERR(ce);
+		}
 
 		reset_count = i915_reset_count(global);
 		reset_engine_count = i915_reset_engine_count(global, engine);
@@ -550,8 +561,10 @@ static int igt_reset_fail_engine(void *arg)
 		int err;
 
 		ce = intel_context_create(engine);
-		if (IS_ERR(ce))
+		if (IS_ERR(ce)) {
+			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
 			return PTR_ERR(ce);
+		}
 
 		st_engine_heartbeat_disable(engine);
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
@@ -711,6 +724,7 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -765,12 +779,16 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 			break;
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
-	if (intel_gt_is_wedged(gt))
+	if (intel_gt_is_wedged(gt)) {
+		pr_err("GT is wedged!\n");
 		err = -EIO;
+	}
 
 	if (active)
 		hang_fini(&h);
@@ -837,6 +855,7 @@ static int active_engine(void *data)
 		ce[count] = intel_context_create(engine);
 		if (IS_ERR(ce[count])) {
 			err = PTR_ERR(ce[count]);
+			pr_err("[%s] Create context #%ld failed: %d!\n", engine->name, count, err);
 			while (--count)
 				intel_context_put(ce[count]);
 			return err;
@@ -852,6 +871,7 @@ static int active_engine(void *data)
 		new = intel_context_create_request(ce[idx]);
 		if (IS_ERR(new)) {
 			err = PTR_ERR(new);
+			pr_err("[%s] Create request #%d failed: %d!\n", engine->name, idx, err);
 			break;
 		}
 
@@ -867,8 +887,10 @@ static int active_engine(void *data)
 		}
 
 		err = active_request_put(old);
-		if (err)
+		if (err) {
+			pr_err("[%s] Request put failed: %d!\n", engine->name, err);
 			break;
+		}
 
 		cond_resched();
 	}
@@ -876,6 +898,9 @@ static int active_engine(void *data)
 	for (count = 0; count < ARRAY_SIZE(rq); count++) {
 		int err__ = active_request_put(rq[count]);
 
+		if (err__)
+			pr_err("[%s] Request put #%ld failed: %d!\n", engine->name, count, err__);
+
 		/* Keep the first error */
 		if (!err)
 			err = err__;
@@ -949,6 +974,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					  "igt/%s", other->name);
 			if (IS_ERR(tsk)) {
 				err = PTR_ERR(tsk);
+				pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
 				goto unwind;
 			}
 
@@ -967,6 +993,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
+					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 					break;
 				}
 
@@ -999,10 +1026,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			if (rq) {
 				if (rq->fence.error != -EIO) {
 					pr_err("i915_reset_engine(%s:%s):"
-					       " failed to reset request %llx:%lld\n",
+					       " failed to reset request %lld:%lld [0x%04X]\n",
 					       engine->name, test_name,
 					       rq->fence.context,
-					       rq->fence.seqno);
+					       rq->fence.seqno, rq->context->guc_id);
 					i915_request_put(rq);
 
 					GEM_TRACE_DUMP();
@@ -1101,8 +1128,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			break;
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
 	if (intel_gt_is_wedged(gt))
@@ -1180,12 +1209,15 @@ static int igt_reset_wait(void *arg)
 	igt_global_reset_lock(gt);
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		goto unlock;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto fini;
 	}
 
@@ -1310,12 +1342,15 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	/* Check that we can recover an unbind stuck on a hanging request */
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	obj = i915_gem_object_create_internal(gt->i915, SZ_1M);
 	if (IS_ERR(obj)) {
 		err = PTR_ERR(obj);
+		pr_err("[%s] Create object failed: %d!\n", engine->name, err);
 		goto fini;
 	}
 
@@ -1330,12 +1365,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	arg.vma = i915_vma_instance(obj, vm, NULL);
 	if (IS_ERR(arg.vma)) {
 		err = PTR_ERR(arg.vma);
+		pr_err("[%s] VMA instance failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
@@ -1347,6 +1384,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	err = i915_vma_pin(arg.vma, 0, 0, pin_flags);
 	if (err) {
 		i915_request_add(rq);
+		pr_err("[%s] VMA pin failed: %d!\n", engine->name, err);
 		goto out_obj;
 	}
 
@@ -1363,8 +1401,14 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	i915_vma_lock(arg.vma);
 	err = i915_request_await_object(rq, arg.vma->obj,
 					flags & EXEC_OBJECT_WRITE);
-	if (err == 0)
+	if (err == 0) {
 		err = i915_vma_move_to_active(arg.vma, rq, flags);
+		if (err)
+			pr_err("[%s] Move to active failed: %d!\n", engine->name, err);
+	} else {
+		pr_err("[%s] Request await failed: %d!\n", engine->name, err);
+	}
+
 	i915_vma_unlock(arg.vma);
 
 	if (flags & EXEC_OBJECT_NEEDS_FENCE)
@@ -1392,6 +1436,7 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
 	tsk = kthread_run(fn, &arg, "igt/evict_vma");
 	if (IS_ERR(tsk)) {
 		err = PTR_ERR(tsk);
+		pr_err("[%s] Thread spawn failed: %d!\n", engine->name, err);
 		tsk = NULL;
 		goto out_reset;
 	}
@@ -1518,6 +1563,7 @@ static int igt_reset_queue(void *arg)
 		prev = hang_create_request(&h, engine);
 		if (IS_ERR(prev)) {
 			err = PTR_ERR(prev);
+			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
 			goto fini;
 		}
 
@@ -1532,6 +1578,7 @@ static int igt_reset_queue(void *arg)
 			rq = hang_create_request(&h, engine);
 			if (IS_ERR(rq)) {
 				err = PTR_ERR(rq);
+				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 				goto fini;
 			}
 
@@ -1619,8 +1666,10 @@ static int igt_reset_queue(void *arg)
 		i915_request_put(prev);
 
 		err = igt_flush_test(gt->i915);
-		if (err)
+		if (err) {
+			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
 			break;
+		}
 	}
 
 fini:
@@ -1653,12 +1702,15 @@ static int igt_handle_error(void *arg)
 		return 0;
 
 	err = hang_init(&h, gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto err_fini;
 	}
 
@@ -1743,12 +1795,15 @@ static int igt_atomic_reset_engine(struct intel_engine_cs *engine,
 		return err;
 
 	err = hang_init(&h, engine->gt);
-	if (err)
+	if (err) {
+		pr_err("[%s] Hang init failed: %d!\n", engine->name, err);
 		return err;
+	}
 
 	rq = hang_create_request(&h, engine);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
+		pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
 		goto out;
 	}
 
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 45/51] drm/i915/selftest: Fix workarounds selftest for GuC submission
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: Rahul Kumar Singh <rahul.kumar.singh@intel.com>

When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 130 +++++++++++++-----
 .../i915/selftests/intel_scheduler_helpers.c  |  76 ++++++++++
 .../i915/selftests/intel_scheduler_helpers.h  |  28 ++++
 4 files changed, 201 insertions(+), 34 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 10b3bb6207ba..ab7679957623 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -280,6 +280,7 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-$(CONFIG_DRM_I915_SELFTEST) += \
 	gem/selftests/i915_gem_client_blt.o \
 	gem/selftests/igt_gem_utils.o \
+	selftests/intel_scheduler_helpers.o \
 	selftests/i915_random.o \
 	selftests/i915_selftest.o \
 	selftests/igt_atomic.o \
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 7ebc4edb8ecf..7727bc531ea9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -12,6 +12,7 @@
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
+#include "selftests/intel_scheduler_helpers.h"
 #include "selftests/mock_drm.h"
 
 #include "gem/selftests/igt_gem_utils.h"
@@ -261,28 +262,34 @@ static int do_engine_reset(struct intel_engine_cs *engine)
 	return intel_engine_reset(engine, "live_workarounds");
 }
 
+static int do_guc_reset(struct intel_engine_cs *engine)
+{
+	/* Currently a no-op as the reset is handled by GuC */
+	return 0;
+}
+
 static int
 switch_to_scratch_context(struct intel_engine_cs *engine,
-			  struct igt_spinner *spin)
+			  struct igt_spinner *spin,
+			  struct i915_request **rq)
 {
 	struct intel_context *ce;
-	struct i915_request *rq;
 	int err = 0;
 
 	ce = intel_context_create(engine);
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	rq = igt_spinner_create_request(spin, ce, MI_NOOP);
+	*rq = igt_spinner_create_request(spin, ce, MI_NOOP);
 	intel_context_put(ce);
 
-	if (IS_ERR(rq)) {
+	if (IS_ERR(*rq)) {
 		spin = NULL;
-		err = PTR_ERR(rq);
+		err = PTR_ERR(*rq);
 		goto err;
 	}
 
-	err = request_add_spin(rq, spin);
+	err = request_add_spin(*rq, spin);
 err:
 	if (err && spin)
 		igt_spinner_end(spin);
@@ -296,6 +303,7 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 {
 	struct intel_context *ce, *tmp;
 	struct igt_spinner spin;
+	struct i915_request *rq;
 	intel_wakeref_t wakeref;
 	int err;
 
@@ -316,13 +324,24 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 		goto out_spin;
 	}
 
-	err = switch_to_scratch_context(engine, &spin);
+	err = switch_to_scratch_context(engine, &spin, &rq);
 	if (err)
 		goto out_spin;
 
+	/* Ensure the spinner hasn't aborted */
+	if (i915_request_completed(rq)) {
+		pr_err("%s spinner failed to start\n", name);
+		err = -ETIMEDOUT;
+		goto out_spin;
+	}
+
 	with_intel_runtime_pm(engine->uncore->rpm, wakeref)
 		err = reset(engine);
 
+	/* Ensure the reset happens and kills the engine */
+	if (err == 0)
+		err = intel_selftest_wait_for_rq(rq);
+
 	igt_spinner_end(&spin);
 
 	if (err) {
@@ -787,9 +806,26 @@ static int live_reset_whitelist(void *arg)
 			continue;
 
 		if (intel_has_reset_engine(gt)) {
-			err = check_whitelist_across_reset(engine,
-							   do_engine_reset,
-							   "engine");
+			if (intel_engine_uses_guc(engine)) {
+				struct intel_selftest_saved_policy saved;
+				int err2;
+
+				err = intel_selftest_modify_policy(engine, &saved);
+				if (err)
+					goto out;
+
+				err = check_whitelist_across_reset(engine,
+								   do_guc_reset,
+								   "guc");
+
+				err2 = intel_selftest_restore_policy(engine, &saved);
+				if (err == 0)
+					err = err2;
+			} else {
+				err = check_whitelist_across_reset(engine,
+								   do_engine_reset,
+								   "engine");
+			}
+
 			if (err)
 				goto out;
 		}
@@ -1226,31 +1262,41 @@ live_engine_reset_workarounds(void *arg)
 	reference_lists_init(gt, &lists);
 
 	for_each_engine(engine, gt, id) {
+		struct intel_selftest_saved_policy saved;
+		bool using_guc = intel_engine_uses_guc(engine);
 		bool ok;
+		int ret2;
 
 		pr_info("Verifying after %s reset...\n", engine->name);
+		ret = intel_selftest_modify_policy(engine, &saved);
+		if (ret)
+			break;
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			ret = PTR_ERR(ce);
-			break;
+			goto restore;
 		}
 
-		ok = verify_wa_lists(gt, &lists, "before reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
-		}
+		if (!using_guc) {
+			ok = verify_wa_lists(gt, &lists, "before reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
 
-		ret = intel_engine_reset(engine, "live_workarounds:idle");
-		if (ret) {
-			pr_err("%s: Reset failed while idle\n", engine->name);
-			goto err;
-		}
+			ret = intel_engine_reset(engine, "live_workarounds:idle");
+			if (ret) {
+				pr_err("%s: Reset failed while idle\n", engine->name);
+				goto err;
+			}
 
-		ok = verify_wa_lists(gt, &lists, "after idle reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
+			ok = verify_wa_lists(gt, &lists, "after idle reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
 		}
 
 		ret = igt_spinner_init(&spin, engine->gt);
@@ -1271,25 +1317,41 @@ live_engine_reset_workarounds(void *arg)
 			goto err;
 		}
 
-		ret = intel_engine_reset(engine, "live_workarounds:active");
-		if (ret) {
-			pr_err("%s: Reset failed on an active spinner\n",
-			       engine->name);
-			igt_spinner_fini(&spin);
-			goto err;
+		/* Ensure the spinner hasn't aborted */
+		if (i915_request_completed(rq)) {
+			ret = -ETIMEDOUT;
+			goto skip;
+		}
+
+		if (!using_guc) {
+			ret = intel_engine_reset(engine, "live_workarounds:active");
+			if (ret) {
+				pr_err("%s: Reset failed on an active spinner\n",
+				       engine->name);
+				igt_spinner_fini(&spin);
+				goto err;
+			}
 		}
 
+		/* Ensure the reset happens and kills the engine */
+		if (ret == 0)
+			ret = intel_selftest_wait_for_rq(rq);
+
+skip:
 		igt_spinner_end(&spin);
 		igt_spinner_fini(&spin);
 
 		ok = verify_wa_lists(gt, &lists, "after busy reset");
-		if (!ok) {
+		if (!ok)
 			ret = -ESRCH;
-			goto err;
-		}
 
 err:
 		intel_context_put(ce);
+
+restore:
+		ret2 = intel_selftest_restore_policy(engine, &saved);
+		if (ret == 0)
+			ret = ret2;
 		if (ret)
 			break;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
new file mode 100644
index 000000000000..91ecd8a1bd21
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "gt/intel_gt.h"
+#include "i915_drv.h"
+#include "i915_selftest.h"
+
+#include "selftests/intel_scheduler_helpers.h"
+
+#define REDUCED_TIMESLICE	5
+#define REDUCED_PREEMPT		10
+#define WAIT_FOR_RESET_TIME	1000	/* ms */
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved)
+{
+	int err;
+
+	saved->reset = engine->i915->params.reset;
+	saved->flags = engine->flags;
+	saved->timeslice = engine->props.timeslice_duration_ms;
+	saved->preempt_timeout = engine->props.preempt_timeout_ms;
+
+	/*
+	 * Enable force pre-emption on time slice expiration
+	 * together with engine reset on pre-emption timeout.
+	 * This is required to make the GuC notice and reset
+	 * the single hanging context.
+	 * Also, reduce the preemption timeout to something
+	 * small to speed the test up.
+	 */
+	engine->i915->params.reset = 2;
+	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
+	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
+	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	err = intel_guc_global_policies_update(&engine->gt->uc.guc);
+	if (err)
+		intel_selftest_restore_policy(engine, saved);
+
+	return err;
+}
+
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved)
+{
+	/* Restore the original policies */
+	engine->i915->params.reset = saved->reset;
+	engine->flags = saved->flags;
+	engine->props.timeslice_duration_ms = saved->timeslice;
+	engine->props.preempt_timeout_ms = saved->preempt_timeout;
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	return intel_guc_global_policies_update(&engine->gt->uc.guc);
+}
+
+int intel_selftest_wait_for_rq(struct i915_request *rq)
+{
+	long ret;
+
+	ret = i915_request_wait(rq, 0, msecs_to_jiffies(WAIT_FOR_RESET_TIME));
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
new file mode 100644
index 000000000000..f30e96f0ba95
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2019 Intel Corporation
+ */
+
+#ifndef _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+#define _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+
+#include <linux/types.h>
+
+struct i915_request;
+struct intel_engine_cs;
+
+struct intel_selftest_saved_policy {
+	u32 flags;
+	u32 reset;
+	u64 timeslice;
+	u64 preempt_timeout;
+};
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved);
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved);
+int intel_selftest_wait_for_rq(struct i915_request *rq);
+
+#endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 45/51] drm/i915/selftest: Fix workarounds selftest for GuC submission
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: Rahul Kumar Singh <rahul.kumar.singh@intel.com>

When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 .../gpu/drm/i915/gt/selftest_workarounds.c    | 130 +++++++++++++-----
 .../i915/selftests/intel_scheduler_helpers.c  |  76 ++++++++++
 .../i915/selftests/intel_scheduler_helpers.h  |  28 ++++
 4 files changed, 201 insertions(+), 34 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
 create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 10b3bb6207ba..ab7679957623 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -280,6 +280,7 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
 i915-$(CONFIG_DRM_I915_SELFTEST) += \
 	gem/selftests/i915_gem_client_blt.o \
 	gem/selftests/igt_gem_utils.o \
+	selftests/intel_scheduler_helpers.o \
 	selftests/i915_random.o \
 	selftests/i915_selftest.o \
 	selftests/igt_atomic.o \
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 7ebc4edb8ecf..7727bc531ea9 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -12,6 +12,7 @@
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
+#include "selftests/intel_scheduler_helpers.h"
 #include "selftests/mock_drm.h"
 
 #include "gem/selftests/igt_gem_utils.h"
@@ -261,28 +262,34 @@ static int do_engine_reset(struct intel_engine_cs *engine)
 	return intel_engine_reset(engine, "live_workarounds");
 }
 
+static int do_guc_reset(struct intel_engine_cs *engine)
+{
+	/* Currently a no-op as the reset is handled by GuC */
+	return 0;
+}
+
 static int
 switch_to_scratch_context(struct intel_engine_cs *engine,
-			  struct igt_spinner *spin)
+			  struct igt_spinner *spin,
+			  struct i915_request **rq)
 {
 	struct intel_context *ce;
-	struct i915_request *rq;
 	int err = 0;
 
 	ce = intel_context_create(engine);
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	rq = igt_spinner_create_request(spin, ce, MI_NOOP);
+	*rq = igt_spinner_create_request(spin, ce, MI_NOOP);
 	intel_context_put(ce);
 
-	if (IS_ERR(rq)) {
+	if (IS_ERR(*rq)) {
 		spin = NULL;
-		err = PTR_ERR(rq);
+		err = PTR_ERR(*rq);
 		goto err;
 	}
 
-	err = request_add_spin(rq, spin);
+	err = request_add_spin(*rq, spin);
 err:
 	if (err && spin)
 		igt_spinner_end(spin);
@@ -296,6 +303,7 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 {
 	struct intel_context *ce, *tmp;
 	struct igt_spinner spin;
+	struct i915_request *rq;
 	intel_wakeref_t wakeref;
 	int err;
 
@@ -316,13 +324,24 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 		goto out_spin;
 	}
 
-	err = switch_to_scratch_context(engine, &spin);
+	err = switch_to_scratch_context(engine, &spin, &rq);
 	if (err)
 		goto out_spin;
 
+	/* Ensure the spinner hasn't aborted */
+	if (i915_request_completed(rq)) {
+		pr_err("%s spinner failed to start\n", name);
+		err = -ETIMEDOUT;
+		goto out_spin;
+	}
+
 	with_intel_runtime_pm(engine->uncore->rpm, wakeref)
 		err = reset(engine);
 
+	/* Ensure the reset happens and kills the engine */
+	if (err == 0)
+		err = intel_selftest_wait_for_rq(rq);
+
 	igt_spinner_end(&spin);
 
 	if (err) {
@@ -787,9 +806,26 @@ static int live_reset_whitelist(void *arg)
 			continue;
 
 		if (intel_has_reset_engine(gt)) {
-			err = check_whitelist_across_reset(engine,
-							   do_engine_reset,
-							   "engine");
+			if (intel_engine_uses_guc(engine)) {
+				struct intel_selftest_saved_policy saved;
+				int err2;
+
+				err = intel_selftest_modify_policy(engine, &saved);
+				if (err)
+					goto out;
+
+				err = check_whitelist_across_reset(engine,
+								   do_guc_reset,
+								   "guc");
+
+				err2 = intel_selftest_restore_policy(engine, &saved);
+				if (err == 0)
+					err = err2;
+			} else {
+				err = check_whitelist_across_reset(engine,
+								   do_engine_reset,
+								   "engine");
+			}
+
 			if (err)
 				goto out;
 		}
@@ -1226,31 +1262,41 @@ live_engine_reset_workarounds(void *arg)
 	reference_lists_init(gt, &lists);
 
 	for_each_engine(engine, gt, id) {
+		struct intel_selftest_saved_policy saved;
+		bool using_guc = intel_engine_uses_guc(engine);
 		bool ok;
+		int ret2;
 
 		pr_info("Verifying after %s reset...\n", engine->name);
+		ret = intel_selftest_modify_policy(engine, &saved);
+		if (ret)
+			break;
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			ret = PTR_ERR(ce);
-			break;
+			goto restore;
 		}
 
-		ok = verify_wa_lists(gt, &lists, "before reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
-		}
+		if (!using_guc) {
+			ok = verify_wa_lists(gt, &lists, "before reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
 
-		ret = intel_engine_reset(engine, "live_workarounds:idle");
-		if (ret) {
-			pr_err("%s: Reset failed while idle\n", engine->name);
-			goto err;
-		}
+			ret = intel_engine_reset(engine, "live_workarounds:idle");
+			if (ret) {
+				pr_err("%s: Reset failed while idle\n", engine->name);
+				goto err;
+			}
 
-		ok = verify_wa_lists(gt, &lists, "after idle reset");
-		if (!ok) {
-			ret = -ESRCH;
-			goto err;
+			ok = verify_wa_lists(gt, &lists, "after idle reset");
+			if (!ok) {
+				ret = -ESRCH;
+				goto err;
+			}
 		}
 
 		ret = igt_spinner_init(&spin, engine->gt);
@@ -1271,25 +1317,41 @@ live_engine_reset_workarounds(void *arg)
 			goto err;
 		}
 
-		ret = intel_engine_reset(engine, "live_workarounds:active");
-		if (ret) {
-			pr_err("%s: Reset failed on an active spinner\n",
-			       engine->name);
-			igt_spinner_fini(&spin);
-			goto err;
+		/* Ensure the spinner hasn't aborted */
+		if (i915_request_completed(rq)) {
+			ret = -ETIMEDOUT;
+			goto skip;
+		}
+
+		if (!using_guc) {
+			ret = intel_engine_reset(engine, "live_workarounds:active");
+			if (ret) {
+				pr_err("%s: Reset failed on an active spinner\n",
+				       engine->name);
+				igt_spinner_fini(&spin);
+				goto err;
+			}
 		}
 
+		/* Ensure the reset happens and kills the engine */
+		if (ret == 0)
+			ret = intel_selftest_wait_for_rq(rq);
+
+skip:
 		igt_spinner_end(&spin);
 		igt_spinner_fini(&spin);
 
 		ok = verify_wa_lists(gt, &lists, "after busy reset");
-		if (!ok) {
+		if (!ok)
 			ret = -ESRCH;
-			goto err;
-		}
 
 err:
 		intel_context_put(ce);
+
+restore:
+		ret2 = intel_selftest_restore_policy(engine, &saved);
+		if (ret == 0)
+			ret = ret2;
 		if (ret)
 			break;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
new file mode 100644
index 000000000000..91ecd8a1bd21
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -0,0 +1,76 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include "gt/intel_gt.h"
+#include "i915_drv.h"
+#include "i915_selftest.h"
+
+#include "selftests/intel_scheduler_helpers.h"
+
+#define REDUCED_TIMESLICE	5
+#define REDUCED_PREEMPT		10
+#define WAIT_FOR_RESET_TIME	1000
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved)
+{
+	int err;
+
+	saved->reset = engine->i915->params.reset;
+	saved->flags = engine->flags;
+	saved->timeslice = engine->props.timeslice_duration_ms;
+	saved->preempt_timeout = engine->props.preempt_timeout_ms;
+
+	/*
+	 * Enable force pre-emption on time slice expiration
+	 * together with engine reset on pre-emption timeout.
+	 * This is required to make the GuC notice and reset
+	 * the single hanging context.
+	 * Also, reduce the preemption timeout to something
+	 * small to speed the test up.
+	 */
+	engine->i915->params.reset = 2;
+	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
+	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
+	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	err = intel_guc_global_policies_update(&engine->gt->uc.guc);
+	if (err)
+		intel_selftest_restore_policy(engine, saved);
+
+	return err;
+}
+
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved)
+{
+	/* Restore the original policies */
+	engine->i915->params.reset = saved->reset;
+	engine->flags = saved->flags;
+	engine->props.timeslice_duration_ms = saved->timeslice;
+	engine->props.preempt_timeout_ms = saved->preempt_timeout;
+
+	if (!intel_engine_uses_guc(engine))
+		return 0;
+
+	return intel_guc_global_policies_update(&engine->gt->uc.guc);
+}
+
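+/*
+ * Wait for a request to complete - here, typically by being killed by an
+ * engine reset. Returns 0 on completion, or a negative error code if the
+ * request is still running when the wait times out.
+ */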
+int intel_selftest_wait_for_rq(struct i915_request *rq)
+{
+	long ret;
+
+	ret = i915_request_wait(rq, 0, WAIT_FOR_RESET_TIME);
+	if (ret < 0)
+		return ret;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
new file mode 100644
index 000000000000..f30e96f0ba95
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2019 Intel Corporation
+ */
+
+#ifndef _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+#define _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
+
+#include <linux/types.h>
+
+struct i915_request;
+struct intel_engine_cs;
+
+struct intel_selftest_saved_policy {
+	u32 flags;
+	u32 reset;
+	u64 timeslice;
+	u64 preempt_timeout;
+};
+
+int intel_selftest_modify_policy(struct intel_engine_cs *engine,
+				 struct intel_selftest_saved_policy *saved);
+int intel_selftest_restore_policy(struct intel_engine_cs *engine,
+				  struct intel_selftest_saved_policy *saved);
+int intel_selftest_wait_for_rq(struct i915_request *rq);
+
+#endif
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 46/51] drm/i915/selftest: Fix MOCS selftest for GuC submission
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: Rahul Kumar Singh <rahul.kumar.singh@intel.com>

When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.
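
Concretely, for GuC engines the idle-engine pass (intel_engine_reset()
followed by check_mocs_engine()) is skipped, since the KMD can no longer
trigger the reset itself; active_engine_reset() instead relies on the
hanging spinner plus intel_selftest_wait_for_rq() to confirm that the
GuC-initiated reset happened.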

Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_mocs.c | 49 ++++++++++++++++++-------
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index 8763bbeca0f7..b7314739ee40 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -10,6 +10,7 @@
 #include "gem/selftests/mock_context.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_spinner.h"
+#include "selftests/intel_scheduler_helpers.h"
 
 struct live_mocs {
 	struct drm_i915_mocs_table table;
@@ -318,7 +319,8 @@ static int live_mocs_clean(void *arg)
 }
 
 static int active_engine_reset(struct intel_context *ce,
-			       const char *reason)
+			       const char *reason,
+			       bool using_guc)
 {
 	struct igt_spinner spin;
 	struct i915_request *rq;
@@ -335,9 +337,13 @@ static int active_engine_reset(struct intel_context *ce,
 	}
 
 	err = request_add_spin(rq, &spin);
-	if (err == 0)
+	if (err == 0 && !using_guc)
 		err = intel_engine_reset(ce->engine, reason);
 
+	/* Ensure the reset happens and kills the engine */
+	if (err == 0)
+		err = intel_selftest_wait_for_rq(rq);
+
 	igt_spinner_end(&spin);
 	igt_spinner_fini(&spin);
 
@@ -345,21 +351,23 @@ static int active_engine_reset(struct intel_context *ce,
 }
 
 static int __live_mocs_reset(struct live_mocs *mocs,
-			     struct intel_context *ce)
+			     struct intel_context *ce, bool using_guc)
 {
 	struct intel_gt *gt = ce->engine->gt;
 	int err;
 
 	if (intel_has_reset_engine(gt)) {
-		err = intel_engine_reset(ce->engine, "mocs");
-		if (err)
-			return err;
-
-		err = check_mocs_engine(mocs, ce);
-		if (err)
-			return err;
+		if (!using_guc) {
+			err = intel_engine_reset(ce->engine, "mocs");
+			if (err)
+				return err;
+
+			err = check_mocs_engine(mocs, ce);
+			if (err)
+				return err;
+		}
 
-		err = active_engine_reset(ce, "mocs");
+		err = active_engine_reset(ce, "mocs", using_guc);
 		if (err)
 			return err;
 
@@ -395,19 +403,32 @@ static int live_mocs_reset(void *arg)
 
 	igt_global_reset_lock(gt);
 	for_each_engine(engine, gt, id) {
+		bool using_guc = intel_engine_uses_guc(engine);
+		struct intel_selftest_saved_policy saved;
 		struct intel_context *ce;
+		int err2;
+
+		err = intel_selftest_modify_policy(engine, &saved);
+		if (err)
+			break;
 
 		ce = mocs_context_create(engine);
 		if (IS_ERR(ce)) {
 			err = PTR_ERR(ce);
-			break;
+			goto restore;
 		}
 
 		intel_engine_pm_get(engine);
-		err = __live_mocs_reset(&mocs, ce);
-		intel_engine_pm_put(engine);
 
+		err = __live_mocs_reset(&mocs, ce, using_guc);
+
+		intel_engine_pm_put(engine);
 		intel_context_put(ce);
+
+restore:
+		err2 = intel_selftest_restore_policy(engine, &saved);
+		if (err == 0)
+			err = err2;
 		if (err)
 			break;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Requests may take slightly longer with GuC submission, so increase
the timeouts in live_requests.
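
For reference, i915_request_wait() takes its timeout in jiffies, so this
bumps the affected waits from HZ / 5 (200 ms) to HZ (one second).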

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index bd5c96a77ba3..d67710d10615 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
 		i915_request_add(rq);
 
 		err = 0;
-		if (i915_request_wait(rq, 0, HZ / 5) < 0)
+		if (i915_request_wait(rq, 0, HZ) < 0)
 			err = -ETIME;
 		i915_request_put(rq);
 		if (err)
@@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
 	}
 	igt_spinner_end(&spin);
 
-	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)
+	if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
 		err = -EIO;
 	i915_request_put(rq);
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 48/51] drm/i915/selftest: Fix hangcheck self test for GuC submission
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Conversely, one of the tests specifically sends hanging batches to the
engines but wants them to sit around until a manual reset of the full
GT (including GuC itself). That means disabling GuC-based engine
resets to prevent those from killing the hanging batch too soon. So,
add support to the scheduling policy helper for disabling resets in
addition to making them quicker.
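
Schematically, the helper now takes a mode argument (a sketch; the
names are as defined in the patch below, 'saved' is a
struct intel_selftest_saved_policy):

  /* Speed up and force GuC engine resets for the reset tests: */
  err = intel_selftest_modify_policy(engine, &saved,
                                     SELFTEST_SCHEDULER_MODIFY_FAST_RESET);

  /* Or keep hanging batches alive until a manual full-GT reset: */
  err = intel_selftest_modify_policy(engine, &saved,
                                     SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);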

In GuC submission mode, the 'is engine idle' test basically turns into
'is engine PM wakelock held'. Independently, there is a heartbeat
disable helper function that the tests use. For unexplained reasons,
this acquires the engine wakelock before disabling the heartbeat and
only releases it when re-enabling the heartbeat. As one of the tests
tries to wait for idle in the middle of a heartbeat-disabled section,
it is therefore guaranteed to always fail. So, add a 'no_pm' variant
of the heartbeat helper that allows the engine to be asleep while also
having heartbeats disabled.
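
With the new variant, a test can disable heartbeats without pinning the
engine awake (again a sketch):

  st_engine_heartbeat_disable_no_pm(engine);
  /* ... wait_for_idle() can now actually see the engine park ... */
  st_engine_heartbeat_enable_no_pm(engine);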

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  22 ++
 .../drm/i915/gt/selftest_engine_heartbeat.h   |   2 +
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 223 +++++++++++++-----
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   3 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +
 .../i915/selftests/intel_scheduler_helpers.c  |  39 ++-
 .../i915/selftests/intel_scheduler_helpers.h  |   9 +-
 9 files changed, 237 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index d66b732a91c2..eec57e57403f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -449,6 +449,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 	unsigned int flags;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 4896e4ccad50..317eebf086c3 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -405,3 +405,25 @@ void st_engine_heartbeat_enable(struct intel_engine_cs *engine)
 	engine->props.heartbeat_interval_ms =
 		engine->defaults.heartbeat_interval_ms;
 }
+
+void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine)
+{
+	engine->props.heartbeat_interval_ms = 0;
+
+	/*
+	 * Park the heartbeat but without holding the PM lock as that
+	 * makes the engines appear not-idle. Note that if/when unpark
+	 * is called due to the PM lock being acquired later the
+	 * heartbeat still won't be enabled because of the above = 0.
+	 */
+	if (intel_engine_pm_get_if_awake(engine)) {
+		intel_engine_park_heartbeat(engine);
+		intel_engine_pm_put(engine);
+	}
+}
+
+void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine)
+{
+	engine->props.heartbeat_interval_ms =
+		engine->defaults.heartbeat_interval_ms;
+}
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
index cd27113d5400..81da2cd8e406 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
@@ -9,6 +9,8 @@
 struct intel_engine_cs;
 
 void st_engine_heartbeat_disable(struct intel_engine_cs *engine);
+void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine);
 void st_engine_heartbeat_enable(struct intel_engine_cs *engine);
+void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine);
 
 #endif /* SELFTEST_ENGINE_HEARTBEAT_H */
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 0ed87cc4d063..971c0c249eb0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -17,6 +17,8 @@
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_atomic.h"
+#include "selftests/igt_spinner.h"
+#include "selftests/intel_scheduler_helpers.h"
 
 #include "selftests/mock_drm.h"
 
@@ -449,6 +451,14 @@ static int igt_reset_nop_engine(void *arg)
 		IGT_TIMEOUT(end_time);
 		int err;
 
+		if (intel_engine_uses_guc(engine)) {
+			/*
+			 * Engine level resets are triggered by GuC when a hang
+			 * is detected. They can't be triggered by the KMD any
+			 * more. Thus a nop batch cannot be used as a reset test.
+			 */
+			continue;
+		}
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
@@ -560,6 +570,10 @@ static int igt_reset_fail_engine(void *arg)
 		IGT_TIMEOUT(end_time);
 		int err;
 
+		/* Can't manually break the reset if i915 doesn't perform it */
+		if (intel_engine_uses_guc(engine))
+			continue;
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
@@ -699,8 +713,12 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 	for_each_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count;
 		unsigned long count;
+		bool using_guc = intel_engine_uses_guc(engine);
 		IGT_TIMEOUT(end_time);
 
+		if (using_guc && !active)
+			continue;
+
 		if (active && !intel_engine_can_store_dword(engine))
 			continue;
 
@@ -718,14 +736,23 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		count = 0;
 		do {
-			if (active) {
-				struct i915_request *rq;
+			struct i915_request *rq = NULL;
+			struct intel_selftest_saved_policy saved;
+			int err2;
+
+			err = intel_selftest_modify_policy(engine, &saved,
+							   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				break;
+			}
 
+			if (active) {
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
 					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-					break;
+					goto restore;
 				}
 
 				i915_request_get(rq);
@@ -741,34 +768,58 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 
 					i915_request_put(rq);
 					err = -EIO;
-					break;
+					goto restore;
 				}
+			}
 
-				i915_request_put(rq);
+			if (!using_guc) {
+				err = intel_engine_reset(engine, NULL);
+				if (err) {
+					pr_err("intel_engine_reset(%s) failed, err:%d\n",
+					       engine->name, err);
+					goto skip;
+				}
 			}
 
-			err = intel_engine_reset(engine, NULL);
-			if (err) {
-				pr_err("intel_engine_reset(%s) failed, err:%d\n",
-				       engine->name, err);
-				break;
+			if (rq) {
+				/* Ensure the reset happens and kills the engine */
+				err = intel_selftest_wait_for_rq(rq);
+				if (err)
+					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
+					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
 			}
 
+skip:
+			if (rq)
+				i915_request_put(rq);
+
 			if (i915_reset_count(global) != reset_count) {
 				pr_err("Full GPU reset recorded! (engine reset expected)\n");
 				err = -EINVAL;
-				break;
+				goto restore;
 			}
 
-			if (i915_reset_engine_count(global, engine) !=
-			    ++reset_engine_count) {
-				pr_err("%s engine reset not recorded!\n",
-				       engine->name);
-				err = -EINVAL;
-				break;
+			/* GuC based resets are not logged per engine */
+			if (!using_guc) {
+				if (i915_reset_engine_count(global, engine) !=
+				    ++reset_engine_count) {
+					pr_err("%s engine reset not recorded!\n",
+					       engine->name);
+					err = -EINVAL;
+					goto restore;
+				}
 			}
 
 			count++;
+
+restore:
+			err2 = intel_selftest_restore_policy(engine, &saved);
+			if (err2)
+				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
+			if (err == 0)
+				err = err2;
+			if (err)
+				break;
 		} while (time_before(jiffies, end_time));
 		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		st_engine_heartbeat_enable(engine);
@@ -941,10 +992,13 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		struct active_engine threads[I915_NUM_ENGINES] = {};
 		unsigned long device = i915_reset_count(global);
 		unsigned long count = 0, reported;
+		bool using_guc = intel_engine_uses_guc(engine);
 		IGT_TIMEOUT(end_time);
 
-		if (flags & TEST_ACTIVE &&
-		    !intel_engine_can_store_dword(engine))
+		if (flags & TEST_ACTIVE) {
+			if (!intel_engine_can_store_dword(engine))
+				continue;
+		} else if (using_guc) {
 			continue;
+		}
 
 		if (!wait_for_idle(engine)) {
@@ -984,17 +1038,26 @@ static int __igt_reset_engines(struct intel_gt *gt,
 
 		yield(); /* start all threads before we begin */
 
-		st_engine_heartbeat_disable(engine);
+		st_engine_heartbeat_disable_no_pm(engine);
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		do {
 			struct i915_request *rq = NULL;
+			struct intel_selftest_saved_policy saved;
+			int err2;
+
+			err = intel_selftest_modify_policy(engine, &saved,
+							  SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				break;
+			}
 
 			if (flags & TEST_ACTIVE) {
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
 					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-					break;
+					goto restore;
 				}
 
 				i915_request_get(rq);
@@ -1010,15 +1073,27 @@ static int __igt_reset_engines(struct intel_gt *gt,
 
 					i915_request_put(rq);
 					err = -EIO;
-					break;
+					goto restore;
 				}
+			} else {
+				intel_engine_pm_get(engine);
 			}
 
-			err = intel_engine_reset(engine, NULL);
-			if (err) {
-				pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
-				       engine->name, test_name, err);
-				break;
+			if (!using_guc) {
+				err = intel_engine_reset(engine, NULL);
+				if (err) {
+					pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
+					       engine->name, test_name, err);
+					goto restore;
+				}
+			}
+
+			if (rq) {
+				/* Ensure the reset happens and kills the engine */
+				err = intel_selftest_wait_for_rq(rq);
+				if (err)
+					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
+					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
 			}
 
 			count++;
@@ -1035,7 +1110,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					GEM_TRACE_DUMP();
 					intel_gt_set_wedged(gt);
 					err = -EIO;
-					break;
+					goto restore;
 				}
 
 				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
@@ -1054,12 +1129,15 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					GEM_TRACE_DUMP();
 					intel_gt_set_wedged(gt);
 					err = -EIO;
-					break;
+					goto restore;
 				}
 
 				i915_request_put(rq);
 			}
 
+			if (!(flags & TEST_ACTIVE))
+				intel_engine_pm_put(engine);
+
 			if (!(flags & TEST_SELF) && !wait_for_idle(engine)) {
 				struct drm_printer p =
 					drm_info_printer(gt->i915->drm.dev);
@@ -1071,22 +1149,34 @@ static int __igt_reset_engines(struct intel_gt *gt,
 						  "%s\n", engine->name);
 
 				err = -EIO;
-				break;
+				goto restore;
 			}
+
+restore:
+			err2 = intel_selftest_restore_policy(engine, &saved);
+			if (err2)
+				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
+			if (err == 0)
+				err = err2;
+			if (err)
+				break;
 		} while (time_before(jiffies, end_time));
 		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
-		st_engine_heartbeat_enable(engine);
+		st_engine_heartbeat_enable_no_pm(engine);
 
 		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
 			engine->name, test_name, count);
 
-		reported = i915_reset_engine_count(global, engine);
-		reported -= threads[engine->id].resets;
-		if (reported != count) {
-			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
-			       engine->name, test_name, count, reported);
-			if (!err)
-				err = -EINVAL;
+		/* GuC based resets are not logged per engine */
+		if (!using_guc) {
+			reported = i915_reset_engine_count(global, engine);
+			reported -= threads[engine->id].resets;
+			if (reported != count) {
+				pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
+				       engine->name, test_name, count, reported);
+				if (!err)
+					err = -EINVAL;
+			}
 		}
 
 unwind:
@@ -1105,15 +1195,18 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			}
 			put_task_struct(threads[tmp].task);
 
-			if (other->uabi_class != engine->uabi_class &&
-			    threads[tmp].resets !=
-			    i915_reset_engine_count(global, other)) {
-				pr_err("Innocent engine %s was reset (count=%ld)\n",
-				       other->name,
-				       i915_reset_engine_count(global, other) -
-				       threads[tmp].resets);
-				if (!err)
-					err = -EINVAL;
+			/* GuC based resets are not logged per engine */
+			if (!using_guc) {
+				if (other->uabi_class != engine->uabi_class &&
+				    threads[tmp].resets !=
+				    i915_reset_engine_count(global, other)) {
+					pr_err("Innocent engine %s was reset (count=%ld)\n",
+					       other->name,
+					       i915_reset_engine_count(global, other) -
+					       threads[tmp].resets);
+					if (!err)
+						err = -EINVAL;
+				}
 			}
 		}
 
@@ -1553,18 +1646,29 @@ static int igt_reset_queue(void *arg)
 		goto unlock;
 
 	for_each_engine(engine, gt, id) {
+		struct intel_selftest_saved_policy saved;
 		struct i915_request *prev;
 		IGT_TIMEOUT(end_time);
 		unsigned int count;
+		bool using_guc = intel_engine_uses_guc(engine);
 
 		if (!intel_engine_can_store_dword(engine))
 			continue;
 
+		if (using_guc) {
+			err = intel_selftest_modify_policy(engine, &saved,
+							  SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				goto fini;
+			}
+		}
+
 		prev = hang_create_request(&h, engine);
 		if (IS_ERR(prev)) {
 			err = PTR_ERR(prev);
 			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
-			goto fini;
+			goto restore;
 		}
 
 		i915_request_get(prev);
@@ -1579,7 +1683,7 @@ static int igt_reset_queue(void *arg)
 			if (IS_ERR(rq)) {
 				err = PTR_ERR(rq);
 				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-				goto fini;
+				goto restore;
 			}
 
 			i915_request_get(rq);
@@ -1604,7 +1708,7 @@ static int igt_reset_queue(void *arg)
 
 				GEM_TRACE_DUMP();
 				intel_gt_set_wedged(gt);
-				goto fini;
+				goto restore;
 			}
 
 			if (!wait_until_running(&h, prev)) {
@@ -1622,7 +1726,7 @@ static int igt_reset_queue(void *arg)
 				intel_gt_set_wedged(gt);
 
 				err = -EIO;
-				goto fini;
+				goto restore;
 			}
 
 			reset_count = fake_hangcheck(gt, BIT(id));
@@ -1633,7 +1737,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			if (rq->fence.error) {
@@ -1642,7 +1746,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			if (i915_reset_count(global) == reset_count) {
@@ -1650,7 +1754,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			i915_request_put(prev);
@@ -1665,6 +1769,17 @@ static int igt_reset_queue(void *arg)
 
 		i915_request_put(prev);
 
+restore:
+		if (using_guc) {
+			int err2 = intel_selftest_restore_policy(engine, &saved);
+
+			if (err2)
+				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
+			if (err == 0)
+				err = err2;
+		}
+		if (err)
+			goto fini;
+
 		err = igt_flush_test(gt->i915);
 		if (err) {
 			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index b7314739ee40..13d25bf2a94a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -408,7 +408,8 @@ static int live_mocs_reset(void *arg)
 		struct intel_context *ce;
 		int err2;
 
-		err = intel_selftest_modify_policy(engine, &saved);
+		err = intel_selftest_modify_policy(engine, &saved,
+						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 		if (err)
 			break;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 7727bc531ea9..d820f0b41634 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -810,7 +810,8 @@ static int live_reset_whitelist(void *arg)
 				struct intel_selftest_saved_policy saved;
 				int err2;
 
-				err = intel_selftest_modify_policy(engine, &saved);
+				err = intel_selftest_modify_policy(engine, &saved,
+								   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 				if (err)
 					goto out;
 
@@ -1268,7 +1269,8 @@ live_engine_reset_workarounds(void *arg)
 		int ret2;
 
 		pr_info("Verifying after %s reset...\n", engine->name);
-		ret = intel_selftest_modify_policy(engine, &saved);
+		ret = intel_selftest_modify_policy(engine, &saved,
+						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 		if (ret)
 			break;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1c30d04733ff..536fdbc406c6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1235,6 +1235,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
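+	/*
+	 * Selftests set I915_ENGINE_WANT_FORCED_PREEMPTION so that the GuC
+	 * preempts-to-idle on timeslice expiry, letting a single hanging
+	 * context be noticed and reset promptly.
+	 */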
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
 	/* NB: For both of these, zero means disabled. */
 	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
 	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
index 91ecd8a1bd21..69db139f9e0d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -16,7 +16,8 @@
 #define WAIT_FOR_RESET_TIME	1000
 
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
-				 struct intel_selftest_saved_policy *saved)
+				 struct intel_selftest_saved_policy *saved,
+				 enum selftest_scheduler_modify modify_type)
 {
 	int err;
@@ -26,18 +27,30 @@ int intel_selftest_modify_policy(struct intel_engine_cs *engine,
 	saved->timeslice = engine->props.timeslice_duration_ms;
 	saved->preempt_timeout = engine->props.preempt_timeout_ms;
 
-	/*
-	 * Enable force pre-emption on time slice expiration
-	 * together with engine reset on pre-emption timeout.
-	 * This is required to make the GuC notice and reset
-	 * the single hanging context.
-	 * Also, reduce the preemption timeout to something
-	 * small to speed the test up.
-	 */
-	engine->i915->params.reset = 2;
-	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
-	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
-	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+	switch (modify_type) {
+	case SELFTEST_SCHEDULER_MODIFY_FAST_RESET:
+		/*
+		 * Enable force pre-emption on time slice expiration
+		 * together with engine reset on pre-emption timeout.
+		 * This is required to make the GuC notice and reset
+		 * the single hanging context.
+		 * Also, reduce the preemption timeout to something
+		 * small to speed the test up.
+		 */
+		engine->i915->params.reset = 2;
+		engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
+		engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
+		engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+		break;
+
+	case SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK:
+		engine->props.preempt_timeout_ms = 0;
+		break;
+
+	default:
+		pr_err("Invalid scheduler policy modification type: %d!\n", modify_type);
+		return -EINVAL;
+	}
 
 	if (!intel_engine_uses_guc(engine))
 		return 0;
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
index f30e96f0ba95..050bc5a8ba8b 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -19,8 +19,15 @@ struct intel_selftest_saved_policy
 	u64 preempt_timeout;
 };
 
+enum selftest_scheduler_modify {
+	SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK = 0,
+	SELFTEST_SCHEDULER_MODIFY_FAST_RESET,
+};
+
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
-				 struct intel_selftest_saved_policy *saved);
+				 struct intel_selftest_saved_policy *saved,
+				 enum selftest_scheduler_modify modify_type);
 int intel_selftest_restore_policy(struct intel_engine_cs *engine,
 				  struct intel_selftest_saved_policy *saved);
 int intel_selftest_wait_for_rq(struct i915_request *rq);
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 48/51] drm/i915/selftest: Fix hangcheck self test for GuC submission
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: John Harrison <John.C.Harrison@Intel.com>

When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Conversely, one of the tests specifically sends hanging batches to the
engines but wants them to sit around until a manual reset of the full
GT (including GuC itself). That means disabling GuC based engine
resets to prevent those from killing the hanging batch too soon. So,
add support to the scheduling policy helper for disabling resets as
well as making them quicker!

In GuC submission mode, the 'is engine idle' test basically turns into
'is engine PM wakelock held'. Independently, there is a heartbeat
disable helper function that the tests use. For unexplained reasons,
this acquires the engine wakelock before disabling the heartbeat and
only releases it when re-enabling the heartbeat. As one of the tests
tries to do a wait for idle in the middle of a heartbeat disabled
section, it is therefore guaranteed to always fail. Added a 'no_pm'
variant of the heartbeat helper that allows the engine to be asleep
while also having heartbeats disabled.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
 .../drm/i915/gt/selftest_engine_heartbeat.c   |  22 ++
 .../drm/i915/gt/selftest_engine_heartbeat.h   |   2 +
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 223 +++++++++++++-----
 drivers/gpu/drm/i915/gt/selftest_mocs.c       |   3 +-
 .../gpu/drm/i915/gt/selftest_workarounds.c    |   6 +-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +
 .../i915/selftests/intel_scheduler_helpers.c  |  39 ++-
 .../i915/selftests/intel_scheduler_helpers.h  |   9 +-
 9 files changed, 237 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index d66b732a91c2..eec57e57403f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -449,6 +449,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_IS_VIRTUAL       BIT(5)
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
+#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 	unsigned int flags;
 
 	/*
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index 4896e4ccad50..317eebf086c3 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -405,3 +405,25 @@ void st_engine_heartbeat_enable(struct intel_engine_cs *engine)
 	engine->props.heartbeat_interval_ms =
 		engine->defaults.heartbeat_interval_ms;
 }
+
+void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine)
+{
+	engine->props.heartbeat_interval_ms = 0;
+
+	/*
+	 * Park the heartbeat but without holding the PM lock as that
+	 * makes the engines appear not-idle. Note that if/when unpark
+	 * is called due to the PM lock being acquired later the
+	 * heartbeat still won't be enabled because of the above = 0.
+	 */
+	if (intel_engine_pm_get_if_awake(engine)) {
+		intel_engine_park_heartbeat(engine);
+		intel_engine_pm_put(engine);
+	}
+}
+
+void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine)
+{
+	engine->props.heartbeat_interval_ms =
+		engine->defaults.heartbeat_interval_ms;
+}
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
index cd27113d5400..81da2cd8e406 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
@@ -9,6 +9,8 @@
 struct intel_engine_cs;
 
 void st_engine_heartbeat_disable(struct intel_engine_cs *engine);
+void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine);
 void st_engine_heartbeat_enable(struct intel_engine_cs *engine);
+void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine);
 
 #endif /* SELFTEST_ENGINE_HEARTBEAT_H */
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 0ed87cc4d063..971c0c249eb0 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -17,6 +17,8 @@
 #include "selftests/igt_flush_test.h"
 #include "selftests/igt_reset.h"
 #include "selftests/igt_atomic.h"
+#include "selftests/igt_spinner.h"
+#include "selftests/intel_scheduler_helpers.h"
 
 #include "selftests/mock_drm.h"
 
@@ -449,6 +451,14 @@ static int igt_reset_nop_engine(void *arg)
 		IGT_TIMEOUT(end_time);
 		int err;
 
+		if (intel_engine_uses_guc(engine)) {
+			/* Engine level resets are triggered by GuC when a hang
+			 * is detected. They can't be triggered by the KMD any
+			 * more. Thus a nop batch cannot be used as a reset test
+			 */
+			continue;
+		}
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
@@ -560,6 +570,10 @@ static int igt_reset_fail_engine(void *arg)
 		IGT_TIMEOUT(end_time);
 		int err;
 
+		/* Can't manually break the reset if i915 doesn't perform it */
+		if (intel_engine_uses_guc(engine))
+			continue;
+
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
 			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
@@ -699,8 +713,12 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 	for_each_engine(engine, gt, id) {
 		unsigned int reset_count, reset_engine_count;
 		unsigned long count;
+		bool using_guc = intel_engine_uses_guc(engine);
 		IGT_TIMEOUT(end_time);
 
+		if (using_guc && !active)
+			continue;
+
 		if (active && !intel_engine_can_store_dword(engine))
 			continue;
 
@@ -718,14 +736,23 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		count = 0;
 		do {
-			if (active) {
-				struct i915_request *rq;
+			struct i915_request *rq = NULL;
+			struct intel_selftest_saved_policy saved;
+			int err2;
+
+			err = intel_selftest_modify_policy(engine, &saved,
+							   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				break;
+			}
 
+			if (active) {
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
 					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-					break;
+					goto restore;
 				}
 
 				i915_request_get(rq);
@@ -741,34 +768,58 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
 
 					i915_request_put(rq);
 					err = -EIO;
-					break;
+					goto restore;
 				}
+			}
 
-				i915_request_put(rq);
+			if (!using_guc) {
+				err = intel_engine_reset(engine, NULL);
+				if (err) {
+					pr_err("intel_engine_reset(%s) failed, err:%d\n",
+					       engine->name, err);
+					goto skip;
+				}
 			}
 
-			err = intel_engine_reset(engine, NULL);
-			if (err) {
-				pr_err("intel_engine_reset(%s) failed, err:%d\n",
-				       engine->name, err);
-				break;
+			if (rq) {
+				/* Ensure the reset happens and kills the engine */
+				err = intel_selftest_wait_for_rq(rq);
+				if (err)
+					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
+					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
 			}
 
+skip:
+			if (rq)
+				i915_request_put(rq);
+
 			if (i915_reset_count(global) != reset_count) {
 				pr_err("Full GPU reset recorded! (engine reset expected)\n");
 				err = -EINVAL;
-				break;
+				goto restore;
 			}
 
-			if (i915_reset_engine_count(global, engine) !=
-			    ++reset_engine_count) {
-				pr_err("%s engine reset not recorded!\n",
-				       engine->name);
-				err = -EINVAL;
-				break;
+			/* GuC based resets are not logged per engine */
+			if (!using_guc) {
+				if (i915_reset_engine_count(global, engine) !=
+				    ++reset_engine_count) {
+					pr_err("%s engine reset not recorded!\n",
+					       engine->name);
+					err = -EINVAL;
+					goto restore;
+				}
 			}
 
 			count++;
+
+restore:
+			err2 = intel_selftest_restore_policy(engine, &saved);
+			if (err2)
+				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err);
+			if (err == 0)
+				err = err2;
+			if (err)
+				break;
 		} while (time_before(jiffies, end_time));
 		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		st_engine_heartbeat_enable(engine);
@@ -941,10 +992,13 @@ static int __igt_reset_engines(struct intel_gt *gt,
 		struct active_engine threads[I915_NUM_ENGINES] = {};
 		unsigned long device = i915_reset_count(global);
 		unsigned long count = 0, reported;
+		bool using_guc = intel_engine_uses_guc(engine);
 		IGT_TIMEOUT(end_time);
 
-		if (flags & TEST_ACTIVE &&
-		    !intel_engine_can_store_dword(engine))
+		if (flags & TEST_ACTIVE) {
+			if (!intel_engine_can_store_dword(engine))
+				continue;
+		} else if (using_guc)
 			continue;
 
 		if (!wait_for_idle(engine)) {
@@ -984,17 +1038,26 @@ static int __igt_reset_engines(struct intel_gt *gt,
 
 		yield(); /* start all threads before we begin */
 
-		st_engine_heartbeat_disable(engine);
+		st_engine_heartbeat_disable_no_pm(engine);
 		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
 		do {
 			struct i915_request *rq = NULL;
+			struct intel_selftest_saved_policy saved;
+			int err2;
+
+			err = intel_selftest_modify_policy(engine, &saved,
+							  SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				break;
+			}
 
 			if (flags & TEST_ACTIVE) {
 				rq = hang_create_request(&h, engine);
 				if (IS_ERR(rq)) {
 					err = PTR_ERR(rq);
 					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-					break;
+					goto restore;
 				}
 
 				i915_request_get(rq);
@@ -1010,15 +1073,27 @@ static int __igt_reset_engines(struct intel_gt *gt,
 
 					i915_request_put(rq);
 					err = -EIO;
-					break;
+					goto restore;
 				}
+			} else {
+				intel_engine_pm_get(engine);
 			}
 
-			err = intel_engine_reset(engine, NULL);
-			if (err) {
-				pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
-				       engine->name, test_name, err);
-				break;
+			if (!using_guc) {
+				err = intel_engine_reset(engine, NULL);
+				if (err) {
+					pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
+					       engine->name, test_name, err);
+					goto restore;
+				}
+			}
+
+			if (rq) {
+				/* Ensure the reset happens and kills the engine */
+				err = intel_selftest_wait_for_rq(rq);
+				if (err)
+					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
+					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
 			}
 
 			count++;
@@ -1035,7 +1110,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					GEM_TRACE_DUMP();
 					intel_gt_set_wedged(gt);
 					err = -EIO;
-					break;
+					goto restore;
 				}
 
 				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
@@ -1054,12 +1129,15 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					GEM_TRACE_DUMP();
 					intel_gt_set_wedged(gt);
 					err = -EIO;
-					break;
+					goto restore;
 				}
 
 				i915_request_put(rq);
 			}
 
+			if (!(flags & TEST_ACTIVE))
+				intel_engine_pm_put(engine);
+
 			if (!(flags & TEST_SELF) && !wait_for_idle(engine)) {
 				struct drm_printer p =
 					drm_info_printer(gt->i915->drm.dev);
@@ -1071,22 +1149,34 @@ static int __igt_reset_engines(struct intel_gt *gt,
 						  "%s\n", engine->name);
 
 				err = -EIO;
-				break;
+				goto restore;
 			}
+
+restore:
+			err2 = intel_selftest_restore_policy(engine, &saved);
+			if (err2)
+				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
+			if (err == 0)
+				err = err2;
+			if (err)
+				break;
 		} while (time_before(jiffies, end_time));
 		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
-		st_engine_heartbeat_enable(engine);
+		st_engine_heartbeat_enable_no_pm(engine);
 
 		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
 			engine->name, test_name, count);
 
-		reported = i915_reset_engine_count(global, engine);
-		reported -= threads[engine->id].resets;
-		if (reported != count) {
-			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
-			       engine->name, test_name, count, reported);
-			if (!err)
-				err = -EINVAL;
+		/* GuC based resets are not logged per engine */
+		if (!using_guc) {
+			reported = i915_reset_engine_count(global, engine);
+			reported -= threads[engine->id].resets;
+			if (reported != count) {
+				pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
+				       engine->name, test_name, count, reported);
+				if (!err)
+					err = -EINVAL;
+			}
 		}
 
 unwind:
@@ -1105,15 +1195,18 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			}
 			put_task_struct(threads[tmp].task);
 
-			if (other->uabi_class != engine->uabi_class &&
-			    threads[tmp].resets !=
-			    i915_reset_engine_count(global, other)) {
-				pr_err("Innocent engine %s was reset (count=%ld)\n",
-				       other->name,
-				       i915_reset_engine_count(global, other) -
-				       threads[tmp].resets);
-				if (!err)
-					err = -EINVAL;
+			/* GuC based resets are not logged per engine */
+			if (!using_guc) {
+				if (other->uabi_class != engine->uabi_class &&
+				    threads[tmp].resets !=
+				    i915_reset_engine_count(global, other)) {
+					pr_err("Innocent engine %s was reset (count=%ld)\n",
+					       other->name,
+					       i915_reset_engine_count(global, other) -
+					       threads[tmp].resets);
+					if (!err)
+						err = -EINVAL;
+				}
 			}
 		}
 
@@ -1553,18 +1646,29 @@ static int igt_reset_queue(void *arg)
 		goto unlock;
 
 	for_each_engine(engine, gt, id) {
+		struct intel_selftest_saved_policy saved;
 		struct i915_request *prev;
 		IGT_TIMEOUT(end_time);
 		unsigned int count;
+		bool using_guc = intel_engine_uses_guc(engine);
 
 		if (!intel_engine_can_store_dword(engine))
 			continue;
 
+		if (using_guc) {
+			err = intel_selftest_modify_policy(engine, &saved,
+							  SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);
+			if (err) {
+				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
+				goto fini;
+			}
+		}
+
 		prev = hang_create_request(&h, engine);
 		if (IS_ERR(prev)) {
 			err = PTR_ERR(prev);
 			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
-			goto fini;
+			goto restore;
 		}
 
 		i915_request_get(prev);
@@ -1579,7 +1683,7 @@ static int igt_reset_queue(void *arg)
 			if (IS_ERR(rq)) {
 				err = PTR_ERR(rq);
 				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
-				goto fini;
+				goto restore;
 			}
 
 			i915_request_get(rq);
@@ -1604,7 +1708,7 @@ static int igt_reset_queue(void *arg)
 
 				GEM_TRACE_DUMP();
 				intel_gt_set_wedged(gt);
-				goto fini;
+				goto restore;
 			}
 
 			if (!wait_until_running(&h, prev)) {
@@ -1622,7 +1726,7 @@ static int igt_reset_queue(void *arg)
 				intel_gt_set_wedged(gt);
 
 				err = -EIO;
-				goto fini;
+				goto restore;
 			}
 
 			reset_count = fake_hangcheck(gt, BIT(id));
@@ -1633,7 +1737,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			if (rq->fence.error) {
@@ -1642,7 +1746,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			if (i915_reset_count(global) == reset_count) {
@@ -1650,7 +1754,7 @@ static int igt_reset_queue(void *arg)
 				i915_request_put(rq);
 				i915_request_put(prev);
 				err = -EINVAL;
-				goto fini;
+				goto restore;
 			}
 
 			i915_request_put(prev);
@@ -1665,6 +1769,17 @@ static int igt_reset_queue(void *arg)
 
 		i915_request_put(prev);
 
+restore:
+		if (using_guc) {
+			int err2 = intel_selftest_restore_policy(engine, &saved);
+			if (err2)
+				pr_err("%s:%d> [%s] Restore policy failed: %d!\n", __func__, __LINE__, engine->name, err2);
+			if (err == 0)
+				err = err2;
+		}
+		if (err)
+			goto fini;
+
 		err = igt_flush_test(gt->i915);
 		if (err) {
 			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index b7314739ee40..13d25bf2a94a 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -408,7 +408,8 @@ static int live_mocs_reset(void *arg)
 		struct intel_context *ce;
 		int err2;
 
-		err = intel_selftest_modify_policy(engine, &saved);
+		err = intel_selftest_modify_policy(engine, &saved,
+						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 		if (err)
 			break;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
index 7727bc531ea9..d820f0b41634 100644
--- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
@@ -810,7 +810,8 @@ static int live_reset_whitelist(void *arg)
 				struct intel_selftest_saved_policy saved;
 				int err2;
 
-				err = intel_selftest_modify_policy(engine, &saved);
+				err = intel_selftest_modify_policy(engine, &saved,
+								   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 				if (err)
 					goto out;
 
@@ -1268,7 +1269,8 @@ live_engine_reset_workarounds(void *arg)
 		int ret2;
 
 		pr_info("Verifying after %s reset...\n", engine->name);
-		ret = intel_selftest_modify_policy(engine, &saved);
+		ret = intel_selftest_modify_policy(engine, &saved,
+						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
 		if (ret)
 			break;
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 1c30d04733ff..536fdbc406c6 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1235,6 +1235,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 {
 	desc->policy_flags = 0;
 
+	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
+		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
+
 	/* NB: For both of these, zero means disabled. */
 	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
 	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
index 91ecd8a1bd21..69db139f9e0d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -16,7 +16,8 @@
 #define WAIT_FOR_RESET_TIME	1000
 
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
-				 struct intel_selftest_saved_policy *saved)
+				 struct intel_selftest_saved_policy *saved,
+				 u32 modify_type)
 
 {
 	int err;
@@ -26,18 +27,30 @@ int intel_selftest_modify_policy(struct intel_engine_cs *engine,
 	saved->timeslice = engine->props.timeslice_duration_ms;
 	saved->preempt_timeout = engine->props.preempt_timeout_ms;
 
-	/*
-	 * Enable force pre-emption on time slice expiration
-	 * together with engine reset on pre-emption timeout.
-	 * This is required to make the GuC notice and reset
-	 * the single hanging context.
-	 * Also, reduce the preemption timeout to something
-	 * small to speed the test up.
-	 */
-	engine->i915->params.reset = 2;
-	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
-	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
-	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+	switch (modify_type) {
+	case SELFTEST_SCHEDULER_MODIFY_FAST_RESET:
+		/*
+		 * Enable force pre-emption on time slice expiration
+		 * together with engine reset on pre-emption timeout.
+		 * This is required to make the GuC notice and reset
+		 * the single hanging context.
+		 * Also, reduce the preemption timeout to something
+		 * small to speed the test up.
+		 */
+		engine->i915->params.reset = 2;
+		engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
+		engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
+		engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
+		break;
+
+	case SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK:
+		engine->props.preempt_timeout_ms = 0;
+		break;
+
+	default:
+		pr_err("Invalid scheduler policy modification type: %d!\n", modify_type);
+		return -EINVAL;
+	}
 
 	if (!intel_engine_uses_guc(engine))
 		return 0;
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
index f30e96f0ba95..050bc5a8ba8b 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
@@ -19,8 +19,15 @@ struct intel_selftest_saved_policy
 	u64 preempt_timeout;
 };
 
+enum selftest_scheduler_modify
+{
+	SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK = 0,
+	SELFTEST_SCHEDULER_MODIFY_FAST_RESET,
+};
+
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
-				 struct intel_selftest_saved_policy *saved);
+				 struct intel_selftest_saved_policy *saved,
+				 enum selftest_scheduler_modify modify_type);
 int intel_selftest_restore_policy(struct intel_engine_cs *engine,
 				  struct intel_selftest_saved_policy *saved);
 int intel_selftest_wait_for_rq(struct i915_request *rq);
-- 
2.28.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 49/51] drm/i915/selftest: Bump selftest timeouts for hangcheck
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: John Harrison <John.C.Harrison@Intel.com>

Some testing environments and some heavier tests are slower than the
previous limits allowed for. For example, it can take multiple seconds
for the 'context has been reset' notification handler to reach the
'kill the requests' code in the 'active' version of the 'reset
engines' test, during which time the selftest gets bored, gives up
waiting and fails the test.

There is also an async thread that the selftest uses to pump work
through the hardware in parallel to the context that is marked for
reset. That thread can also get bored waiting for completions and kill
the test off.

Lastly, the flush at the end of various test sections can also see
timeouts due to the large amount of work backed up. This is also true
of the live_hwsp_read test.
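
For reference, a sketch of the jiffies arithmetic behind the new
limits (the two calls are the ones touched below; the surrounding
error handling is illustrative only):

	/* i915_request_wait() takes jiffies, so 10 * HZ is ten seconds */
	if (i915_request_wait(rq, 0, 10 * HZ) < 0)
		return -ETIME;

	/* likewise for the flush: a full HZ is one second, up from HZ / 5 */
	if (intel_gt_wait_for_idle(gt, HZ) == -ETIME)
		return -EIO;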

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c             | 2 +-
 drivers/gpu/drm/i915/selftests/igt_flush_test.c          | 2 +-
 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 971c0c249eb0..a93a9b0d258e 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -876,7 +876,7 @@ static int active_request_put(struct i915_request *rq)
 	if (!rq)
 		return 0;
 
-	if (i915_request_wait(rq, 0, 5 * HZ) < 0) {
+	if (i915_request_wait(rq, 0, 10 * HZ) < 0) {
 		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n",
 			  rq->engine->name,
 			  rq->fence.context,
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index 7b0939e3f007..a6c71fca61aa 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -19,7 +19,7 @@ int igt_flush_test(struct drm_i915_private *i915)
 
 	cond_resched();
 
-	if (intel_gt_wait_for_idle(gt, HZ / 5) == -ETIME) {
+	if (intel_gt_wait_for_idle(gt, HZ) == -ETIME) {
 		pr_err("%pS timed out, cancelling all further testing.\n",
 		       __builtin_return_address(0));
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
index 69db139f9e0d..ebd6d69b3315 100644
--- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
+++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
@@ -13,7 +13,7 @@
 
 #define REDUCED_TIMESLICE	5
 #define REDUCED_PREEMPT		10
-#define WAIT_FOR_RESET_TIME	1000
+#define WAIT_FOR_RESET_TIME	10000
 
 int intel_selftest_modify_policy(struct intel_engine_cs *engine,
 				 struct intel_selftest_saved_policy *saved,
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 50/51] drm/i915/guc: Implement GuC priority management
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

Implement a simple static mapping of the i915 priority levels
(int, -1k to 1k, exposed to user) onto the 4 GuC levels. The mapping
is as follows:

i915 level < 0          -> GuC low level     (3)
i915 level == 0         -> GuC normal level  (2)
i915 level < INT_MAX    -> GuC high level    (1)
i915 level == INT_MAX   -> GuC highest level (0)

We believe this mapping should cover the UMD use cases (3 distinct user
levels + 1 kernel level).
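
For illustration, a minimal userspace-side sketch using the existing
context-param uAPI (not part of this series; fd and ctx_id are assumed
to be a DRM fd and a previously created context):

	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	static int set_ctx_priority(int fd, __u32 ctx_id, int prio)
	{
		struct drm_i915_gem_context_param p = {
			.ctx_id = ctx_id,
			.param = I915_CONTEXT_PARAM_PRIORITY,
			/* any value in 1..1023 lands in the GuC high bucket;
			 * the highest level is reserved for the kernel */
			.value = prio,
		};

		return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
	}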

In addition to the static mapping, a simple counter system is attached
to each context, tracking the number of requests in flight on the
context at each level. This is needed because the GuC levels are per
context while the i915 levels are per request.
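
In pseudo-C, the effective context priority then reduces to scanning
the buckets (a sketch of the logic only; the real code below holds
ce->guc_active.lock and issues an H2G message on any change):

	/* index 0 is the highest GuC level, see GUC_CLIENT_PRIORITY_* */
	static u8 effective_guc_prio(const u32 count[GUC_CLIENT_PRIORITY_NUM])
	{
		u8 i;

		for (i = 0; i < GUC_CLIENT_PRIORITY_NUM; i++)
			if (count[i])
				return i;

		return GUC_CLIENT_PRIORITY_KMD_NORMAL; /* sketch fallback */
	}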

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 207 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.c           |   5 +
 drivers/gpu/drm/i915/i915_request.h           |   8 +
 drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  12 +
 drivers/gpu/drm/i915/i915_trace.h             |  16 +-
 include/uapi/drm/i915_drm.h                   |   9 +
 10 files changed, 274 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 2007dc6f6b99..209cf265bf74 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
 			llist_entry(signal, typeof(*rq), signal_node);
 		struct list_head cb_list;
 
+		if (rq->engine->sched_engine->retire_inflight_request_prio)
+			rq->engine->sched_engine->retire_inflight_request_prio(rq);
+
 		spin_lock(&rq->lock);
 		list_replace(&rq->fence.cb_list, &cb_list);
 		__dma_fence_signal__timestamp(&rq->fence, timestamp);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 005a64f2afa7..fe555551c2d2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -18,8 +18,9 @@
 #include "intel_engine_types.h"
 #include "intel_sseu.h"
 
-#define CONTEXT_REDZONE POISON_INUSE
+#include "uc/intel_guc_fwif.h"
 
+#define CONTEXT_REDZONE POISON_INUSE
 DECLARE_EWMA(runtime, 3, 8);
 
 struct i915_gem_context;
@@ -191,6 +192,12 @@ struct intel_context {
 
 	/* GuC context blocked fence */
 	struct i915_sw_fence guc_blocked;
+
+	/*
+	 * GuC priority management
+	 */
+	u8 guc_prio;
+	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
 };
 
 #endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 84142127ebd8..8f8bea08e734 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -11,6 +11,7 @@
 #include "intel_engine.h"
 #include "intel_engine_user.h"
 #include "intel_gt.h"
+#include "uc/intel_guc_submission.h"
 
 struct intel_engine_cs *
 intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
@@ -115,6 +116,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 			disabled |= (I915_SCHEDULER_CAP_ENABLED |
 				     I915_SCHEDULER_CAP_PRIORITY);
 
+		if (intel_uc_uses_guc_submission(&i915->gt.uc))
+			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
+
 		for (i = 0; i < ARRAY_SIZE(map); i++) {
 			if (engine->flags & BIT(map[i].engine))
 				enabled |= BIT(map[i].sched);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 536fdbc406c6..263ad6a9e4a9 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -81,7 +81,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
  */
 #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
 #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
-#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
+#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
+#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		3
 #define SCHED_STATE_NO_LOCK_BLOCKED \
 	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
 #define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
@@ -142,6 +143,24 @@ static inline void decr_context_blocked(struct intel_context *ce)
 		   &ce->guc_sched_state_no_lock);
 }
 
+static inline bool context_registered(struct intel_context *ce)
+{
+	return (atomic_read(&ce->guc_sched_state_no_lock) &
+		SCHED_STATE_NO_LOCK_REGISTERED);
+}
+
+static inline void set_context_registered(struct intel_context *ce)
+{
+	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
+		  &ce->guc_sched_state_no_lock);
+}
+
+static inline void clr_context_registered(struct intel_context *ce)
+{
+	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
+		   &ce->guc_sched_state_no_lock);
+}
+
 /*
  * Below is a set of functions which control the GuC scheduling state which
  * require a lock, aside from the special case where the functions are called
@@ -1080,6 +1099,7 @@ static int steal_guc_id(struct intel_guc *guc)
 
 		list_del_init(&ce->guc_id_link);
 		guc_id = ce->guc_id;
+		clr_context_registered(ce);
 		set_context_guc_id_invalid(ce);
 		return guc_id;
 	} else {
@@ -1184,10 +1204,13 @@ static int register_context(struct intel_context *ce, bool loop)
 	struct intel_guc *guc = ce_to_guc(ce);
 	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
 		ce->guc_id * sizeof(struct guc_lrc_desc);
+	int ret;
 
 	trace_intel_context_register(ce);
 
-	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
+	set_context_registered(ce);
+	return ret;
 }
 
 static int __guc_action_deregister_context(struct intel_guc *guc,
@@ -1243,6 +1266,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
 	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
 }
 
+static inline u8 map_i915_prio_to_guc_prio(int prio);
+
 static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 {
 	struct intel_runtime_pm *runtime_pm =
@@ -1251,6 +1276,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	struct intel_guc *guc = &engine->gt->uc.guc;
 	u32 desc_idx = ce->guc_id;
 	struct guc_lrc_desc *desc;
+	const struct i915_gem_context *ctx;
+	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
 	bool context_registered;
 	intel_wakeref_t wakeref;
 	int ret = 0;
@@ -1266,6 +1293,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 
 	context_registered = lrc_desc_registered(guc, desc_idx);
 
+	rcu_read_lock();
+	ctx = rcu_dereference(ce->gem_context);
+	if (ctx)
+		prio = ctx->sched.priority;
+	rcu_read_unlock();
+
 	reset_lrc_desc(guc, desc_idx);
 	set_lrc_desc_registered(guc, desc_idx, ce);
 
@@ -1274,7 +1307,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
 	desc->engine_submit_mask = adjust_engine_mask(engine->class,
 						      engine->mask);
 	desc->hw_context_desc = ce->lrc.lrca;
-	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
+	desc->priority = ce->guc_prio;
 	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
 	guc_context_policy_init(engine, desc);
 	init_sched_state(ce);
@@ -1659,11 +1693,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
 	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
 	GEM_BUG_ON(context_enabled(ce));
 
+	clr_context_registered(ce);
 	deregister_context(ce, ce->guc_id, true);
 }
 
 static void __guc_context_destroy(struct intel_context *ce)
 {
+	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
+		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
+
 	lrc_fini(ce);
 	intel_context_fini(ce);
 
@@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
 	return lrc_alloc(ce, ce->engine);
 }
 
+static void guc_context_set_prio(struct intel_guc *guc,
+				 struct intel_context *ce,
+				 u8 prio)
+{
+	u32 action[] = {
+		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
+		ce->guc_id,
+		prio,
+	};
+
+	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
+		   prio > GUC_CLIENT_PRIORITY_NORMAL);
+
+	if (ce->guc_prio == prio || submission_disabled(guc) ||
+	    !context_registered(ce))
+		return;
+
+	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
+
+	ce->guc_prio = prio;
+	trace_intel_context_set_prio(ce);
+}
+
+static inline u8 map_i915_prio_to_guc_prio(int prio)
+{
+	if (prio == I915_PRIORITY_NORMAL)
+		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
+	else if (prio < I915_PRIORITY_NORMAL)
+		return GUC_CLIENT_PRIORITY_NORMAL;
+	else if (prio != I915_PRIORITY_BARRIER)
+		return GUC_CLIENT_PRIORITY_HIGH;
+	else
+		return GUC_CLIENT_PRIORITY_KMD_HIGH;
+}
+
+static inline void add_context_inflight_prio(struct intel_context *ce,
+					     u8 guc_prio)
+{
+	lockdep_assert_held(&ce->guc_active.lock);
+	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
+
+	++ce->guc_prio_count[guc_prio];
+
+	/* Overflow protection */
+	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
+}
+
+static inline void sub_context_inflight_prio(struct intel_context *ce,
+					     u8 guc_prio)
+{
+	lockdep_assert_held(&ce->guc_active.lock);
+	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
+
+	/* Underflow protection */
+	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
+
+	--ce->guc_prio_count[guc_prio];
+}
+
+static inline void update_context_prio(struct intel_context *ce)
+{
+	struct intel_guc *guc = &ce->engine->gt->uc.guc;
+	int i;
+
+	lockdep_assert_held(&ce->guc_active.lock);
+
+	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {
+		if (ce->guc_prio_count[i]) {
+			guc_context_set_prio(guc, ce, i);
+			break;
+		}
+	}
+}
+
+static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
+{
+	/* Lower value is higher priority */
+	return new_guc_prio < old_guc_prio;
+}
+
 static void add_to_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
+	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
+
+	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
 
 	spin_lock(&ce->guc_active.lock);
 	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
+
+	if (rq->guc_prio == GUC_PRIO_INIT) {
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
+		sub_context_inflight_prio(ce, rq->guc_prio);
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+	}
+	update_context_prio(ce);
+
 	spin_unlock(&ce->guc_active.lock);
 }
 
+static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
+{
+	if (rq->guc_prio != GUC_PRIO_INIT &&
+	    rq->guc_prio != GUC_PRIO_FINI) {
+		sub_context_inflight_prio(ce, rq->guc_prio);
+		update_context_prio(ce);
+	}
+	rq->guc_prio = GUC_PRIO_FINI;
+}
+
 static void remove_from_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
@@ -1777,6 +1921,8 @@ static void remove_from_context(struct i915_request *rq)
 	/* Prevent further __await_execution() registering a cb, then flush */
 	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
 
+	guc_prio_fini(rq, ce);
+
 	spin_unlock_irq(&ce->guc_active.lock);
 
 	atomic_dec(&ce->guc_id_ref);
@@ -2058,6 +2204,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
 	}
 }
 
+static void guc_bump_inflight_request_prio(struct i915_request *rq,
+					   int prio)
+{
+	struct intel_context *ce = rq->context;
+	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
+
+	/* Short circuit function */
+	if (prio < I915_PRIORITY_NORMAL ||
+	    (rq->guc_prio == GUC_PRIO_FINI) ||
+	    (rq->guc_prio != GUC_PRIO_INIT &&
+	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
+		return;
+
+	spin_lock(&ce->guc_active.lock);
+	if (rq->guc_prio != GUC_PRIO_FINI) {
+		if (rq->guc_prio != GUC_PRIO_INIT)
+			sub_context_inflight_prio(ce, rq->guc_prio);
+		rq->guc_prio = new_guc_prio;
+		add_context_inflight_prio(ce, rq->guc_prio);
+		update_context_prio(ce);
+	}
+	spin_unlock(&ce->guc_active.lock);
+}
+
+static void guc_retire_inflight_request_prio(struct i915_request *rq)
+{
+	struct intel_context *ce = rq->context;
+
+	spin_lock(&ce->guc_active.lock);
+	guc_prio_fini(rq, ce);
+	spin_unlock(&ce->guc_active.lock);
+}
+
 static void sanitize_hwsp(struct intel_engine_cs *engine)
 {
 	struct intel_timeline *tl;
@@ -2293,6 +2472,10 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
 		guc->sched_engine->disabled = guc_sched_engine_disabled;
 		guc->sched_engine->private_data = guc;
 		guc->sched_engine->destroy = guc_sched_engine_destroy;
+		guc->sched_engine->bump_inflight_request_prio =
+			guc_bump_inflight_request_prio;
+		guc->sched_engine->retire_inflight_request_prio =
+			guc_retire_inflight_request_prio;
 		tasklet_setup(&guc->sched_engine->tasklet,
 			      guc_submission_tasklet);
 	}
@@ -2670,6 +2853,22 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
 	drm_printf(p, "\n");
 }
 
+static inline void guc_log_context_priority(struct drm_printer *p,
+					    struct intel_context *ce)
+{
+	int i;
+
+	drm_printf(p, "\t\tPriority: %d\n",
+		   ce->guc_prio);
+	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
+	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
+	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
+		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
+			   i, ce->guc_prio_count[i]);
+	}
+	drm_printf(p, "\n");
+}
+
 void intel_guc_submission_print_context_info(struct intel_guc *guc,
 					     struct drm_printer *p)
 {
@@ -2692,6 +2891,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
 		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
 			   ce->guc_state.sched_state,
 			   atomic_read(&ce->guc_sched_state_no_lock));
+
+		guc_log_context_priority(p, ce);
 	}
 }
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f3552642b8a1..3fdfada99ba0 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -114,6 +114,9 @@ static void i915_fence_release(struct dma_fence *fence)
 {
 	struct i915_request *rq = to_request(fence);
 
+	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
+		   rq->guc_prio != GUC_PRIO_FINI);
+
 	/*
 	 * The request is put onto a RCU freelist (i.e. the address
 	 * is immediately reused), mark the fences as being freed now.
@@ -922,6 +925,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
 
 	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
+	rq->guc_prio = GUC_PRIO_INIT;
+
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
 	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index a3d4728ea06c..f0463d19c712 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -293,6 +293,14 @@ struct i915_request {
 	 */
 	struct list_head guc_fence_link;
 
+	/**
+	 * Priority level while the request is in flight. Differs slightly
+	 * from the i915 scheduler priority.
+	 */
+#define	GUC_PRIO_INIT	0xff
+#define	GUC_PRIO_FINI	0xfe
+	u8 guc_prio;
+
 	I915_SELFTEST_DECLARE(struct {
 		struct list_head link;
 		unsigned long delay;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8766a8643469..3fccae3672c1 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
 	/* Fifo and depth-first replacement ensure our deps execute before us */
 	sched_engine = lock_sched_engine(node, sched_engine, &cache);
 	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
+		struct i915_request *from = container_of(dep->signaler,
+							 struct i915_request,
+							 sched);
 		INIT_LIST_HEAD(&dep->dfs_link);
 
 		node = dep->signaler;
@@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
 		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
 			   sched_engine);
 
+		/* Must be called before changing the node's priority */
+		if (sched_engine->bump_inflight_request_prio)
+			sched_engine->bump_inflight_request_prio(from, prio);
+
 		WRITE_ONCE(node->attr.priority, prio);
 
 		/*
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index eaef233e9080..b0a1b58c7893 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -179,6 +179,18 @@ struct i915_sched_engine {
 	void	(*kick_backend)(const struct i915_request *rq,
 				int prio);
 
+	/**
+	 * @bump_inflight_request_prio: update priority of an inflight request
+	 */
+	void	(*bump_inflight_request_prio)(struct i915_request *rq,
+					      int prio);
+
+	/**
+	 * @retire_inflight_request_prio: indicate to the priority tracking
+	 * that a request has been retired
+	 */
+	void	(*retire_inflight_request_prio)(struct i915_request *rq);
+
 	/**
 	 * @schedule: adjust priority of request
 	 *
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 937d3706af9b..9d2cd14ed882 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
 			     __field(int, pin_count)
 			     __field(u32, sched_state)
 			     __field(u32, guc_sched_state_no_lock)
+			     __field(u8, guc_prio)
 			     ),
 
 	    TP_fast_assign(
@@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
 			   __entry->sched_state = ce->guc_state.sched_state;
 			   __entry->guc_sched_state_no_lock =
 			   atomic_read(&ce->guc_sched_state_no_lock);
+			   __entry->guc_prio = ce->guc_prio;
 			   ),
 
-	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
+	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
 		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
-		      __entry->guc_sched_state_no_lock)
+		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
+);
+
+DEFINE_EVENT(intel_context, intel_context_set_prio,
+	     TP_PROTO(struct intel_context *ce),
+	     TP_ARGS(ce)
 );
 
 DEFINE_EVENT(intel_context, intel_context_reset,
@@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
 {
 }
 
+static inline void
+trace_intel_context_set_prio(struct intel_context *ce)
+{
+}
+
 static inline void
 trace_intel_context_reset(struct intel_context *ce)
 {
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index e54f9efaead0..cb0a5396e655 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
+/*
+ * Indicates the 2k user priority levels are statically mapped into 3 buckets as
+ * follows:
+ *
+ * -1k to -1	Low priority
+ * 0		Normal priority
+ * 1 to 1k	Highest priority
+ */
+#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [PATCH 51/51] drm/i915/guc: Unblock GuC submission on Gen11+
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 20:17   ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Unblock GuC submission on Gen11+ platforms.
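
Assuming the existing ENABLE_GUC_* bit layout (SUBMISSION is bit 0,
LOAD_HUC is bit 1), the defaults selected below correspond to the
following modparam values, which can still be forced by hand:

	i915.enable_guc=3    -> GuC submission + HuC authentication
	i915.enable_guc=2    -> HuC authentication only (DG1 / ADL-S)
	i915.enable_guc=0    -> neither, i.e. execlists submission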

v2:
 (Martin Peres / John H)
  - Delete debug message when GuC is disabled by default on certain
    platforms

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 13 ++++++++-----
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index eb6062f95d3b..5d94cf482516 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -55,6 +55,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 263ad6a9e4a9..32269a22562e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2512,6 +2512,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       GRAPHICS_VER(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2524,6 +2531,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 03bc1c83a4d2..c7ef44fa0c36 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,8 +38,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..da57d18d9f6b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,14 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +319,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread

* [Intel-gfx] [PATCH 51/51] drm/i915/guc: Unblock GuC submission on Gen11+
@ 2021-07-16 20:17   ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 20:17 UTC (permalink / raw)
  To: intel-gfx, dri-devel

From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Unblock GuC submission on Gen11+ platforms.

v2:
 (Martin Peres / John H)
  - Delete debug message when GuC is disabled by default on certain
    platforms

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc.h            |  1 +
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  8 ++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h |  3 +--
 drivers/gpu/drm/i915/gt/uc/intel_uc.c             | 13 ++++++++-----
 4 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
index eb6062f95d3b..5d94cf482516 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
@@ -55,6 +55,7 @@ struct intel_guc {
 	struct ida guc_ids;
 	struct list_head guc_id_list;
 
+	bool submission_supported;
 	bool submission_selected;
 
 	struct i915_vma *ads_vma;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 263ad6a9e4a9..32269a22562e 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2512,6 +2512,13 @@ void intel_guc_submission_disable(struct intel_guc *guc)
 	/* Note: By the time we're here, GuC may have already been reset */
 }
 
+static bool __guc_submission_supported(struct intel_guc *guc)
+{
+	/* GuC submission is unavailable for pre-Gen11 */
+	return intel_guc_is_supported(guc) &&
+	       GRAPHICS_VER(guc_to_gt(guc)->i915) >= 11;
+}
+
 static bool __guc_submission_selected(struct intel_guc *guc)
 {
 	struct drm_i915_private *i915 = guc_to_gt(guc)->i915;
@@ -2524,6 +2531,7 @@ static bool __guc_submission_selected(struct intel_guc *guc)
 
 void intel_guc_submission_init_early(struct intel_guc *guc)
 {
+	guc->submission_supported = __guc_submission_supported(guc);
 	guc->submission_selected = __guc_submission_selected(guc);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
index 03bc1c83a4d2..c7ef44fa0c36 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
@@ -38,8 +38,7 @@ int intel_guc_wait_for_pending_msg(struct intel_guc *guc,
 
 static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
 {
-	/* XXX: GuC submission is unavailable for now */
-	return false;
+	return guc->submission_supported;
 }
 
 static inline bool intel_guc_submission_is_wanted(struct intel_guc *guc)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 7a69c3c027e9..da57d18d9f6b 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -34,8 +34,14 @@ static void uc_expand_default_options(struct intel_uc *uc)
 		return;
 	}
 
-	/* Default: enable HuC authentication only */
-	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+	/* Intermediate platforms are HuC authentication only */
+	if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) {
+		i915->params.enable_guc = ENABLE_GUC_LOAD_HUC;
+		return;
+	}
+
+	/* Default: enable HuC authentication and GuC submission */
+	i915->params.enable_guc = ENABLE_GUC_LOAD_HUC | ENABLE_GUC_SUBMISSION;
 }
 
 /* Reset GuC providing us with fresh state for both GuC and HuC.
@@ -313,9 +319,6 @@ static int __uc_init(struct intel_uc *uc)
 	if (i915_inject_probe_failure(uc_to_gt(uc)->i915))
 		return -ENOMEM;
 
-	/* XXX: GuC submission is unavailable for now */
-	GEM_BUG_ON(intel_uc_uses_guc_submission(uc));
-
 	ret = intel_guc_init(guc);
 	if (ret)
 		return ret;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 221+ messages in thread
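
Context for the patch above: it splits "the hardware/firmware can do GuC
submission" (submission_supported) from "the user or platform default asked
for it" (submission_selected). Below is a minimal sketch of how the two
gates compose, using the helpers visible in the diff; the enable_guc bit
values in the comment are assumptions based on the modparam's usual
encoding, not something this patch defines:

	/*
	 * Sketch only -- not driver code. ENABLE_GUC_SUBMISSION and
	 * ENABLE_GUC_LOAD_HUC are assumed to be the usual i915.enable_guc
	 * modparam bits (1 and 2), so the new defaults above amount to
	 * enable_guc=3 on Gen12+ and enable_guc=2 on DG1/ADL-S.
	 */
	static bool sketch_uses_guc_submission(struct intel_guc *guc)
	{
		/* hardware/firmware gate: Gen11+ with a usable GuC */
		if (!intel_guc_submission_is_supported(guc))
			return false;

		/* policy gate: platform default or explicit modparam */
		return intel_guc_submission_is_wanted(guc);
	}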

* Re: [PATCH 49/51] drm/i915/selftest: Bump selftest timeouts for hangcheck
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 22:23     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 22:23 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

On Fri, Jul 16, 2021 at 01:17:22PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Some testing environments and some heavier tests are slower than
> previous limits allowed for. For example, it can take multiple seconds
> for the 'context has been reset' notification handler to reach the
> 'kill the requests' code in the 'active' version of the 'reset
> engines' test. During which time the selftest gets bored, gives up
> waiting and fails the test.
> 
> There is also an async thread that the selftest uses to pump work
> through the hardware in parallel to the context that is marked for
> reset. That also could get bored waiting for completions and kill the
> test off.
> 
> Lastly, the flush at the end of various test sections can also see
> timeouts due to the large amount of work backed up. This is also true
> of the live_hwsp_read test.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
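
For context on the magnitudes: HZ is the kernel tick rate in jiffies per
second, so the request wait grows from roughly five seconds to ten, the
idle flush from a fifth of a second to a full second, and
WAIT_FOR_RESET_TIME by a factor of ten in the same unit. A minimal sketch
of the jiffies-based timeout pattern these budgets feed into
(sketch_wait_until and its done() callback are illustrative only, not
driver code):

	#include <linux/jiffies.h>
	#include <linux/sched.h>

	/* Poll until done() succeeds or the jiffies budget runs out. */
	static bool sketch_wait_until(bool (*done)(void *data), void *data,
				      unsigned long timeout_jiffies)
	{
		/* e.g. timeout_jiffies = 10 * HZ for a ~10 s budget */
		unsigned long end = jiffies + timeout_jiffies;

		while (!done(data)) {
			if (time_after(jiffies, end))
				return false;	/* cf. -ETIME above */
			cond_resched();
		}
		return true;
	}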

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c             | 2 +-
>  drivers/gpu/drm/i915/selftests/igt_flush_test.c          | 2 +-
>  drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 971c0c249eb0..a93a9b0d258e 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -876,7 +876,7 @@ static int active_request_put(struct i915_request *rq)
>  	if (!rq)
>  		return 0;
>  
> -	if (i915_request_wait(rq, 0, 5 * HZ) < 0) {
> +	if (i915_request_wait(rq, 0, 10 * HZ) < 0) {
>  		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n",
>  			  rq->engine->name,
>  			  rq->fence.context,
> diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> index 7b0939e3f007..a6c71fca61aa 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> @@ -19,7 +19,7 @@ int igt_flush_test(struct drm_i915_private *i915)
>  
>  	cond_resched();
>  
> -	if (intel_gt_wait_for_idle(gt, HZ / 5) == -ETIME) {
> +	if (intel_gt_wait_for_idle(gt, HZ) == -ETIME) {
>  		pr_err("%pS timed out, cancelling all further testing.\n",
>  		       __builtin_return_address(0));
>  
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> index 69db139f9e0d..ebd6d69b3315 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> @@ -13,7 +13,7 @@
>  
>  #define REDUCED_TIMESLICE	5
>  #define REDUCED_PREEMPT		10
> -#define WAIT_FOR_RESET_TIME	1000
> +#define WAIT_FOR_RESET_TIME	10000
>  
>  int intel_selftest_modify_policy(struct intel_engine_cs *engine,
>  				 struct intel_selftest_saved_policy *saved,
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 48/51] drm/i915/selftest: Fix hangcheck self test for GuC submission
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 23:43     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 23:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

On Fri, Jul 16, 2021 at 01:17:21PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When GuC submission is enabled, the GuC controls engine resets. Rather
> than explicitly triggering a reset, the driver must submit a hanging
> context to GuC and wait for the reset to occur.
> 
> Conversely, one of the tests specifically sends hanging batches to the
> engines but wants them to sit around until a manual reset of the full
> GT (including GuC itself). That means disabling GuC-based engine
> resets to prevent those from killing the hanging batch too soon. So,
> add support to the scheduling policy helper for disabling resets as
> well as making them quicker!
> 
> In GuC submission mode, the 'is engine idle' test basically turns into
> 'is engine PM wakelock held'. Independently, there is a heartbeat
> disable helper function that the tests use. For unexplained reasons,
> this acquires the engine wakelock before disabling the heartbeat and
> only releases it when re-enabling the heartbeat. As one of the tests
> tries to do a wait for idle in the middle of a heartbeat disabled
> section, it is guaranteed to always fail. Add a 'no_pm'
> variant of the heartbeat helper that allows the engine to be asleep
> while also having heartbeats disabled.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
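
The core pattern change is worth spelling out: with GuC submission the
test no longer calls intel_engine_reset() itself, it provokes the GuC
into doing the reset. A condensed sketch of the per-engine loop as it
looks after this patch (error paths trimmed; struct hang,
hang_create_request() and the intel_selftest_* calls are the selftest
helpers used in the diff below):

	static int sketch_guc_engine_reset(struct intel_engine_cs *engine,
					   struct hang *h)
	{
		struct intel_selftest_saved_policy saved;
		struct i915_request *rq;
		int err, err2;

		/* short timeslice + forced preemption so the GuC acts fast */
		err = intel_selftest_modify_policy(engine, &saved,
						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
		if (err)
			return err;

		rq = hang_create_request(h, engine);	/* a hanging batch */
		if (IS_ERR(rq)) {
			err = PTR_ERR(rq);
			goto restore;
		}

		i915_request_get(rq);
		i915_request_add(rq);

		/*
		 * No intel_engine_reset(): once the reduced preemption
		 * timeout expires the GuC resets the engine on its own.
		 */
		err = intel_selftest_wait_for_rq(rq);
		i915_request_put(rq);

	restore:
		err2 = intel_selftest_restore_policy(engine, &saved);
		return err ?: err2;
	}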

> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
>  .../drm/i915/gt/selftest_engine_heartbeat.c   |  22 ++
>  .../drm/i915/gt/selftest_engine_heartbeat.h   |   2 +
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 223 +++++++++++++-----
>  drivers/gpu/drm/i915/gt/selftest_mocs.c       |   3 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c    |   6 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +
>  .../i915/selftests/intel_scheduler_helpers.c  |  39 ++-
>  .../i915/selftests/intel_scheduler_helpers.h  |   9 +-
>  9 files changed, 237 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index d66b732a91c2..eec57e57403f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -449,6 +449,7 @@ struct intel_engine_cs {
>  #define I915_ENGINE_IS_VIRTUAL       BIT(5)
>  #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
>  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
> +#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
>  	unsigned int flags;
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index 4896e4ccad50..317eebf086c3 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -405,3 +405,25 @@ void st_engine_heartbeat_enable(struct intel_engine_cs *engine)
>  	engine->props.heartbeat_interval_ms =
>  		engine->defaults.heartbeat_interval_ms;
>  }
> +
> +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine)
> +{
> +	engine->props.heartbeat_interval_ms = 0;
> +
> +	/*
> +	 * Park the heartbeat but without holding the PM lock as that
> +	 * makes the engines appear not-idle. Note that if/when unpark
> +	 * is called due to the PM lock being acquired later the
> +	 * heartbeat still won't be enabled because of the above = 0.
> +	 */
> +	if (intel_engine_pm_get_if_awake(engine)) {
> +		intel_engine_park_heartbeat(engine);
> +		intel_engine_pm_put(engine);
> +	}
> +}
> +
> +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine)
> +{
> +	engine->props.heartbeat_interval_ms =
> +		engine->defaults.heartbeat_interval_ms;
> +}
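
The no_pm variants above matter because the plain helpers pin the engine
awake for the whole heartbeat-disabled section, and in GuC mode "idle"
effectively means "wakelock not held", so a wait-for-idle inside such a
section could never succeed. A sketch of the intended usage
(wait_for_idle() is the hangcheck-selftest helper; the hang/reset step in
the middle is elided):

	st_engine_heartbeat_disable_no_pm(engine);

	/* ... submit hanging work, let the GuC reset the engine ... */

	if (!wait_for_idle(engine))	/* can now return true */
		err = -EIO;

	st_engine_heartbeat_enable_no_pm(engine);
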
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> index cd27113d5400..81da2cd8e406 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> @@ -9,6 +9,8 @@
>  struct intel_engine_cs;
>  
>  void st_engine_heartbeat_disable(struct intel_engine_cs *engine);
> +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine);
>  void st_engine_heartbeat_enable(struct intel_engine_cs *engine);
> +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine);
>  
>  #endif /* SELFTEST_ENGINE_HEARTBEAT_H */
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 0ed87cc4d063..971c0c249eb0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -17,6 +17,8 @@
>  #include "selftests/igt_flush_test.h"
>  #include "selftests/igt_reset.h"
>  #include "selftests/igt_atomic.h"
> +#include "selftests/igt_spinner.h"
> +#include "selftests/intel_scheduler_helpers.h"
>  
>  #include "selftests/mock_drm.h"
>  
> @@ -449,6 +451,14 @@ static int igt_reset_nop_engine(void *arg)
>  		IGT_TIMEOUT(end_time);
>  		int err;
>  
> +		if (intel_engine_uses_guc(engine)) {
> +			/* Engine level resets are triggered by GuC when a hang
> +			 * is detected. They can't be triggered by the KMD any
> +			 * more. Thus a nop batch cannot be used as a reset test
> +			 */
> +			continue;
> +		}
> +
>  		ce = intel_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
> @@ -560,6 +570,10 @@ static int igt_reset_fail_engine(void *arg)
>  		IGT_TIMEOUT(end_time);
>  		int err;
>  
> +		/* Can't manually break the reset if i915 doesn't perform it */
> +		if (intel_engine_uses_guc(engine))
> +			continue;
> +
>  		ce = intel_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
> @@ -699,8 +713,12 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  	for_each_engine(engine, gt, id) {
>  		unsigned int reset_count, reset_engine_count;
>  		unsigned long count;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  		IGT_TIMEOUT(end_time);
>  
> +		if (using_guc && !active)
> +			continue;
> +
>  		if (active && !intel_engine_can_store_dword(engine))
>  			continue;
>  
> @@ -718,14 +736,23 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		count = 0;
>  		do {
> -			if (active) {
> -				struct i915_request *rq;
> +			struct i915_request *rq = NULL;
> +			struct intel_selftest_saved_policy saved;
> +			int err2;
> +
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				break;
> +			}
>  
> +			if (active) {
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
>  					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_get(rq);
> @@ -741,34 +768,58 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  
>  					i915_request_put(rq);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
> +			}
>  
> -				i915_request_put(rq);
> +			if (!using_guc) {
> +				err = intel_engine_reset(engine, NULL);
> +				if (err) {
> +					pr_err("intel_engine_reset(%s) failed, err:%d\n",
> +					       engine->name, err);
> +					goto skip;
> +				}
>  			}
>  
> -			err = intel_engine_reset(engine, NULL);
> -			if (err) {
> -				pr_err("intel_engine_reset(%s) failed, err:%d\n",
> -				       engine->name, err);
> -				break;
> +			if (rq) {
> +				/* Ensure the reset happens and kills the engine */
> +				err = intel_selftest_wait_for_rq(rq);
> +				if (err)
> +					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
> +					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
>  			}
>  
> +skip:
> +			if (rq)
> +				i915_request_put(rq);
> +
>  			if (i915_reset_count(global) != reset_count) {
>  				pr_err("Full GPU reset recorded! (engine reset expected)\n");
>  				err = -EINVAL;
> -				break;
> +				goto restore;
>  			}
>  
> -			if (i915_reset_engine_count(global, engine) !=
> -			    ++reset_engine_count) {
> -				pr_err("%s engine reset not recorded!\n",
> -				       engine->name);
> -				err = -EINVAL;
> -				break;
> +			/* GuC based resets are not logged per engine */
> +			if (!using_guc) {
> +				if (i915_reset_engine_count(global, engine) !=
> +				    ++reset_engine_count) {
> +					pr_err("%s engine reset not recorded!\n",
> +					       engine->name);
> +					err = -EINVAL;
> +					goto restore;
> +				}
>  			}
>  
>  			count++;
> +
> +restore:
> +			err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
> +			if (err == 0)
> +				err = err2;
> +			if (err)
> +				break;
>  		} while (time_before(jiffies, end_time));
>  		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		st_engine_heartbeat_enable(engine);
> @@ -941,10 +992,13 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  		struct active_engine threads[I915_NUM_ENGINES] = {};
>  		unsigned long device = i915_reset_count(global);
>  		unsigned long count = 0, reported;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  		IGT_TIMEOUT(end_time);
>  
> -		if (flags & TEST_ACTIVE &&
> -		    !intel_engine_can_store_dword(engine))
> +		if (flags & TEST_ACTIVE) {
> +			if (!intel_engine_can_store_dword(engine))
> +				continue;
> +		} else if (using_guc)
>  			continue;
>  
>  		if (!wait_for_idle(engine)) {
> @@ -984,17 +1038,26 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  
>  		yield(); /* start all threads before we begin */
>  
> -		st_engine_heartbeat_disable(engine);
> +		st_engine_heartbeat_disable_no_pm(engine);
>  		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		do {
>  			struct i915_request *rq = NULL;
> +			struct intel_selftest_saved_policy saved;
> +			int err2;
> +
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							  SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				break;
> +			}
>  
>  			if (flags & TEST_ACTIVE) {
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
>  					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_get(rq);
> @@ -1010,15 +1073,27 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  
>  					i915_request_put(rq);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
> +			} else {
> +				intel_engine_pm_get(engine);
>  			}
>  
> -			err = intel_engine_reset(engine, NULL);
> -			if (err) {
> -				pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
> -				       engine->name, test_name, err);
> -				break;
> +			if (!using_guc) {
> +				err = intel_engine_reset(engine, NULL);
> +				if (err) {
> +					pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
> +					       engine->name, test_name, err);
> +					goto restore;
> +				}
> +			}
> +
> +			if (rq) {
> +				/* Ensure the reset happens and kills the engine */
> +				err = intel_selftest_wait_for_rq(rq);
> +				if (err)
> +					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
> +					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
>  			}
>  
>  			count++;
> @@ -1035,7 +1110,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  					GEM_TRACE_DUMP();
>  					intel_gt_set_wedged(gt);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
>  
>  				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> @@ -1054,12 +1129,15 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  					GEM_TRACE_DUMP();
>  					intel_gt_set_wedged(gt);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_put(rq);
>  			}
>  
> +			if (!(flags & TEST_ACTIVE))
> +				intel_engine_pm_put(engine);
> +
>  			if (!(flags & TEST_SELF) && !wait_for_idle(engine)) {
>  				struct drm_printer p =
>  					drm_info_printer(gt->i915->drm.dev);
> @@ -1071,22 +1149,34 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  						  "%s\n", engine->name);
>  
>  				err = -EIO;
> -				break;
> +				goto restore;
>  			}
> +
> +restore:
> +			err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
> +			if (err == 0)
> +				err = err2;
> +			if (err)
> +				break;
>  		} while (time_before(jiffies, end_time));
>  		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> -		st_engine_heartbeat_enable(engine);
> +		st_engine_heartbeat_enable_no_pm(engine);
>  
>  		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
>  			engine->name, test_name, count);
>  
> -		reported = i915_reset_engine_count(global, engine);
> -		reported -= threads[engine->id].resets;
> -		if (reported != count) {
> -			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> -			       engine->name, test_name, count, reported);
> -			if (!err)
> -				err = -EINVAL;
> +		/* GuC based resets are not logged per engine */
> +		if (!using_guc) {
> +			reported = i915_reset_engine_count(global, engine);
> +			reported -= threads[engine->id].resets;
> +			if (reported != count) {
> +				pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> +				       engine->name, test_name, count, reported);
> +				if (!err)
> +					err = -EINVAL;
> +			}
>  		}
>  
>  unwind:
> @@ -1105,15 +1195,18 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			}
>  			put_task_struct(threads[tmp].task);
>  
> -			if (other->uabi_class != engine->uabi_class &&
> -			    threads[tmp].resets !=
> -			    i915_reset_engine_count(global, other)) {
> -				pr_err("Innocent engine %s was reset (count=%ld)\n",
> -				       other->name,
> -				       i915_reset_engine_count(global, other) -
> -				       threads[tmp].resets);
> -				if (!err)
> -					err = -EINVAL;
> +			/* GuC based resets are not logged per engine */
> +			if (!using_guc) {
> +				if (other->uabi_class != engine->uabi_class &&
> +				    threads[tmp].resets !=
> +				    i915_reset_engine_count(global, other)) {
> +					pr_err("Innocent engine %s was reset (count=%ld)\n",
> +					       other->name,
> +					       i915_reset_engine_count(global, other) -
> +					       threads[tmp].resets);
> +					if (!err)
> +						err = -EINVAL;
> +				}
>  			}
>  		}
>  
> @@ -1553,18 +1646,29 @@ static int igt_reset_queue(void *arg)
>  		goto unlock;
>  
>  	for_each_engine(engine, gt, id) {
> +		struct intel_selftest_saved_policy saved;
>  		struct i915_request *prev;
>  		IGT_TIMEOUT(end_time);
>  		unsigned int count;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  
>  		if (!intel_engine_can_store_dword(engine))
>  			continue;
>  
> +		if (using_guc) {
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							  SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				goto fini;
> +			}
> +		}
> +
>  		prev = hang_create_request(&h, engine);
>  		if (IS_ERR(prev)) {
>  			err = PTR_ERR(prev);
>  			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
> -			goto fini;
> +			goto restore;
>  		}
>  
>  		i915_request_get(prev);
> @@ -1579,7 +1683,7 @@ static int igt_reset_queue(void *arg)
>  			if (IS_ERR(rq)) {
>  				err = PTR_ERR(rq);
>  				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			i915_request_get(rq);
> @@ -1604,7 +1708,7 @@ static int igt_reset_queue(void *arg)
>  
>  				GEM_TRACE_DUMP();
>  				intel_gt_set_wedged(gt);
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (!wait_until_running(&h, prev)) {
> @@ -1622,7 +1726,7 @@ static int igt_reset_queue(void *arg)
>  				intel_gt_set_wedged(gt);
>  
>  				err = -EIO;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			reset_count = fake_hangcheck(gt, BIT(id));
> @@ -1633,7 +1737,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (rq->fence.error) {
> @@ -1642,7 +1746,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (i915_reset_count(global) == reset_count) {
> @@ -1650,7 +1754,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			i915_request_put(prev);
> @@ -1665,6 +1769,17 @@ static int igt_reset_queue(void *arg)
>  
>  		i915_request_put(prev);
>  
> +restore:
> +		if (using_guc) {
> +			int err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("%s:%d> [%s] Restore policy failed: %d!\n", __func__, __LINE__, engine->name, err2);
> +			if (err == 0)
> +				err = err2;
> +		}
> +		if (err)
> +			goto fini;
> +
>  		err = igt_flush_test(gt->i915);
>  		if (err) {
>  			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index b7314739ee40..13d25bf2a94a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> @@ -408,7 +408,8 @@ static int live_mocs_reset(void *arg)
>  		struct intel_context *ce;
>  		int err2;
>  
> -		err = intel_selftest_modify_policy(engine, &saved);
> +		err = intel_selftest_modify_policy(engine, &saved,
> +						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  		if (err)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 7727bc531ea9..d820f0b41634 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -810,7 +810,8 @@ static int live_reset_whitelist(void *arg)
>  				struct intel_selftest_saved_policy saved;
>  				int err2;
>  
> -				err = intel_selftest_modify_policy(engine, &saved);
> +				err = intel_selftest_modify_policy(engine, &saved,
> +								   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  				if(err)
>  					goto out;
>  
> @@ -1268,7 +1269,8 @@ live_engine_reset_workarounds(void *arg)
>  		int ret2;
>  
>  		pr_info("Verifying after %s reset...\n", engine->name);
> -		ret = intel_selftest_modify_policy(engine, &saved);
> +		ret = intel_selftest_modify_policy(engine, &saved,
> +						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  		if (ret)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 1c30d04733ff..536fdbc406c6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1235,6 +1235,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>  {
>  	desc->policy_flags = 0;
>  
> +	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
> +		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
> +
>  	/* NB: For both of these, zero means disabled. */
>  	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
>  	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
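
A unit note on the hunk above, since it is easy to miss: the engine
properties are kept in milliseconds while the GuC policy descriptor takes
microseconds, hence the multiplication. With the values this series uses:

	/*
	 * Sketch of the conversion (values from this series):
	 *   REDUCED_TIMESLICE =  5 ms -> execution_quantum  =  5000 us
	 *   REDUCED_PREEMPT   = 10 ms -> preemption_timeout = 10000 us
	 * Zero still means "disabled", which is exactly what the
	 * NO_HANGCHECK mode below relies on.
	 */
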
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> index 91ecd8a1bd21..69db139f9e0d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> @@ -16,7 +16,8 @@
>  #define WAIT_FOR_RESET_TIME	1000
>  
>  int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> -				 struct intel_selftest_saved_policy *saved)
> +				 struct intel_selftest_saved_policy *saved,
> +				 u32 modify_type)
>  
>  {
>  	int err;
> @@ -26,18 +27,30 @@ int intel_selftest_modify_policy(struct intel_engine_cs *engine,
>  	saved->timeslice = engine->props.timeslice_duration_ms;
>  	saved->preempt_timeout = engine->props.preempt_timeout_ms;
>  
> -	/*
> -	 * Enable force pre-emption on time slice expiration
> -	 * together with engine reset on pre-emption timeout.
> -	 * This is required to make the GuC notice and reset
> -	 * the single hanging context.
> -	 * Also, reduce the preemption timeout to something
> -	 * small to speed the test up.
> -	 */
> -	engine->i915->params.reset = 2;
> -	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
> -	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
> -	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
> +	switch (modify_type) {
> +	case SELFTEST_SCHEDULER_MODIFY_FAST_RESET:
> +		/*
> +		 * Enable force pre-emption on time slice expiration
> +		 * together with engine reset on pre-emption timeout.
> +		 * This is required to make the GuC notice and reset
> +		 * the single hanging context.
> +		 * Also, reduce the preemption timeout to something
> +		 * small to speed the test up.
> +		 */
> +		engine->i915->params.reset = 2;
> +		engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
> +		engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
> +		engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
> +		break;
> +
> +	case SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK:
> +		engine->props.preempt_timeout_ms = 0;
> +		break;
> +
> +	default:
> +		pr_err("Invalid scheduler policy modification type: %d!\n", modify_type);
> +		return -EINVAL;
> +	}
>  
>  	if (!intel_engine_uses_guc(engine))
>  		return 0;
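
The two modes serve opposite tests: FAST_RESET shrinks the timeslice and
preemption timeout so the GuC notices and resets a hang quickly, while
NO_HANGCHECK zeroes the preemption timeout (zero == disabled, per the
policy-init comment earlier) so hanging batches survive until the test's
own full-GT reset. A condensed call-site sketch, both lines taken in
fuller form from hunks in this patch:

	/* most reset tests: have the GuC kill the hang fast */
	err = intel_selftest_modify_policy(engine, &saved,
					   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);

	/* igt_reset_queue(): no GuC engine resets at all */
	err = intel_selftest_modify_policy(engine, &saved,
					   SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);
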
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> index f30e96f0ba95..050bc5a8ba8b 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> @@ -19,8 +19,15 @@ struct intel_selftest_saved_policy
>  	u64 preempt_timeout;
>  };
>  
> +enum selftest_scheduler_modify
> +{
> +	SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK = 0,
> +	SELFTEST_SCHEDULER_MODIFY_FAST_RESET,
> +};
> +
>  int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> -				 struct intel_selftest_saved_policy *saved);
> +				 struct intel_selftest_saved_policy *saved,
> +				 enum selftest_scheduler_modify modify_type);
>  int intel_selftest_restore_policy(struct intel_engine_cs *engine,
>  				  struct intel_selftest_saved_policy *saved);
>  int intel_selftest_wait_for_rq( struct i915_request *rq);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 48/51] drm/i915/selftest: Fix hangcheck self test for GuC submission
@ 2021-07-16 23:43     ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 23:43 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Fri, Jul 16, 2021 at 01:17:21PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> When GuC submission is enabled, the GuC controls engine resets. Rather
> than explicitly triggering a reset, the driver must submit a hanging
> context to GuC and wait for the reset to occur.
> 
> Conversely, one of the tests specifically sends hanging batches to the
> engines but wants them to sit around until a manual reset of the full
> GT (including GuC itself). That means disabling GuC based engine
> resets to prevent those from killing the hanging batch too soon. So,
> add support to the scheduling policy helper for disabling resets as
> well as making them quicker!
> 
> In GuC submission mode, the 'is engine idle' test basically turns into
> 'is engine PM wakelock held'. Independently, there is a heartbeat
> disable helper function that the tests use. For unexplained reasons,
> this acquires the engine wakelock before disabling the heartbeat and
> only releases it when re-enabling the heartbeat. As one of the tests
> tries to do a wait for idle in the middle of a heartbeat disabled
> section, it is therefore guaranteed to always fail. Added a 'no_pm'
> variant of the heartbeat helper that allows the engine to be asleep
> while also having heartbeats disabled.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   1 +
>  .../drm/i915/gt/selftest_engine_heartbeat.c   |  22 ++
>  .../drm/i915/gt/selftest_engine_heartbeat.h   |   2 +
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 223 +++++++++++++-----
>  drivers/gpu/drm/i915/gt/selftest_mocs.c       |   3 +-
>  .../gpu/drm/i915/gt/selftest_workarounds.c    |   6 +-
>  .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   3 +
>  .../i915/selftests/intel_scheduler_helpers.c  |  39 ++-
>  .../i915/selftests/intel_scheduler_helpers.h  |   9 +-
>  9 files changed, 237 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index d66b732a91c2..eec57e57403f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -449,6 +449,7 @@ struct intel_engine_cs {
>  #define I915_ENGINE_IS_VIRTUAL       BIT(5)
>  #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
>  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
> +#define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
>  	unsigned int flags;
>  
>  	/*
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index 4896e4ccad50..317eebf086c3 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -405,3 +405,25 @@ void st_engine_heartbeat_enable(struct intel_engine_cs *engine)
>  	engine->props.heartbeat_interval_ms =
>  		engine->defaults.heartbeat_interval_ms;
>  }
> +
> +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine)
> +{
> +	engine->props.heartbeat_interval_ms = 0;
> +
> +	/*
> +	 * Park the heartbeat but without holding the PM lock as that
> +	 * makes the engines appear not-idle. Note that if/when unpark
> +	 * is called due to the PM lock being acquired later the
> +	 * heartbeat still won't be enabled because of the above = 0.
> +	 */
> +	if (intel_engine_pm_get_if_awake(engine)) {
> +		intel_engine_park_heartbeat(engine);
> +		intel_engine_pm_put(engine);
> +	}
> +}
> +
> +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine)
> +{
> +	engine->props.heartbeat_interval_ms =
> +		engine->defaults.heartbeat_interval_ms;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> index cd27113d5400..81da2cd8e406 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.h
> @@ -9,6 +9,8 @@
>  struct intel_engine_cs;
>  
>  void st_engine_heartbeat_disable(struct intel_engine_cs *engine);
> +void st_engine_heartbeat_disable_no_pm(struct intel_engine_cs *engine);
>  void st_engine_heartbeat_enable(struct intel_engine_cs *engine);
> +void st_engine_heartbeat_enable_no_pm(struct intel_engine_cs *engine);
>  
>  #endif /* SELFTEST_ENGINE_HEARTBEAT_H */
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 0ed87cc4d063..971c0c249eb0 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -17,6 +17,8 @@
>  #include "selftests/igt_flush_test.h"
>  #include "selftests/igt_reset.h"
>  #include "selftests/igt_atomic.h"
> +#include "selftests/igt_spinner.h"
> +#include "selftests/intel_scheduler_helpers.h"
>  
>  #include "selftests/mock_drm.h"
>  
> @@ -449,6 +451,14 @@ static int igt_reset_nop_engine(void *arg)
>  		IGT_TIMEOUT(end_time);
>  		int err;
>  
> +		if (intel_engine_uses_guc(engine)) {
> +			/* Engine level resets are triggered by GuC when a hang
> +			 * is detected. They can't be triggered by the KMD any
> +			 * more. Thus a nop batch cannot be used as a reset test
> +			 */
> +			continue;
> +		}
> +
>  		ce = intel_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
> @@ -560,6 +570,10 @@ static int igt_reset_fail_engine(void *arg)
>  		IGT_TIMEOUT(end_time);
>  		int err;
>  
> +		/* Can't manually break the reset if i915 doesn't perform it */
> +		if (intel_engine_uses_guc(engine))
> +			continue;
> +
>  		ce = intel_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			pr_err("[%s] Create context failed: %d!\n", engine->name, err);
> @@ -699,8 +713,12 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  	for_each_engine(engine, gt, id) {
>  		unsigned int reset_count, reset_engine_count;
>  		unsigned long count;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  		IGT_TIMEOUT(end_time);
>  
> +		if (using_guc && !active)
> +			continue;
> +
>  		if (active && !intel_engine_can_store_dword(engine))
>  			continue;
>  
> @@ -718,14 +736,23 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		count = 0;
>  		do {
> -			if (active) {
> -				struct i915_request *rq;
> +			struct i915_request *rq = NULL;
> +			struct intel_selftest_saved_policy saved;
> +			int err2;
> +
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				break;
> +			}
>  
> +			if (active) {
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
>  					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_get(rq);
> @@ -741,34 +768,58 @@ static int __igt_reset_engine(struct intel_gt *gt, bool active)
>  
>  					i915_request_put(rq);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
> +			}
>  
> -				i915_request_put(rq);
> +			if (!using_guc) {
> +				err = intel_engine_reset(engine, NULL);
> +				if (err) {
> +					pr_err("intel_engine_reset(%s) failed, err:%d\n",
> +					       engine->name, err);
> +					goto skip;
> +				}
>  			}
>  
> -			err = intel_engine_reset(engine, NULL);
> -			if (err) {
> -				pr_err("intel_engine_reset(%s) failed, err:%d\n",
> -				       engine->name, err);
> -				break;
> +			if (rq) {
> +				/* Ensure the reset happens and kills the engine */
> +				err = intel_selftest_wait_for_rq(rq);
> +				if (err)
> +					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
> +					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
>  			}
>  
> +skip:
> +			if (rq)
> +				i915_request_put(rq);
> +
>  			if (i915_reset_count(global) != reset_count) {
>  				pr_err("Full GPU reset recorded! (engine reset expected)\n");
>  				err = -EINVAL;
> -				break;
> +				goto restore;
>  			}
>  
> -			if (i915_reset_engine_count(global, engine) !=
> -			    ++reset_engine_count) {
> -				pr_err("%s engine reset not recorded!\n",
> -				       engine->name);
> -				err = -EINVAL;
> -				break;
> +			/* GuC based resets are not logged per engine */
> +			if (!using_guc) {
> +				if (i915_reset_engine_count(global, engine) !=
> +				    ++reset_engine_count) {
> +					pr_err("%s engine reset not recorded!\n",
> +					       engine->name);
> +					err = -EINVAL;
> +					goto restore;
> +				}
>  			}
>  
>  			count++;
> +
> +restore:
> +			err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err);
> +			if (err == 0)
> +				err = err2;
> +			if (err)
> +				break;
>  		} while (time_before(jiffies, end_time));
>  		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		st_engine_heartbeat_enable(engine);
> @@ -941,10 +992,13 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  		struct active_engine threads[I915_NUM_ENGINES] = {};
>  		unsigned long device = i915_reset_count(global);
>  		unsigned long count = 0, reported;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  		IGT_TIMEOUT(end_time);
>  
> -		if (flags & TEST_ACTIVE &&
> -		    !intel_engine_can_store_dword(engine))
> +		if (flags & TEST_ACTIVE) {
> +			if (!intel_engine_can_store_dword(engine))
> +				continue;
> +		} else if (using_guc)
>  			continue;
>  
>  		if (!wait_for_idle(engine)) {
> @@ -984,17 +1038,26 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  
>  		yield(); /* start all threads before we begin */
>  
> -		st_engine_heartbeat_disable(engine);
> +		st_engine_heartbeat_disable_no_pm(engine);
>  		set_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
>  		do {
>  			struct i915_request *rq = NULL;
> +			struct intel_selftest_saved_policy saved;
> +			int err2;
> +
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							  SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				break;
> +			}
>  
>  			if (flags & TEST_ACTIVE) {
>  				rq = hang_create_request(&h, engine);
>  				if (IS_ERR(rq)) {
>  					err = PTR_ERR(rq);
>  					pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_get(rq);
> @@ -1010,15 +1073,27 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  
>  					i915_request_put(rq);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
> +			} else {
> +				intel_engine_pm_get(engine);
>  			}
>  
> -			err = intel_engine_reset(engine, NULL);
> -			if (err) {
> -				pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
> -				       engine->name, test_name, err);
> -				break;
> +			if (!using_guc) {
> +				err = intel_engine_reset(engine, NULL);
> +				if (err) {
> +					pr_err("i915_reset_engine(%s:%s): failed, err=%d\n",
> +					       engine->name, test_name, err);
> +					goto restore;
> +				}
> +			}
> +
> +			if (rq) {
> +				/* Ensure the reset happens and kills the engine */
> +				err = intel_selftest_wait_for_rq(rq);
> +				if (err)
> +					pr_err("[%s] Wait for request %lld:%lld [0x%04X] failed: %d!\n",
> +					       engine->name, rq->fence.context, rq->fence.seqno, rq->context->guc_id, err);
>  			}
>  
>  			count++;
> @@ -1035,7 +1110,7 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  					GEM_TRACE_DUMP();
>  					intel_gt_set_wedged(gt);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
>  
>  				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
> @@ -1054,12 +1129,15 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  					GEM_TRACE_DUMP();
>  					intel_gt_set_wedged(gt);
>  					err = -EIO;
> -					break;
> +					goto restore;
>  				}
>  
>  				i915_request_put(rq);
>  			}
>  
> +			if (!(flags & TEST_ACTIVE))
> +				intel_engine_pm_put(engine);
> +
>  			if (!(flags & TEST_SELF) && !wait_for_idle(engine)) {
>  				struct drm_printer p =
>  					drm_info_printer(gt->i915->drm.dev);
> @@ -1071,22 +1149,34 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  						  "%s\n", engine->name);
>  
>  				err = -EIO;
> -				break;
> +				goto restore;
>  			}
> +
> +restore:
> +			err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("[%s] Restore policy failed: %d!\n", engine->name, err2);
> +			if (err == 0)
> +				err = err2;
> +			if (err)
> +				break;
>  		} while (time_before(jiffies, end_time));
>  		clear_bit(I915_RESET_ENGINE + id, &gt->reset.flags);
> -		st_engine_heartbeat_enable(engine);
> +		st_engine_heartbeat_enable_no_pm(engine);
>  
>  		pr_info("i915_reset_engine(%s:%s): %lu resets\n",
>  			engine->name, test_name, count);
>  
> -		reported = i915_reset_engine_count(global, engine);
> -		reported -= threads[engine->id].resets;
> -		if (reported != count) {
> -			pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> -			       engine->name, test_name, count, reported);
> -			if (!err)
> -				err = -EINVAL;
> +		/* GuC based resets are not logged per engine */
> +		if (!using_guc) {
> +			reported = i915_reset_engine_count(global, engine);
> +			reported -= threads[engine->id].resets;
> +			if (reported != count) {
> +				pr_err("i915_reset_engine(%s:%s): reset %lu times, but reported %lu\n",
> +				       engine->name, test_name, count, reported);
> +				if (!err)
> +					err = -EINVAL;
> +			}
>  		}
>  
>  unwind:
> @@ -1105,15 +1195,18 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			}
>  			put_task_struct(threads[tmp].task);
>  
> -			if (other->uabi_class != engine->uabi_class &&
> -			    threads[tmp].resets !=
> -			    i915_reset_engine_count(global, other)) {
> -				pr_err("Innocent engine %s was reset (count=%ld)\n",
> -				       other->name,
> -				       i915_reset_engine_count(global, other) -
> -				       threads[tmp].resets);
> -				if (!err)
> -					err = -EINVAL;
> +			/* GuC based resets are not logged per engine */
> +			if (!using_guc) {
> +				if (other->uabi_class != engine->uabi_class &&
> +				    threads[tmp].resets !=
> +				    i915_reset_engine_count(global, other)) {
> +					pr_err("Innocent engine %s was reset (count=%ld)\n",
> +					       other->name,
> +					       i915_reset_engine_count(global, other) -
> +					       threads[tmp].resets);
> +					if (!err)
> +						err = -EINVAL;
> +				}
>  			}
>  		}
>  
> @@ -1553,18 +1646,29 @@ static int igt_reset_queue(void *arg)
>  		goto unlock;
>  
>  	for_each_engine(engine, gt, id) {
> +		struct intel_selftest_saved_policy saved;
>  		struct i915_request *prev;
>  		IGT_TIMEOUT(end_time);
>  		unsigned int count;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  
>  		if (!intel_engine_can_store_dword(engine))
>  			continue;
>  
> +		if (using_guc) {
> +			err = intel_selftest_modify_policy(engine, &saved,
> +							  SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK);
> +			if (err) {
> +				pr_err("[%s] Modify policy failed: %d!\n", engine->name, err);
> +				goto fini;
> +			}
> +		}
> +
>  		prev = hang_create_request(&h, engine);
>  		if (IS_ERR(prev)) {
>  			err = PTR_ERR(prev);
>  			pr_err("[%s] Create 'prev' hang request failed: %d!\n", engine->name, err);
> -			goto fini;
> +			goto restore;
>  		}
>  
>  		i915_request_get(prev);
> @@ -1579,7 +1683,7 @@ static int igt_reset_queue(void *arg)
>  			if (IS_ERR(rq)) {
>  				err = PTR_ERR(rq);
>  				pr_err("[%s] Create hang request failed: %d!\n", engine->name, err);
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			i915_request_get(rq);
> @@ -1604,7 +1708,7 @@ static int igt_reset_queue(void *arg)
>  
>  				GEM_TRACE_DUMP();
>  				intel_gt_set_wedged(gt);
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (!wait_until_running(&h, prev)) {
> @@ -1622,7 +1726,7 @@ static int igt_reset_queue(void *arg)
>  				intel_gt_set_wedged(gt);
>  
>  				err = -EIO;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			reset_count = fake_hangcheck(gt, BIT(id));
> @@ -1633,7 +1737,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (rq->fence.error) {
> @@ -1642,7 +1746,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			if (i915_reset_count(global) == reset_count) {
> @@ -1650,7 +1754,7 @@ static int igt_reset_queue(void *arg)
>  				i915_request_put(rq);
>  				i915_request_put(prev);
>  				err = -EINVAL;
> -				goto fini;
> +				goto restore;
>  			}
>  
>  			i915_request_put(prev);
> @@ -1665,6 +1769,17 @@ static int igt_reset_queue(void *arg)
>  
>  		i915_request_put(prev);
>  
> +restore:
> +		if (using_guc) {
> +			int err2 = intel_selftest_restore_policy(engine, &saved);
> +			if (err2)
> +				pr_err("%s:%d> [%s] Restore policy failed: %d!\n", __func__, __LINE__, engine->name, err2);
> +			if (err == 0)
> +				err = err2;
> +		}
> +		if (err)
> +			goto fini;
> +
>  		err = igt_flush_test(gt->i915);
>  		if (err) {
>  			pr_err("[%s] Flush failed: %d!\n", engine->name, err);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index b7314739ee40..13d25bf2a94a 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> @@ -408,7 +408,8 @@ static int live_mocs_reset(void *arg)
>  		struct intel_context *ce;
>  		int err2;
>  
> -		err = intel_selftest_modify_policy(engine, &saved);
> +		err = intel_selftest_modify_policy(engine, &saved,
> +						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  		if (err)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 7727bc531ea9..d820f0b41634 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -810,7 +810,8 @@ static int live_reset_whitelist(void *arg)
>  				struct intel_selftest_saved_policy saved;
>  				int err2;
>  
> -				err = intel_selftest_modify_policy(engine, &saved);
> +				err = intel_selftest_modify_policy(engine, &saved,
> +								   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  				if(err)
>  					goto out;
>  
> @@ -1268,7 +1269,8 @@ live_engine_reset_workarounds(void *arg)
>  		int ret2;
>  
>  		pr_info("Verifying after %s reset...\n", engine->name);
> -		ret = intel_selftest_modify_policy(engine, &saved);
> +		ret = intel_selftest_modify_policy(engine, &saved,
> +						   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
>  		if (ret)
>  			break;
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 1c30d04733ff..536fdbc406c6 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1235,6 +1235,9 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>  {
>  	desc->policy_flags = 0;
>  
> +	if (engine->flags & I915_ENGINE_WANT_FORCED_PREEMPTION)
> +		desc->policy_flags |= CONTEXT_POLICY_FLAG_PREEMPT_TO_IDLE;
> +
>  	/* NB: For both of these, zero means disabled. */
>  	desc->execution_quantum = engine->props.timeslice_duration_ms * 1000;
>  	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> index 91ecd8a1bd21..69db139f9e0d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> @@ -16,7 +16,8 @@
>  #define WAIT_FOR_RESET_TIME	1000
>  
>  int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> -				 struct intel_selftest_saved_policy *saved)
> +				 struct intel_selftest_saved_policy *saved,
> +				 u32 modify_type)
>  
>  {
>  	int err;
> @@ -26,18 +27,30 @@ int intel_selftest_modify_policy(struct intel_engine_cs *engine,
>  	saved->timeslice = engine->props.timeslice_duration_ms;
>  	saved->preempt_timeout = engine->props.preempt_timeout_ms;
>  
> -	/*
> -	 * Enable force pre-emption on time slice expiration
> -	 * together with engine reset on pre-emption timeout.
> -	 * This is required to make the GuC notice and reset
> -	 * the single hanging context.
> -	 * Also, reduce the preemption timeout to something
> -	 * small to speed the test up.
> -	 */
> -	engine->i915->params.reset = 2;
> -	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
> -	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
> -	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
> +	switch (modify_type) {
> +	case SELFTEST_SCHEDULER_MODIFY_FAST_RESET:
> +		/*
> +		 * Enable force pre-emption on time slice expiration
> +		 * together with engine reset on pre-emption timeout.
> +		 * This is required to make the GuC notice and reset
> +		 * the single hanging context.
> +		 * Also, reduce the preemption timeout to something
> +		 * small to speed the test up.
> +		 */
> +		engine->i915->params.reset = 2;
> +		engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
> +		engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
> +		engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
> +		break;
> +
> +	case SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK:
> +		engine->props.preempt_timeout_ms = 0;
> +		break;
> +
> +	default:
> +		pr_err("Invalid scheduler policy modification type: %d!\n", modify_type);
> +		return -EINVAL;
> +	}
>  
>  	if (!intel_engine_uses_guc(engine))
>  		return 0;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> index f30e96f0ba95..050bc5a8ba8b 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> @@ -19,8 +19,15 @@ struct intel_selftest_saved_policy
>  	u64 preempt_timeout;
>  };
>  
> +enum selftest_scheduler_modify
> +{
> +	SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK = 0,
> +	SELFTEST_SCHEDULER_MODIFY_FAST_RESET,
> +};
> +
>  int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> -				 struct intel_selftest_saved_policy *saved);
> +				 struct intel_selftest_saved_policy *saved,
> +				 enum selftest_scheduler_modify modify_type);
>  int intel_selftest_restore_policy(struct intel_engine_cs *engine,
>  				  struct intel_selftest_saved_policy *saved);
>  int intel_selftest_wait_for_rq(struct i915_request *rq);
> -- 
> 2.28.0
> 
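
For reference, this is the call pattern the new parameter enables in a
selftest. A minimal sketch using only the names from the diff above;
run_test() stands in for a hypothetical test body and error handling is
condensed:

	struct intel_selftest_saved_policy saved;
	int err, err2;

	err = intel_selftest_modify_policy(engine, &saved,
					   SELFTEST_SCHEDULER_MODIFY_FAST_RESET);
	if (err)
		return err;

	err = run_test(engine);	/* hypothetical test body */

	/* Always restore the saved policy, even if the test failed */
	err2 = intel_selftest_restore_policy(engine, &saved);
	if (err == 0)
		err = err2;

	return err;

Tests that only need hang detection out of the way would pass
SELFTEST_SCHEDULER_MODIFY_NO_HANGCHECK instead, which just sets
preempt_timeout_ms to 0.
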
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 46/51] drm/i915/selftest: Fix MOCS selftest for GuC submission
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-16 23:57     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-16 23:57 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

On Fri, Jul 16, 2021 at 01:17:19PM -0700, Matthew Brost wrote:
> From: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
> 
> When GuC submission is enabled, the GuC controls engine resets. Rather
> than explicitly triggering a reset, the driver must submit a hanging
> context to GuC and wait for the reset to occur.
> 
> Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/selftest_mocs.c | 49 ++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index 8763bbeca0f7..b7314739ee40 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> @@ -10,6 +10,7 @@
>  #include "gem/selftests/mock_context.h"
>  #include "selftests/igt_reset.h"
>  #include "selftests/igt_spinner.h"
> +#include "selftests/intel_scheduler_helpers.h"
>  
>  struct live_mocs {
>  	struct drm_i915_mocs_table table;
> @@ -318,7 +319,8 @@ static int live_mocs_clean(void *arg)
>  }
>  
>  static int active_engine_reset(struct intel_context *ce,
> -			       const char *reason)
> +			       const char *reason,
> +			       bool using_guc)
>  {
>  	struct igt_spinner spin;
>  	struct i915_request *rq;
> @@ -335,9 +337,13 @@ static int active_engine_reset(struct intel_context *ce,
>  	}
>  
>  	err = request_add_spin(rq, &spin);
> -	if (err == 0)
> +	if (err == 0 && !using_guc)
>  		err = intel_engine_reset(ce->engine, reason);
>  
> +	/* Ensure the reset happens and kills the engine */
> +	if (err == 0)
> +		err = intel_selftest_wait_for_rq(rq);
> +
>  	igt_spinner_end(&spin);
>  	igt_spinner_fini(&spin);
>  
> @@ -345,21 +351,23 @@ static int active_engine_reset(struct intel_context *ce,
>  }
>  
>  static int __live_mocs_reset(struct live_mocs *mocs,
> -			     struct intel_context *ce)
> +			     struct intel_context *ce, bool using_guc)
>  {
>  	struct intel_gt *gt = ce->engine->gt;
>  	int err;
>  
>  	if (intel_has_reset_engine(gt)) {
> -		err = intel_engine_reset(ce->engine, "mocs");
> -		if (err)
> -			return err;
> -
> -		err = check_mocs_engine(mocs, ce);
> -		if (err)
> -			return err;
> +		if (!using_guc) {
> +			err = intel_engine_reset(ce->engine, "mocs");
> +			if (err)
> +				return err;
> +
> +			err = check_mocs_engine(mocs, ce);
> +			if (err)
> +				return err;
> +		}
>  
> -		err = active_engine_reset(ce, "mocs");
> +		err = active_engine_reset(ce, "mocs", using_guc);
>  		if (err)
>  			return err;
>  
> @@ -395,19 +403,32 @@ static int live_mocs_reset(void *arg)
>  
>  	igt_global_reset_lock(gt);
>  	for_each_engine(engine, gt, id) {
> +		bool using_guc = intel_engine_uses_guc(engine);
> +		struct intel_selftest_saved_policy saved;
>  		struct intel_context *ce;
> +		int err2;
> +
> +		err = intel_selftest_modify_policy(engine, &saved);
> +		if (err)
> +			break;
>  
>  		ce = mocs_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			err = PTR_ERR(ce);
> -			break;
> +			goto restore;
>  		}
>  
>  		intel_engine_pm_get(engine);
> -		err = __live_mocs_reset(&mocs, ce);
> -		intel_engine_pm_put(engine);
>  
> +		err = __live_mocs_reset(&mocs, ce, using_guc);
> +
> +		intel_engine_pm_put(engine);
>  		intel_context_put(ce);
> +
> +restore:
> +		err2 = intel_selftest_restore_policy(engine, &saved);
> +		if (err == 0)
> +			err = err2;
>  		if (err)
>  			break;
>  	}
> -- 
> 2.28.0
> 

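The behavioural split this patch introduces is easiest to see distilled
out of the diff (a sketch only; spinner setup and cleanup are elided):

	err = request_add_spin(rq, &spin);

	/* Execlists: the test triggers the engine reset itself */
	if (err == 0 && !using_guc)
		err = intel_engine_reset(ce->engine, reason);

	/*
	 * GuC: the firmware owns engine resets, so just wait for it to
	 * notice the hang and kill the spinner
	 */
	if (err == 0)
		err = intel_selftest_wait_for_rq(rq);

Note this leans on the fast-reset scheduler policy applied by
intel_selftest_modify_policy() in live_mocs_reset(); with default
preemption timeouts the wait would take far longer.
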
^ permalink raw reply	[flat|nested] 221+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for GuC submission support (rev3)
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
                   ` (51 preceding siblings ...)
  (?)
@ 2021-07-17  1:10 ` Patchwork
  -1 siblings, 0 replies; 221+ messages in thread
From: Patchwork @ 2021-07-17  1:10 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx

== Series Details ==

Series: GuC submission support (rev3)
URL   : https://patchwork.freedesktop.org/series/91840/
State : failure

== Summary ==

CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  CHK     include/generated/compile.h
  LD [M]  drivers/gpu/drm/i915/i915.o
  HDRTEST drivers/gpu/drm/i915/gt/intel_gt_requests.h
In file included from <command-line>:
./drivers/gpu/drm/i915/gt/intel_gt_requests.h: In function ‘intel_gt_retire_requests’:
./drivers/gpu/drm/i915/gt/intel_gt_requests.h:17:42: error: ‘NULL’ undeclared (first use in this function)
  intel_gt_retire_requests_timeout(gt, 0, NULL);
                                          ^~~~
./drivers/gpu/drm/i915/gt/intel_gt_requests.h:17:42: note: ‘NULL’ is defined in header ‘<stddef.h>’; did you forget to ‘#include <stddef.h>’?
./drivers/gpu/drm/i915/gt/intel_gt_requests.h:1:1:
+#include <stddef.h>
 /* SPDX-License-Identifier: MIT */
./drivers/gpu/drm/i915/gt/intel_gt_requests.h:17:42:
  intel_gt_retire_requests_timeout(gt, 0, NULL);
                                          ^~~~
./drivers/gpu/drm/i915/gt/intel_gt_requests.h:17:42: note: each undeclared identifier is reported only once for each function it appears in
drivers/gpu/drm/i915/Makefile:319: recipe for target 'drivers/gpu/drm/i915/gt/intel_gt_requests.hdrtest' failed
make[4]: *** [drivers/gpu/drm/i915/gt/intel_gt_requests.hdrtest] Error 1
scripts/Makefile.build:514: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:514: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:514: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1841: recipe for target 'drivers' failed
make: *** [drivers] Error 2


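For completeness, the fix the compiler note points at is a one-line
include near the top of intel_gt_requests.h; a hedged sketch (whether
<stddef.h>, <linux/stddef.h> or <linux/types.h> is picked is the
author's call, any of them pulls in NULL):

	#include <linux/stddef.h>	/* NULL, used by intel_gt_retire_requests() */
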

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 00/51] GuC submission support
  2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
@ 2021-07-19  9:06   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 221+ messages in thread
From: Tvrtko Ursulin @ 2021-07-19  9:06 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 16/07/2021 21:16, Matthew Brost wrote:
> As discussed in [1], [2] we are enabling GuC submission support in the
> i915. This is a subset of the patches in step 5 described in [1];
> basically it is the absolute minimum needed to enable CI with GuC
> submission on gen11+ platforms.
> 
> This series itself will likely be broken down into smaller patch sets
> to merge: CTB changes, basic submission, virtual engines, and resets.
> 
> A following series will address the missing patches remaining from [1].
> 
> Locally tested on TGL machine and basic tests seem to be passing.
> 
> v2: Address all review comments in [3], include several more patches to
> make CI [4] happy.

What about:

ae1a6469-e8de-0745-1bb9-2054dd43dc74@linux.intel.com
e5d96ebb-f168-c1af-22c8-0b066388e70d@linux.intel.com

?

Regards,

Tvrtko

> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> 
> [1] https://patchwork.freedesktop.org/series/89844/
> [2] https://patchwork.freedesktop.org/series/91417/
> [3] https://patchwork.freedesktop.org/series/91840/
> [4] https://patchwork.freedesktop.org/series/91885/
> 
> Daniele Ceraolo Spurio (1):
>    drm/i915/guc: Unblock GuC submission on Gen11+
> 
> John Harrison (12):
>    drm/i915: Track 'serial' counts for virtual engines
>    drm/i915/guc: Provide mmio list to be saved/restored on engine reset
>    drm/i915/guc: Don't complain about reset races
>    drm/i915/guc: Enable GuC engine reset
>    drm/i915/guc: Fix for error capture after full GPU reset with GuC
>    drm/i915/guc: Hook GuC scheduling policies up
>    drm/i915/guc: Connect reset modparam updates to GuC policy flags
>    drm/i915/guc: Include scheduling policies in the debugfs state dump
>    drm/i915/guc: Add golden context to GuC ADS
>    drm/i915/selftest: Better error reporting from hangcheck selftest
>    drm/i915/selftest: Fix hangcheck self test for GuC submission
>    drm/i915/selftest: Bump selftest timeouts for hangcheck
> 
> Matthew Brost (36):
>    drm/i915/guc: Add new GuC interface defines and structures
>    drm/i915/guc: Remove GuC stage descriptor, add LRC descriptor
>    drm/i915/guc: Add LRC descriptor context lookup array
>    drm/i915/guc: Implement GuC submission tasklet
>    drm/i915/guc: Add bypass tasklet submission path to GuC
>    drm/i915/guc: Implement GuC context operations for new inteface
>    drm/i915/guc: Insert fence on context when deregistering
>    drm/i915/guc: Defer context unpin until scheduling is disabled
>    drm/i915/guc: Disable engine barriers with GuC during unpin
>    drm/i915/guc: Extend deregistration fence to schedule disable
>    drm/i915: Disable preempt busywait when using GuC scheduling
>    drm/i915/guc: Ensure request ordering via completion fences
>    drm/i915/guc: Disable semaphores when using GuC scheduling
>    drm/i915/guc: Ensure G2H response has space in buffer
>    drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
>    drm/i915/guc: Update GuC debugfs to support new GuC
>    drm/i915/guc: Add several request trace points
>    drm/i915: Add intel_context tracing
>    drm/i915/guc: GuC virtual engines
>    drm/i915: Hold reference to intel_context over life of i915_request
>    drm/i915/guc: Disable bonding extension with GuC submission
>    drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
>    drm/i915: Add i915_sched_engine destroy vfunc
>    drm/i915: Move active request tracking to a vfunc
>    drm/i915/guc: Reset implementation for new GuC interface
>    drm/i915: Reset GPU immediately if submission is disabled
>    drm/i915/guc: Add disable interrupts to guc sanitize
>    drm/i915/guc: Suspend/resume implementation for new interface
>    drm/i915/guc: Handle context reset notification
>    drm/i915/guc: Handle engine reset failure notification
>    drm/i915/guc: Enable the timer expired interrupt for GuC
>    drm/i915/guc: Capture error state on context reset
>    drm/i915/guc: Implement banned contexts for GuC submission
>    drm/i915/guc: Support request cancellation
>    drm/i915/selftest: Increase some timeouts in live_requests
>    drm/i915/guc: Implement GuC priority management
> 
> Rahul Kumar Singh (2):
>    drm/i915/selftest: Fix workarounds selftest for GuC submission
>    drm/i915/selftest: Fix MOCS selftest for GuC submission
> 
>   drivers/gpu/drm/i915/Makefile                 |    1 +
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   21 +-
>   drivers/gpu/drm/i915/gem/i915_gem_context.h   |    1 +
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |    3 +-
>   drivers/gpu/drm/i915/gt/gen8_engine_cs.c      |    6 +-
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   44 +-
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   |   16 +-
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |    7 +
>   drivers/gpu/drm/i915/gt/intel_context.c       |   50 +-
>   drivers/gpu/drm/i915/gt/intel_context.h       |   50 +-
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   63 +-
>   drivers/gpu/drm/i915/gt/intel_engine.h        |   54 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  182 +-
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   71 +-
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.h  |    4 +
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |   13 +-
>   drivers/gpu/drm/i915/gt/intel_engine_user.c   |    4 +
>   .../drm/i915/gt/intel_execlists_submission.c  |   95 +-
>   .../drm/i915/gt/intel_execlists_submission.h  |    4 -
>   drivers/gpu/drm/i915/gt/intel_gt.c            |   21 +
>   drivers/gpu/drm/i915/gt/intel_gt.h            |    2 +
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |    6 +-
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c   |   21 +-
>   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |    7 +-
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |    1 -
>   drivers/gpu/drm/i915/gt/intel_reset.c         |   50 +-
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |   48 +
>   drivers/gpu/drm/i915/gt/intel_rps.c           |    4 +
>   drivers/gpu/drm/i915/gt/intel_workarounds.c   |   46 +-
>   .../gpu/drm/i915/gt/intel_workarounds_types.h |    1 +
>   drivers/gpu/drm/i915/gt/mock_engine.c         |   40 +-
>   drivers/gpu/drm/i915/gt/selftest_context.c    |   10 +
>   .../drm/i915/gt/selftest_engine_heartbeat.c   |   22 +
>   .../drm/i915/gt/selftest_engine_heartbeat.h   |    2 +
>   drivers/gpu/drm/i915/gt/selftest_execlists.c  |   12 +-
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |  314 +-
>   drivers/gpu/drm/i915/gt/selftest_mocs.c       |   50 +-
>   .../gpu/drm/i915/gt/selftest_workarounds.c    |  132 +-
>   .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h  |   15 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |   82 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  101 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    |  461 ++-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h    |    4 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  132 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |   16 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    |   25 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h   |   88 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 2776 +++++++++++++++--
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   18 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  101 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   11 +
>   drivers/gpu/drm/i915/i915_debugfs_params.c    |   31 +
>   drivers/gpu/drm/i915/i915_gem_evict.c         |    1 +
>   drivers/gpu/drm/i915/i915_gpu_error.c         |   25 +-
>   drivers/gpu/drm/i915/i915_reg.h               |    2 +
>   drivers/gpu/drm/i915/i915_request.c           |  174 +-
>   drivers/gpu/drm/i915/i915_request.h           |   29 +
>   drivers/gpu/drm/i915/i915_scheduler.c         |   16 +-
>   drivers/gpu/drm/i915/i915_scheduler.h         |   10 +-
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |   22 +
>   drivers/gpu/drm/i915/i915_trace.h             |  219 +-
>   drivers/gpu/drm/i915/selftests/i915_request.c |    4 +-
>   .../gpu/drm/i915/selftests/igt_flush_test.c   |    2 +-
>   .../gpu/drm/i915/selftests/igt_live_test.c    |    2 +-
>   .../i915/selftests/intel_scheduler_helpers.c  |   89 +
>   .../i915/selftests/intel_scheduler_helpers.h  |   35 +
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |    3 +-
>   include/uapi/drm/i915_drm.h                   |    9 +
>   68 files changed, 5119 insertions(+), 862 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
>   create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-19 17:24     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-19 17:24 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Fri, Jul 16, 2021 at 01:17:14PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The media watchdog mechanism involves GuC doing a silent reset and
> continue of the hung context. This requires the i915 driver to provide
> a golden context to GuC in the ADS.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
>  drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
>  7 files changed, 199 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index acfdd53b2678..ceeb517ba259 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
>  	if (err)
>  		goto err_gt;
>  
> +	intel_uc_init_late(&gt->uc);
> +
>  	err = i915_inject_probe_error(gt->i915, -EIO);
>  	if (err)
>  		goto err_gt;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 68266cbffd1f..979128e28372 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
>  	}
>  }
>  
> +void intel_guc_init_late(struct intel_guc *guc)
> +{
> +	intel_guc_ads_init_late(guc);
> +}
> +
>  static u32 guc_ctl_debug_flags(struct intel_guc *guc)
>  {
>  	u32 level = intel_guc_log_get_level(&guc->log);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index bc71635c70b9..dc18ac510ac8 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -60,6 +60,7 @@ struct intel_guc {
>  	struct i915_vma *ads_vma;
>  	struct __guc_ads_blob *ads_blob;
>  	u32 ads_regset_size;
> +	u32 ads_golden_ctxt_size;
>  
>  	struct i915_vma *lrc_desc_pool;
>  	void *lrc_desc_pool_vaddr;
> @@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
>  }
>  
>  void intel_guc_init_early(struct intel_guc *guc);
> +void intel_guc_init_late(struct intel_guc *guc);
>  void intel_guc_init_send_regs(struct intel_guc *guc);
>  void intel_guc_write_params(struct intel_guc *guc);
>  int intel_guc_init(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 93b0ac35a508..241b3089b658 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -7,6 +7,7 @@
>  
>  #include "gt/intel_gt.h"
>  #include "gt/intel_lrc.h"
> +#include "gt/shmem_utils.h"
>  #include "intel_guc_ads.h"
>  #include "intel_guc_fwif.h"
>  #include "intel_uc.h"
> @@ -33,6 +34,10 @@
>   *      +---------------------------------------+ <== dynamic
>   *      | padding                               |
>   *      +---------------------------------------+ <== 4K aligned
> + *      | golden contexts                       |
> + *      +---------------------------------------+
> + *      | padding                               |
> + *      +---------------------------------------+ <== 4K aligned
>   *      | private data                          |
>   *      +---------------------------------------+
>   *      | padding                               |
> @@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
>  	return guc->ads_regset_size;
>  }
>  
> +static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
> +{
> +	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
> +}
> +
>  static u32 guc_ads_private_data_size(struct intel_guc *guc)
>  {
>  	return PAGE_ALIGN(guc->fw.private_data_size);
> @@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
>  	return offsetof(struct __guc_ads_blob, regset);
>  }
>  
> -static u32 guc_ads_private_data_offset(struct intel_guc *guc)
> +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
>  {
>  	u32 offset;
>  
>  	offset = guc_ads_regset_offset(guc) +
>  		 guc_ads_regset_size(guc);
> +
> +	return PAGE_ALIGN(offset);
> +}
> +
> +static u32 guc_ads_private_data_offset(struct intel_guc *guc)
> +{
> +	u32 offset;
> +
> +	offset = guc_ads_golden_ctxt_offset(guc) +
> +		 guc_ads_golden_ctxt_size(guc);
> +
>  	return PAGE_ALIGN(offset);
>  }
>  
> @@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
>  	GEM_BUG_ON(temp_set.size);
>  }
>  
> -/*
> - * The first 80 dwords of the register state context, containing the
> - * execlists and ppgtt registers.
> - */
> -#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
> +static void fill_engine_enable_masks(struct intel_gt *gt,
> +				     struct guc_gt_system_info *info)
> +{
> +	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> +	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> +	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> +	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> +}
>  
> -static void __guc_ads_init(struct intel_guc *guc)
> +/* Skip execlist and PPGTT registers */
> +#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
> +#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
> +
> +static int guc_prep_golden_context(struct intel_guc *guc,
> +				   struct __guc_ads_blob *blob)
>  {
>  	struct intel_gt *gt = guc_to_gt(guc);
> -	struct drm_i915_private *i915 = gt->i915;
> +	u32 addr_ggtt, offset;
> +	u32 total_size = 0, alloc_size, real_size;
> +	u8 engine_class, guc_class;
> +	struct guc_gt_system_info *info, local_info;
> +
> +	/*
> +	 * Reserve the memory for the golden contexts and point GuC at it but
> +	 * leave it empty for now. The context data will be filled in later
> +	 * once there is something available to put there.
> +	 *
> +	 * Note that the HWSP and ring context are not included.
> +	 *
> +	 * Note also that the storage must be pinned in the GGTT, so that the
> +	 * address won't change after GuC has been told where to find it. The
> +	 * GuC will also validate that the LRC base + size fall within the
> +	 * allowed GGTT range.
> +	 */
> +	if (blob) {
> +		offset = guc_ads_golden_ctxt_offset(guc);
> +		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> +		info = &blob->system_info;
> +	} else {
> +		memset(&local_info, 0, sizeof(local_info));
> +		info = &local_info;
> +		fill_engine_enable_masks(gt, info);
> +	}
> +
> +	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
> +		if (engine_class == OTHER_CLASS)
> +			continue;
> +
> +		guc_class = engine_class_to_guc_class(engine_class);
> +
> +		if (!info->engine_enabled_masks[guc_class])
> +			continue;
> +
> +		real_size = intel_engine_context_size(gt, engine_class);
> +		alloc_size = PAGE_ALIGN(real_size);
> +		total_size += alloc_size;
> +
> +		if (!blob)
> +			continue;
> +
> +		blob->ads.eng_state_size[guc_class] = real_size;
> +		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
> +		addr_ggtt += alloc_size;
> +	}
> +
> +	if (!blob)
> +		return total_size;
> +
> +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> +	return total_size;
> +}
> +
> +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
> +{
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, gt, id) {
> +		if (engine->class != engine_class)
> +			continue;
> +
> +		if (!engine->default_state)
> +			continue;
> +
> +		return engine;
> +	}
> +
> +	return NULL;
> +}
> +
> +static void guc_init_golden_context(struct intel_guc *guc)
> +{
>  	struct __guc_ads_blob *blob = guc->ads_blob;
> -	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
> -	u32 base;
> +	struct intel_engine_cs *engine;
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	u32 addr_ggtt, offset;
> +	u32 total_size = 0, alloc_size, real_size;
>  	u8 engine_class, guc_class;
> +	u8 *ptr;
>  
> -	/* GuC scheduling policies */
> -	guc_policies_init(guc, &blob->policies);
> +	if (!intel_uc_uses_guc_submission(&gt->uc))
> +		return;
> +
> +	GEM_BUG_ON(!blob);
>  
>  	/*
> -	 * GuC expects a per-engine-class context image and size
> -	 * (minus hwsp and ring context). The context image will be
> -	 * used to reinitialize engines after a reset. It must exist
> -	 * and be pinned in the GGTT, so that the address won't change after
> -	 * we have told GuC where to find it. The context size will be used
> -	 * to validate that the LRC base + size fall within allowed GGTT.
> +	 * Go back and fill in the golden context data now that it is
> +	 * available.
>  	 */
> +	offset = guc_ads_golden_ctxt_offset(guc);
> +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> +	ptr = ((u8 *) blob) + offset;
> +
>  	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
>  		if (engine_class == OTHER_CLASS)
>  			continue;
>  
>  		guc_class = engine_class_to_guc_class(engine_class);
>  
> -		/*
> -		 * TODO: Set context pointer to default state to allow
> -		 * GuC to re-init guilty contexts after internal reset.
> -		 */
> -		blob->ads.golden_context_lrca[guc_class] = 0;
> -		blob->ads.eng_state_size[guc_class] =
> -			intel_engine_context_size(gt, engine_class) -
> -			skipped_size;
> +		if (!blob->system_info.engine_enabled_masks[guc_class])
> +			continue;
> +
> +		real_size = intel_engine_context_size(gt, engine_class);
> +		alloc_size = PAGE_ALIGN(real_size);
> +		total_size += alloc_size;
> +
> +		engine = find_engine_state(gt, engine_class);
> +		if (!engine) {
> +			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
> +			blob->ads.eng_state_size[guc_class] = 0;
> +			blob->ads.golden_context_lrca[guc_class] = 0;
> +			continue;
> +		}
> +
> +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> +		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
> +		addr_ggtt += alloc_size;
> +
> +		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);

There is a confirmed memory corruption here; the copy length must
account for the skipped bytes, so it should be:
shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size - SKIP_SIZE);

Also a bit of a nit, but I think s/SKIP_SIZE/GOLDEN_CONTEXT_SKIP_SIZE/g

As long as you agree, John, I'll fix both of these in the next rev.

Matt
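
To spell out the bounds (a sketch of the corrected call, annotated under
the definitions above, where SKIP_SIZE is the PPHWSP page(s) plus the
80-dword LR context):

	/*
	 * The destination slot holds alloc_size = PAGE_ALIGN(real_size)
	 * bytes. Writing real_size bytes starting at ptr + SKIP_SIZE can
	 * run SKIP_SIZE bytes past the slot, into the next class's golden
	 * context; the source read can similarly run past the end of the
	 * recorded default state.
	 */
	shmem_read(engine->default_state, SKIP_SIZE,
		   ptr + SKIP_SIZE, real_size - SKIP_SIZE);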

> +		ptr += alloc_size;
>  	}
>  
> +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> +}
> +
> +static void __guc_ads_init(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct drm_i915_private *i915 = gt->i915;
> +	struct __guc_ads_blob *blob = guc->ads_blob;
> +	u32 base;
> +
> +	/* GuC scheduling policies */
> +	guc_policies_init(guc, &blob->policies);
> +
>  	/* System info */
> -	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> -	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> -	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> -	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> +	fill_engine_enable_masks(gt, &blob->system_info);
>  
>  	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
>  		hweight8(gt->info.sseu.slice_mask);
> @@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
>  			 GEN12_DOORBELLS_PER_SQIDI) + 1;
>  	}
>  
> +	/* Golden contexts for re-initialising after a watchdog reset */
> +	guc_prep_golden_context(guc, blob);
> +
>  	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
>  
>  	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
> @@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
>  		return ret;
>  	guc->ads_regset_size = ret;
>  
> +	/* Likewise the golden contexts: */
> +	ret = guc_prep_golden_context(guc, NULL);
> +	if (ret < 0)
> +		return ret;
> +	guc->ads_golden_ctxt_size = ret;
> +
> +	/* Now the total size can be determined: */
>  	size = guc_ads_blob_size(guc);
>  
>  	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
> @@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
>  	return 0;
>  }
>  
> +void intel_guc_ads_init_late(struct intel_guc *guc)
> +{
> +	/*
> +	 * The golden context setup requires the saved engine state from
> +	 * __engines_record_defaults(). However, that requires engines to be
> +	 * operational which means the ADS must already have been configured.
> +	 * Fortunately, the golden context state is not needed until a hang
> +	 * occurs, so it can be filled in during this late init phase.
> +	 */
> +	guc_init_golden_context(guc);
> +}
> +
>  void intel_guc_ads_destroy(struct intel_guc *guc)
>  {
>  	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> index bdcb339a5321..3d85051d57e4 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> @@ -11,6 +11,7 @@ struct drm_printer;
>  
>  int intel_guc_ads_create(struct intel_guc *guc);
>  void intel_guc_ads_destroy(struct intel_guc *guc);
> +void intel_guc_ads_init_late(struct intel_guc *guc);
>  void intel_guc_ads_reset(struct intel_guc *guc);
>  void intel_guc_ads_print_policy_info(struct intel_guc *guc,
>  				     struct drm_printer *p);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 77c1fe2ed883..7a69c3c027e9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
>  		uc->ops = &uc_ops_off;
>  }
>  
> +void intel_uc_init_late(struct intel_uc *uc)
> +{
> +	intel_guc_init_late(&uc->guc);
> +}
> +
>  void intel_uc_driver_late_release(struct intel_uc *uc)
>  {
>  }
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index 91315e3f1c58..e2da2b6e76e1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -35,6 +35,7 @@ struct intel_uc {
>  };
>  
>  void intel_uc_init_early(struct intel_uc *uc);
> +void intel_uc_init_late(struct intel_uc *uc);
>  void intel_uc_driver_late_release(struct intel_uc *uc);
>  void intel_uc_driver_remove(struct intel_uc *uc);
>  void intel_uc_init_mmio(struct intel_uc *uc);
> -- 
> 2.28.0
> 
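
Stepping back, guc_prep_golden_context() is deliberately written so it
can run before the ADS buffer exists, which gives the three-phase flow
below. A condensed sketch with the function names from the diff; error
handling elided:

	/* 1. intel_guc_ads_create(): no blob yet, just measure */
	ret = guc_prep_golden_context(guc, NULL);
	guc->ads_golden_ctxt_size = ret;
	size = guc_ads_blob_size(guc);	/* now includes the golden slots */
	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
					     (void **)&guc->ads_blob);

	/* 2. __guc_ads_init(): same walk, this time recording per-class
	 * eng_state_size and golden_context_lrca in the blob */
	guc_prep_golden_context(guc, guc->ads_blob);

	/* 3. intel_guc_ads_init_late(), after __engines_record_defaults():
	 * copy each class's recorded default state into its slot */
	guc_init_golden_context(guc);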

^ permalink raw reply	[flat|nested] 221+ messages in thread

> index 91315e3f1c58..e2da2b6e76e1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -35,6 +35,7 @@ struct intel_uc {
>  };
>  
>  void intel_uc_init_early(struct intel_uc *uc);
> +void intel_uc_init_late(struct intel_uc *uc);
>  void intel_uc_driver_late_release(struct intel_uc *uc);
>  void intel_uc_driver_remove(struct intel_uc *uc);
>  void intel_uc_init_mmio(struct intel_uc *uc);
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
  2021-07-19 17:24     ` Matthew Brost
@ 2021-07-19 18:25       ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-19 18:25 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/19/2021 10:24, Matthew Brost wrote:
> On Fri, Jul 16, 2021 at 01:17:14PM -0700, Matthew Brost wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The media watchdog mechanism involves GuC doing a silent reset and
>> continue of the hung context. This requires the i915 driver provide a
>> golden context to GuC in the ADS.
>>
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
>>   drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
>>   7 files changed, 199 insertions(+), 30 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>> index acfdd53b2678..ceeb517ba259 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>> @@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
>>   	if (err)
>>   		goto err_gt;
>>   
>> +	intel_uc_init_late(&gt->uc);
>> +
>>   	err = i915_inject_probe_error(gt->i915, -EIO);
>>   	if (err)
>>   		goto err_gt;
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> index 68266cbffd1f..979128e28372 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> @@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
>>   	}
>>   }
>>   
>> +void intel_guc_init_late(struct intel_guc *guc)
>> +{
>> +	intel_guc_ads_init_late(guc);
>> +}
>> +
>>   static u32 guc_ctl_debug_flags(struct intel_guc *guc)
>>   {
>>   	u32 level = intel_guc_log_get_level(&guc->log);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> index bc71635c70b9..dc18ac510ac8 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>> @@ -60,6 +60,7 @@ struct intel_guc {
>>   	struct i915_vma *ads_vma;
>>   	struct __guc_ads_blob *ads_blob;
>>   	u32 ads_regset_size;
>> +	u32 ads_golden_ctxt_size;
>>   
>>   	struct i915_vma *lrc_desc_pool;
>>   	void *lrc_desc_pool_vaddr;
>> @@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
>>   }
>>   
>>   void intel_guc_init_early(struct intel_guc *guc);
>> +void intel_guc_init_late(struct intel_guc *guc);
>>   void intel_guc_init_send_regs(struct intel_guc *guc);
>>   void intel_guc_write_params(struct intel_guc *guc);
>>   int intel_guc_init(struct intel_guc *guc);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> index 93b0ac35a508..241b3089b658 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
>> @@ -7,6 +7,7 @@
>>   
>>   #include "gt/intel_gt.h"
>>   #include "gt/intel_lrc.h"
>> +#include "gt/shmem_utils.h"
>>   #include "intel_guc_ads.h"
>>   #include "intel_guc_fwif.h"
>>   #include "intel_uc.h"
>> @@ -33,6 +34,10 @@
>>    *      +---------------------------------------+ <== dynamic
>>    *      | padding                               |
>>    *      +---------------------------------------+ <== 4K aligned
>> + *      | golden contexts                       |
>> + *      +---------------------------------------+
>> + *      | padding                               |
>> + *      +---------------------------------------+ <== 4K aligned
>>    *      | private data                          |
>>    *      +---------------------------------------+
>>    *      | padding                               |
>> @@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
>>   	return guc->ads_regset_size;
>>   }
>>   
>> +static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
>> +{
>> +	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
>> +}
>> +
>>   static u32 guc_ads_private_data_size(struct intel_guc *guc)
>>   {
>>   	return PAGE_ALIGN(guc->fw.private_data_size);
>> @@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
>>   	return offsetof(struct __guc_ads_blob, regset);
>>   }
>>   
>> -static u32 guc_ads_private_data_offset(struct intel_guc *guc)
>> +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
>>   {
>>   	u32 offset;
>>   
>>   	offset = guc_ads_regset_offset(guc) +
>>   		 guc_ads_regset_size(guc);
>> +
>> +	return PAGE_ALIGN(offset);
>> +}
>> +
>> +static u32 guc_ads_private_data_offset(struct intel_guc *guc)
>> +{
>> +	u32 offset;
>> +
>> +	offset = guc_ads_golden_ctxt_offset(guc) +
>> +		 guc_ads_golden_ctxt_size(guc);
>> +
>>   	return PAGE_ALIGN(offset);
>>   }
>>   
>> @@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
>>   	GEM_BUG_ON(temp_set.size);
>>   }
>>   
>> -/*
>> - * The first 80 dwords of the register state context, containing the
>> - * execlists and ppgtt registers.
>> - */
>> -#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
>> +static void fill_engine_enable_masks(struct intel_gt *gt,
>> +				     struct guc_gt_system_info *info)
>> +{
>> +	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
>> +	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
>> +	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
>> +	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
>> +}
>>   
>> -static void __guc_ads_init(struct intel_guc *guc)
>> +/* Skip execlist and PPGTT registers */
>> +#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
>> +#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
>> +
>> +static int guc_prep_golden_context(struct intel_guc *guc,
>> +				   struct __guc_ads_blob *blob)
>>   {
>>   	struct intel_gt *gt = guc_to_gt(guc);
>> -	struct drm_i915_private *i915 = gt->i915;
>> +	u32 addr_ggtt, offset;
>> +	u32 total_size = 0, alloc_size, real_size;
>> +	u8 engine_class, guc_class;
>> +	struct guc_gt_system_info *info, local_info;
>> +
>> +	/*
>> +	 * Reserve the memory for the golden contexts and point GuC at it but
>> +	 * leave it empty for now. The context data will be filled in later
>> +	 * once there is something available to put there.
>> +	 *
>> +	 * Note that the HWSP and ring context are not included.
>> +	 *
>> +	 * Note also that the storage must be pinned in the GGTT, so that the
>> +	 * address won't change after GuC has been told where to find it. The
>> +	 * GuC will also validate that the LRC base + size fall within the
>> +	 * allowed GGTT range.
>> +	 */
>> +	if (blob) {
>> +		offset = guc_ads_golden_ctxt_offset(guc);
>> +		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
>> +		info = &blob->system_info;
>> +	} else {
>> +		memset(&local_info, 0, sizeof(local_info));
>> +		info = &local_info;
>> +		fill_engine_enable_masks(gt, info);
>> +	}
>> +
>> +	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
>> +		if (engine_class == OTHER_CLASS)
>> +			continue;
>> +
>> +		guc_class = engine_class_to_guc_class(engine_class);
>> +
>> +		if (!info->engine_enabled_masks[guc_class])
>> +			continue;
>> +
>> +		real_size = intel_engine_context_size(gt, engine_class);
>> +		alloc_size = PAGE_ALIGN(real_size);
>> +		total_size += alloc_size;
>> +
>> +		if (!blob)
>> +			continue;
>> +
>> +		blob->ads.eng_state_size[guc_class] = real_size;
>> +		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
>> +		addr_ggtt += alloc_size;
>> +	}
>> +
>> +	if (!blob)
>> +		return total_size;
>> +
>> +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
>> +	return total_size;
>> +}
>> +
>> +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
>> +{
>> +	struct intel_engine_cs *engine;
>> +	enum intel_engine_id id;
>> +
>> +	for_each_engine(engine, gt, id) {
>> +		if (engine->class != engine_class)
>> +			continue;
>> +
>> +		if (!engine->default_state)
>> +			continue;
>> +
>> +		return engine;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static void guc_init_golden_context(struct intel_guc *guc)
>> +{
>>   	struct __guc_ads_blob *blob = guc->ads_blob;
>> -	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
>> -	u32 base;
>> +	struct intel_engine_cs *engine;
>> +	struct intel_gt *gt = guc_to_gt(guc);
>> +	u32 addr_ggtt, offset;
>> +	u32 total_size = 0, alloc_size, real_size;
>>   	u8 engine_class, guc_class;
>> +	u8 *ptr;
>>   
>> -	/* GuC scheduling policies */
>> -	guc_policies_init(guc, &blob->policies);
>> +	if (!intel_uc_uses_guc_submission(&gt->uc))
>> +		return;
>> +
>> +	GEM_BUG_ON(!blob);
>>   
>>   	/*
>> -	 * GuC expects a per-engine-class context image and size
>> -	 * (minus hwsp and ring context). The context image will be
>> -	 * used to reinitialize engines after a reset. It must exist
>> -	 * and be pinned in the GGTT, so that the address won't change after
>> -	 * we have told GuC where to find it. The context size will be used
>> -	 * to validate that the LRC base + size fall within allowed GGTT.
>> +	 * Go back and fill in the golden context data now that it is
>> +	 * available.
>>   	 */
>> +	offset = guc_ads_golden_ctxt_offset(guc);
>> +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
>> +	ptr = ((u8 *) blob) + offset;
>> +
>>   	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
>>   		if (engine_class == OTHER_CLASS)
>>   			continue;
>>   
>>   		guc_class = engine_class_to_guc_class(engine_class);
>>   
>> -		/*
>> -		 * TODO: Set context pointer to default state to allow
>> -		 * GuC to re-init guilty contexts after internal reset.
>> -		 */
>> -		blob->ads.golden_context_lrca[guc_class] = 0;
>> -		blob->ads.eng_state_size[guc_class] =
>> -			intel_engine_context_size(gt, engine_class) -
>> -			skipped_size;
>> +		if (!blob->system_info.engine_enabled_masks[guc_class])
>> +			continue;
>> +
>> +		real_size = intel_engine_context_size(gt, engine_class);
>> +		alloc_size = PAGE_ALIGN(real_size);
>> +		total_size += alloc_size;
>> +
>> +		engine = find_engine_state(gt, engine_class);
>> +		if (!engine) {
>> +			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
>> +			blob->ads.eng_state_size[guc_class] = 0;
>> +			blob->ads.golden_context_lrca[guc_class] = 0;
>> +			continue;
>> +		}
>> +
>> +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
>> +		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
>> +		addr_ggtt += alloc_size;
>> +
>> +		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
> There is a confirmed memory corruption here, should be:
> shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size - SKIP_SIZE);
>
> Also a bit of a nit, but I think s/SKIP_SIZE/GOLDEN_CONTEXT_SKIP_SIZE/g
>
> As long as you agree, John, I'll fix both of these in the next rev.
>
> Matt
Yup. Good catch.

Rather than ballooning the name out, maybe make it and the LR_HW_ define 
local enums within this function? I don't think they are used anywhere else.
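
Something like this (rough sketch, untested):

	static void guc_init_golden_context(struct intel_guc *guc)
	{
		/* Only used here: skip the execlist/PPGTT regs and HWSP */
		enum {
			LR_HW_CONTEXT_SIZE = 80 * sizeof(u32),
			SKIP_SIZE = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE,
		};
		...
	}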

John.


>
>> +		ptr += alloc_size;
>>   	}
>>   
>> +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
>> +}
>> +
>> +static void __guc_ads_init(struct intel_guc *guc)
>> +{
>> +	struct intel_gt *gt = guc_to_gt(guc);
>> +	struct drm_i915_private *i915 = gt->i915;
>> +	struct __guc_ads_blob *blob = guc->ads_blob;
>> +	u32 base;
>> +
>> +	/* GuC scheduling policies */
>> +	guc_policies_init(guc, &blob->policies);
>> +
>>   	/* System info */
>> -	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
>> -	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
>> -	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
>> -	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
>> +	fill_engine_enable_masks(gt, &blob->system_info);
>>   
>>   	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
>>   		hweight8(gt->info.sseu.slice_mask);
>> @@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
>>   			 GEN12_DOORBELLS_PER_SQIDI) + 1;
>>   	}
>>   
>> +	/* Golden contexts for re-initialising after a watchdog reset */
>> +	guc_prep_golden_context(guc, blob);
>> +
>>   	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
>>   
>>   	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
>> @@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
>>   		return ret;
>>   	guc->ads_regset_size = ret;
>>   
>> +	/* Likewise the golden contexts: */
>> +	ret = guc_prep_golden_context(guc, NULL);
>> +	if (ret < 0)
>> +		return ret;
>> +	guc->ads_golden_ctxt_size = ret;
>> +
>> +	/* Now the total size can be determined: */
>>   	size = guc_ads_blob_size(guc);
>>   
>>   	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
>> @@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
>>   	return 0;
>>   }
>>   
>> +void intel_guc_ads_init_late(struct intel_guc *guc)
>> +{
>> +	/*
>> +	 * The golden context setup requires the saved engine state from
>> +	 * __engines_record_defaults(). However, that requires engines to be
>> +	 * operational which means the ADS must already have been configured.
>> +	 * Fortunately, the golden context state is not needed until a hang
>> +	 * occurs, so it can be filled in during this late init phase.
>> +	 */
>> +	guc_init_golden_context(guc);
>> +}
>> +
>>   void intel_guc_ads_destroy(struct intel_guc *guc)
>>   {
>>   	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
>> index bdcb339a5321..3d85051d57e4 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
>> @@ -11,6 +11,7 @@ struct drm_printer;
>>   
>>   int intel_guc_ads_create(struct intel_guc *guc);
>>   void intel_guc_ads_destroy(struct intel_guc *guc);
>> +void intel_guc_ads_init_late(struct intel_guc *guc);
>>   void intel_guc_ads_reset(struct intel_guc *guc);
>>   void intel_guc_ads_print_policy_info(struct intel_guc *guc,
>>   				     struct drm_printer *p);
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> index 77c1fe2ed883..7a69c3c027e9 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
>> @@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
>>   		uc->ops = &uc_ops_off;
>>   }
>>   
>> +void intel_uc_init_late(struct intel_uc *uc)
>> +{
>> +	intel_guc_init_late(&uc->guc);
>> +}
>> +
>>   void intel_uc_driver_late_release(struct intel_uc *uc)
>>   {
>>   }
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>> index 91315e3f1c58..e2da2b6e76e1 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>> @@ -35,6 +35,7 @@ struct intel_uc {
>>   };
>>   
>>   void intel_uc_init_early(struct intel_uc *uc);
>> +void intel_uc_init_late(struct intel_uc *uc);
>>   void intel_uc_driver_late_release(struct intel_uc *uc);
>>   void intel_uc_driver_remove(struct intel_uc *uc);
>>   void intel_uc_init_mmio(struct intel_uc *uc);
>> -- 
>> 2.28.0
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS
  2021-07-19 18:25       ` John Harrison
@ 2021-07-19 18:30         ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-19 18:30 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Mon, Jul 19, 2021 at 11:25:55AM -0700, John Harrison wrote:
> On 7/19/2021 10:24, Matthew Brost wrote:
> > On Fri, Jul 16, 2021 at 01:17:14PM -0700, Matthew Brost wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > The media watchdog mechanism involves GuC doing a silent reset and
> > > continue of the hung context. This requires the i915 driver provide a
> > > golden context to GuC in the ADS.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_gt.c         |   2 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc.c     |   5 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc.h     |   2 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 213 ++++++++++++++++++---
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h |   1 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_uc.c      |   5 +
> > >   drivers/gpu/drm/i915/gt/uc/intel_uc.h      |   1 +
> > >   7 files changed, 199 insertions(+), 30 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > index acfdd53b2678..ceeb517ba259 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > > @@ -654,6 +654,8 @@ int intel_gt_init(struct intel_gt *gt)
> > >   	if (err)
> > >   		goto err_gt;
> > > +	intel_uc_init_late(&gt->uc);
> > > +
> > >   	err = i915_inject_probe_error(gt->i915, -EIO);
> > >   	if (err)
> > >   		goto err_gt;
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > index 68266cbffd1f..979128e28372 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > > @@ -180,6 +180,11 @@ void intel_guc_init_early(struct intel_guc *guc)
> > >   	}
> > >   }
> > > +void intel_guc_init_late(struct intel_guc *guc)
> > > +{
> > > +	intel_guc_ads_init_late(guc);
> > > +}
> > > +
> > >   static u32 guc_ctl_debug_flags(struct intel_guc *guc)
> > >   {
> > >   	u32 level = intel_guc_log_get_level(&guc->log);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index bc71635c70b9..dc18ac510ac8 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -60,6 +60,7 @@ struct intel_guc {
> > >   	struct i915_vma *ads_vma;
> > >   	struct __guc_ads_blob *ads_blob;
> > >   	u32 ads_regset_size;
> > > +	u32 ads_golden_ctxt_size;
> > >   	struct i915_vma *lrc_desc_pool;
> > >   	void *lrc_desc_pool_vaddr;
> > > @@ -176,6 +177,7 @@ static inline u32 intel_guc_ggtt_offset(struct intel_guc *guc,
> > >   }
> > >   void intel_guc_init_early(struct intel_guc *guc);
> > > +void intel_guc_init_late(struct intel_guc *guc);
> > >   void intel_guc_init_send_regs(struct intel_guc *guc);
> > >   void intel_guc_write_params(struct intel_guc *guc);
> > >   int intel_guc_init(struct intel_guc *guc);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > index 93b0ac35a508..241b3089b658 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> > > @@ -7,6 +7,7 @@
> > >   #include "gt/intel_gt.h"
> > >   #include "gt/intel_lrc.h"
> > > +#include "gt/shmem_utils.h"
> > >   #include "intel_guc_ads.h"
> > >   #include "intel_guc_fwif.h"
> > >   #include "intel_uc.h"
> > > @@ -33,6 +34,10 @@
> > >    *      +---------------------------------------+ <== dynamic
> > >    *      | padding                               |
> > >    *      +---------------------------------------+ <== 4K aligned
> > > + *      | golden contexts                       |
> > > + *      +---------------------------------------+
> > > + *      | padding                               |
> > > + *      +---------------------------------------+ <== 4K aligned
> > >    *      | private data                          |
> > >    *      +---------------------------------------+
> > >    *      | padding                               |
> > > @@ -52,6 +57,11 @@ static u32 guc_ads_regset_size(struct intel_guc *guc)
> > >   	return guc->ads_regset_size;
> > >   }
> > > +static u32 guc_ads_golden_ctxt_size(struct intel_guc *guc)
> > > +{
> > > +	return PAGE_ALIGN(guc->ads_golden_ctxt_size);
> > > +}
> > > +
> > >   static u32 guc_ads_private_data_size(struct intel_guc *guc)
> > >   {
> > >   	return PAGE_ALIGN(guc->fw.private_data_size);
> > > @@ -62,12 +72,23 @@ static u32 guc_ads_regset_offset(struct intel_guc *guc)
> > >   	return offsetof(struct __guc_ads_blob, regset);
> > >   }
> > > -static u32 guc_ads_private_data_offset(struct intel_guc *guc)
> > > +static u32 guc_ads_golden_ctxt_offset(struct intel_guc *guc)
> > >   {
> > >   	u32 offset;
> > >   	offset = guc_ads_regset_offset(guc) +
> > >   		 guc_ads_regset_size(guc);
> > > +
> > > +	return PAGE_ALIGN(offset);
> > > +}
> > > +
> > > +static u32 guc_ads_private_data_offset(struct intel_guc *guc)
> > > +{
> > > +	u32 offset;
> > > +
> > > +	offset = guc_ads_golden_ctxt_offset(guc) +
> > > +		 guc_ads_golden_ctxt_size(guc);
> > > +
> > >   	return PAGE_ALIGN(offset);
> > >   }
> > > @@ -319,53 +340,163 @@ static void guc_mmio_reg_state_init(struct intel_guc *guc,
> > >   	GEM_BUG_ON(temp_set.size);
> > >   }
> > > -/*
> > > - * The first 80 dwords of the register state context, containing the
> > > - * execlists and ppgtt registers.
> > > - */
> > > -#define LR_HW_CONTEXT_SIZE	(80 * sizeof(u32))
> > > +static void fill_engine_enable_masks(struct intel_gt *gt,
> > > +				     struct guc_gt_system_info *info)
> > > +{
> > > +	info->engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> > > +	info->engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> > > +	info->engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> > > +	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> > > +}
> > > -static void __guc_ads_init(struct intel_guc *guc)
> > > +/* Skip execlist and PPGTT registers */
> > > +#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
> > > +#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
> > > +
> > > +static int guc_prep_golden_context(struct intel_guc *guc,
> > > +				   struct __guc_ads_blob *blob)
> > >   {
> > >   	struct intel_gt *gt = guc_to_gt(guc);
> > > -	struct drm_i915_private *i915 = gt->i915;
> > > +	u32 addr_ggtt, offset;
> > > +	u32 total_size = 0, alloc_size, real_size;
> > > +	u8 engine_class, guc_class;
> > > +	struct guc_gt_system_info *info, local_info;
> > > +
> > > +	/*
> > > +	 * Reserve the memory for the golden contexts and point GuC at it but
> > > +	 * leave it empty for now. The context data will be filled in later
> > > +	 * once there is something available to put there.
> > > +	 *
> > > +	 * Note that the HWSP and ring context are not included.
> > > +	 *
> > > +	 * Note also that the storage must be pinned in the GGTT, so that the
> > > +	 * address won't change after GuC has been told where to find it. The
> > > +	 * GuC will also validate that the LRC base + size fall within the
> > > +	 * allowed GGTT range.
> > > +	 */
> > > +	if (blob) {
> > > +		offset = guc_ads_golden_ctxt_offset(guc);
> > > +		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> > > +		info = &blob->system_info;
> > > +	} else {
> > > +		memset(&local_info, 0, sizeof(local_info));
> > > +		info = &local_info;
> > > +		fill_engine_enable_masks(gt, info);
> > > +	}
> > > +
> > > +	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
> > > +		if (engine_class == OTHER_CLASS)
> > > +			continue;
> > > +
> > > +		guc_class = engine_class_to_guc_class(engine_class);
> > > +
> > > +		if (!info->engine_enabled_masks[guc_class])
> > > +			continue;
> > > +
> > > +		real_size = intel_engine_context_size(gt, engine_class);
> > > +		alloc_size = PAGE_ALIGN(real_size);
> > > +		total_size += alloc_size;
> > > +
> > > +		if (!blob)
> > > +			continue;
> > > +
> > > +		blob->ads.eng_state_size[guc_class] = real_size;
> > > +		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
> > > +		addr_ggtt += alloc_size;
> > > +	}
> > > +
> > > +	if (!blob)
> > > +		return total_size;
> > > +
> > > +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> > > +	return total_size;
> > > +}
> > > +
> > > +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
> > > +{
> > > +	struct intel_engine_cs *engine;
> > > +	enum intel_engine_id id;
> > > +
> > > +	for_each_engine(engine, gt, id) {
> > > +		if (engine->class != engine_class)
> > > +			continue;
> > > +
> > > +		if (!engine->default_state)
> > > +			continue;
> > > +
> > > +		return engine;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > > +static void guc_init_golden_context(struct intel_guc *guc)
> > > +{
> > >   	struct __guc_ads_blob *blob = guc->ads_blob;
> > > -	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
> > > -	u32 base;
> > > +	struct intel_engine_cs *engine;
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	u32 addr_ggtt, offset;
> > > +	u32 total_size = 0, alloc_size, real_size;
> > >   	u8 engine_class, guc_class;
> > > +	u8 *ptr;
> > > -	/* GuC scheduling policies */
> > > -	guc_policies_init(guc, &blob->policies);
> > > +	if (!intel_uc_uses_guc_submission(&gt->uc))
> > > +		return;
> > > +
> > > +	GEM_BUG_ON(!blob);
> > >   	/*
> > > -	 * GuC expects a per-engine-class context image and size
> > > -	 * (minus hwsp and ring context). The context image will be
> > > -	 * used to reinitialize engines after a reset. It must exist
> > > -	 * and be pinned in the GGTT, so that the address won't change after
> > > -	 * we have told GuC where to find it. The context size will be used
> > > -	 * to validate that the LRC base + size fall within allowed GGTT.
> > > +	 * Go back and fill in the golden context data now that it is
> > > +	 * available.
> > >   	 */
> > > +	offset = guc_ads_golden_ctxt_offset(guc);
> > > +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> > > +	ptr = ((u8 *) blob) + offset;
> > > +
> > >   	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
> > >   		if (engine_class == OTHER_CLASS)
> > >   			continue;
> > >   		guc_class = engine_class_to_guc_class(engine_class);
> > > -		/*
> > > -		 * TODO: Set context pointer to default state to allow
> > > -		 * GuC to re-init guilty contexts after internal reset.
> > > -		 */
> > > -		blob->ads.golden_context_lrca[guc_class] = 0;
> > > -		blob->ads.eng_state_size[guc_class] =
> > > -			intel_engine_context_size(gt, engine_class) -
> > > -			skipped_size;
> > > +		if (!blob->system_info.engine_enabled_masks[guc_class])
> > > +			continue;
> > > +
> > > +		real_size = intel_engine_context_size(gt, engine_class);
> > > +		alloc_size = PAGE_ALIGN(real_size);
> > > +		total_size += alloc_size;
> > > +
> > > +		engine = find_engine_state(gt, engine_class);
> > > +		if (!engine) {
> > > +			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
> > > +			blob->ads.eng_state_size[guc_class] = 0;
> > > +			blob->ads.golden_context_lrca[guc_class] = 0;
> > > +			continue;
> > > +		}
> > > +
> > > +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> > > +		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
> > > +		addr_ggtt += alloc_size;
> > > +
> > > +		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
> > There is a confirmed memory corruption here, should be:
> > shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size - SKIP_SIZE);
> > 
> > Also a bit of a nit, but I think s/SKIP_SIZE/GOLDEN_CONTEXT_SKIP_SIZE/g
> > 
> > As long as you agree, John, I'll fix both of these in the next rev.
> > 
> > Matt
> Yup. Good catch.
> 
> Rather than ballooning the name out, maybe make it and the LR_HW_ define
> local enums within this function? I don't think they are used anywhere else.
> 

I like this better too; as you say, they are specific to a single function.
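
So the end result would be roughly (sketch only):

	shmem_read(engine->default_state, SKIP_SIZE,
		   ptr + SKIP_SIZE, real_size - SKIP_SIZE);

with SKIP_SIZE and LR_HW_CONTEXT_SIZE becoming enums local to
guc_init_golden_context().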

With those changes:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> John.
> 
> 
> > 
> > > +		ptr += alloc_size;
> > >   	}
> > > +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> > > +}
> > > +
> > > +static void __guc_ads_init(struct intel_guc *guc)
> > > +{
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	struct drm_i915_private *i915 = gt->i915;
> > > +	struct __guc_ads_blob *blob = guc->ads_blob;
> > > +	u32 base;
> > > +
> > > +	/* GuC scheduling policies */
> > > +	guc_policies_init(guc, &blob->policies);
> > > +
> > >   	/* System info */
> > > -	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> > > -	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> > > -	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> > > -	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> > > +	fill_engine_enable_masks(gt, &blob->system_info);
> > >   	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
> > >   		hweight8(gt->info.sseu.slice_mask);
> > > @@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
> > >   			 GEN12_DOORBELLS_PER_SQIDI) + 1;
> > >   	}
> > > +	/* Golden contexts for re-initialising after a watchdog reset */
> > > +	guc_prep_golden_context(guc, blob);
> > > +
> > >   	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
> > >   	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
> > > @@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
> > >   		return ret;
> > >   	guc->ads_regset_size = ret;
> > > +	/* Likewise the golden contexts: */
> > > +	ret = guc_prep_golden_context(guc, NULL);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +	guc->ads_golden_ctxt_size = ret;
> > > +
> > > +	/* Now the total size can be determined: */
> > >   	size = guc_ads_blob_size(guc);
> > >   	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
> > > @@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
> > >   	return 0;
> > >   }
> > > +void intel_guc_ads_init_late(struct intel_guc *guc)
> > > +{
> > > +	/*
> > > +	 * The golden context setup requires the saved engine state from
> > > +	 * __engines_record_defaults(). However, that requires engines to be
> > > +	 * operational which means the ADS must already have been configured.
> > > +	 * Fortunately, the golden context state is not needed until a hang
> > > +	 * occurs, so it can be filled in during this late init phase.
> > > +	 */
> > > +	guc_init_golden_context(guc);
> > > +}
> > > +
> > >   void intel_guc_ads_destroy(struct intel_guc *guc)
> > >   {
> > >   	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > index bdcb339a5321..3d85051d57e4 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > @@ -11,6 +11,7 @@ struct drm_printer;
> > >   int intel_guc_ads_create(struct intel_guc *guc);
> > >   void intel_guc_ads_destroy(struct intel_guc *guc);
> > > +void intel_guc_ads_init_late(struct intel_guc *guc);
> > >   void intel_guc_ads_reset(struct intel_guc *guc);
> > >   void intel_guc_ads_print_policy_info(struct intel_guc *guc,
> > >   				     struct drm_printer *p);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > index 77c1fe2ed883..7a69c3c027e9 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > @@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
> > >   		uc->ops = &uc_ops_off;
> > >   }
> > > +void intel_uc_init_late(struct intel_uc *uc)
> > > +{
> > > +	intel_guc_init_late(&uc->guc);
> > > +}
> > > +
> > >   void intel_uc_driver_late_release(struct intel_uc *uc)
> > >   {
> > >   }
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > index 91315e3f1c58..e2da2b6e76e1 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > @@ -35,6 +35,7 @@ struct intel_uc {
> > >   };
> > >   void intel_uc_init_early(struct intel_uc *uc);
> > > +void intel_uc_init_late(struct intel_uc *uc);
> > >   void intel_uc_driver_late_release(struct intel_uc *uc);
> > >   void intel_uc_driver_remove(struct intel_uc *uc);
> > >   void intel_uc_init_mmio(struct intel_uc *uc);
> > > -- 
> > > 2.28.0
> > > 
> > > _______________________________________________
> > > Intel-gfx mailing list
> > > Intel-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

> > > +	info->engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> > > +}
> > > -static void __guc_ads_init(struct intel_guc *guc)
> > > +/* Skip execlist and PPGTT registers */
> > > +#define LR_HW_CONTEXT_SIZE      (80 * sizeof(u32))
> > > +#define SKIP_SIZE               (LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE)
> > > +
> > > +static int guc_prep_golden_context(struct intel_guc *guc,
> > > +				   struct __guc_ads_blob *blob)
> > >   {
> > >   	struct intel_gt *gt = guc_to_gt(guc);
> > > -	struct drm_i915_private *i915 = gt->i915;
> > > +	u32 addr_ggtt, offset;
> > > +	u32 total_size = 0, alloc_size, real_size;
> > > +	u8 engine_class, guc_class;
> > > +	struct guc_gt_system_info *info, local_info;
> > > +
> > > +	/*
> > > +	 * Reserve the memory for the golden contexts and point GuC at it but
> > > +	 * leave it empty for now. The context data will be filled in later
> > > +	 * once there is something available to put there.
> > > +	 *
> > > +	 * Note that the HWSP and ring context are not included.
> > > +	 *
> > > +	 * Note also that the storage must be pinned in the GGTT, so that the
> > > +	 * address won't change after GuC has been told where to find it. The
> > > +	 * GuC will also validate that the LRC base + size fall within the
> > > +	 * allowed GGTT range.
> > > +	 */
> > > +	if (blob) {
> > > +		offset = guc_ads_golden_ctxt_offset(guc);
> > > +		addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> > > +		info = &blob->system_info;
> > > +	} else {
> > > +		memset(&local_info, 0, sizeof(local_info));
> > > +		info = &local_info;
> > > +		fill_engine_enable_masks(gt, info);
> > > +	}
> > > +
> > > +	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
> > > +		if (engine_class == OTHER_CLASS)
> > > +			continue;
> > > +
> > > +		guc_class = engine_class_to_guc_class(engine_class);
> > > +
> > > +		if (!info->engine_enabled_masks[guc_class])
> > > +			continue;
> > > +
> > > +		real_size = intel_engine_context_size(gt, engine_class);
> > > +		alloc_size = PAGE_ALIGN(real_size);
> > > +		total_size += alloc_size;
> > > +
> > > +		if (!blob)
> > > +			continue;
> > > +
> > > +		blob->ads.eng_state_size[guc_class] = real_size;
> > > +		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
> > > +		addr_ggtt += alloc_size;
> > > +	}
> > > +
> > > +	if (!blob)
> > > +		return total_size;
> > > +
> > > +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> > > +	return total_size;
> > > +}
> > > +
> > > +static struct intel_engine_cs *find_engine_state(struct intel_gt *gt, u8 engine_class)
> > > +{
> > > +	struct intel_engine_cs *engine;
> > > +	enum intel_engine_id id;
> > > +
> > > +	for_each_engine(engine, gt, id) {
> > > +		if (engine->class != engine_class)
> > > +			continue;
> > > +
> > > +		if (!engine->default_state)
> > > +			continue;
> > > +
> > > +		return engine;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> > > +
> > > +static void guc_init_golden_context(struct intel_guc *guc)
> > > +{
> > >   	struct __guc_ads_blob *blob = guc->ads_blob;
> > > -	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
> > > -	u32 base;
> > > +	struct intel_engine_cs *engine;
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	u32 addr_ggtt, offset;
> > > +	u32 total_size = 0, alloc_size, real_size;
> > >   	u8 engine_class, guc_class;
> > > +	u8 *ptr;
> > > -	/* GuC scheduling policies */
> > > -	guc_policies_init(guc, &blob->policies);
> > > +	if (!intel_uc_uses_guc_submission(&gt->uc))
> > > +		return;
> > > +
> > > +	GEM_BUG_ON(!blob);
> > >   	/*
> > > -	 * GuC expects a per-engine-class context image and size
> > > -	 * (minus hwsp and ring context). The context image will be
> > > -	 * used to reinitialize engines after a reset. It must exist
> > > -	 * and be pinned in the GGTT, so that the address won't change after
> > > -	 * we have told GuC where to find it. The context size will be used
> > > -	 * to validate that the LRC base + size fall within allowed GGTT.
> > > +	 * Go back and fill in the golden context data now that it is
> > > +	 * available.
> > >   	 */
> > > +	offset = guc_ads_golden_ctxt_offset(guc);
> > > +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> > > +	ptr = ((u8 *) blob) + offset;
> > > +
> > >   	for (engine_class = 0; engine_class <= MAX_ENGINE_CLASS; ++engine_class) {
> > >   		if (engine_class == OTHER_CLASS)
> > >   			continue;
> > >   		guc_class = engine_class_to_guc_class(engine_class);
> > > -		/*
> > > -		 * TODO: Set context pointer to default state to allow
> > > -		 * GuC to re-init guilty contexts after internal reset.
> > > -		 */
> > > -		blob->ads.golden_context_lrca[guc_class] = 0;
> > > -		blob->ads.eng_state_size[guc_class] =
> > > -			intel_engine_context_size(gt, engine_class) -
> > > -			skipped_size;
> > > +		if (!blob->system_info.engine_enabled_masks[guc_class])
> > > +			continue;
> > > +
> > > +		real_size = intel_engine_context_size(gt, engine_class);
> > > +		alloc_size = PAGE_ALIGN(real_size);
> > > +		total_size += alloc_size;
> > > +
> > > +		engine = find_engine_state(gt, engine_class);
> > > +		if (!engine) {
> > > +			drm_err(&gt->i915->drm, "No engine state recorded for class %d!\n", engine_class);
> > > +			blob->ads.eng_state_size[guc_class] = 0;
> > > +			blob->ads.golden_context_lrca[guc_class] = 0;
> > > +			continue;
> > > +		}
> > > +
> > > +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> > > +		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
> > > +		addr_ggtt += alloc_size;
> > > +
> > > +		shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size);
> > There is confirmed memory corruption here; it should be:
> > shmem_read(engine->default_state, SKIP_SIZE, ptr + SKIP_SIZE, real_size - SKIP_SIZE);
> > 
> > Also a bit of a nit, but I think s/SKIP_SIZE/GOLDEN_CONTEXT_SKIP_SIZE/g
> > 
> > As long as you agree, John, I'll fix both of these in the next rev.
> > 
> > Matt
> Yup. Good catch.
> 
> Rather than ballooning the name out, maybe make it and the LR_HW_ define
> local enums within this function? I don't think they are used anywhere else.
> 

I like this better too; as you say, they are specific to a single function.
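
Roughly what I have in mind for the next rev (just a sketch, not the
final patch):

	/* Local to guc_init_golden_context(): skip the PPHWSP page plus
	 * the execlist/PPGTT register state at the start of the image.
	 */
	enum {
		LR_HW_CONTEXT_SIZE = 80 * sizeof(u32),
		SKIP_SIZE = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE,
	};

	/* Copy real_size - SKIP_SIZE bytes so that the write ends at
	 * ptr + real_size, i.e. inside the PAGE_ALIGN(real_size) slot
	 * reserved by guc_prep_golden_context() for this class.
	 */
	shmem_read(engine->default_state, SKIP_SIZE,
		   ptr + SKIP_SIZE, real_size - SKIP_SIZE);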

With those changes:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> John.
> 
> 
> > 
> > > +		ptr += alloc_size;
> > >   	}
> > > +	GEM_BUG_ON(guc->ads_golden_ctxt_size != total_size);
> > > +}
> > > +
> > > +static void __guc_ads_init(struct intel_guc *guc)
> > > +{
> > > +	struct intel_gt *gt = guc_to_gt(guc);
> > > +	struct drm_i915_private *i915 = gt->i915;
> > > +	struct __guc_ads_blob *blob = guc->ads_blob;
> > > +	u32 base;
> > > +
> > > +	/* GuC scheduling policies */
> > > +	guc_policies_init(guc, &blob->policies);
> > > +
> > >   	/* System info */
> > > -	blob->system_info.engine_enabled_masks[GUC_RENDER_CLASS] = 1;
> > > -	blob->system_info.engine_enabled_masks[GUC_BLITTER_CLASS] = 1;
> > > -	blob->system_info.engine_enabled_masks[GUC_VIDEO_CLASS] = VDBOX_MASK(gt);
> > > -	blob->system_info.engine_enabled_masks[GUC_VIDEOENHANCE_CLASS] = VEBOX_MASK(gt);
> > > +	fill_engine_enable_masks(gt, &blob->system_info);
> > >   	blob->system_info.generic_gt_sysinfo[GUC_GENERIC_GT_SYSINFO_SLICE_ENABLED] =
> > >   		hweight8(gt->info.sseu.slice_mask);
> > > @@ -380,6 +511,9 @@ static void __guc_ads_init(struct intel_guc *guc)
> > >   			 GEN12_DOORBELLS_PER_SQIDI) + 1;
> > >   	}
> > > +	/* Golden contexts for re-initialising after a watchdog reset */
> > > +	guc_prep_golden_context(guc, blob);
> > > +
> > >   	guc_mapping_table_init(guc_to_gt(guc), &blob->system_info);
> > >   	base = intel_guc_ggtt_offset(guc, guc->ads_vma);
> > > @@ -417,6 +551,13 @@ int intel_guc_ads_create(struct intel_guc *guc)
> > >   		return ret;
> > >   	guc->ads_regset_size = ret;
> > > +	/* Likewise the golden contexts: */
> > > +	ret = guc_prep_golden_context(guc, NULL);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +	guc->ads_golden_ctxt_size = ret;
> > > +
> > > +	/* Now the total size can be determined: */
> > >   	size = guc_ads_blob_size(guc);
> > >   	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
> > > @@ -429,6 +570,18 @@ int intel_guc_ads_create(struct intel_guc *guc)
> > >   	return 0;
> > >   }
> > > +void intel_guc_ads_init_late(struct intel_guc *guc)
> > > +{
> > > +	/*
> > > +	 * The golden context setup requires the saved engine state from
> > > +	 * __engines_record_defaults(). However, that requires engines to be
> > > +	 * operational which means the ADS must already have been configured.
> > > +	 * Fortunately, the golden context state is not needed until a hang
> > > +	 * occurs, so it can be filled in during this late init phase.
> > > +	 */
> > > +	guc_init_golden_context(guc);
> > > +}
> > > +
> > >   void intel_guc_ads_destroy(struct intel_guc *guc)
> > >   {
> > >   	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > index bdcb339a5321..3d85051d57e4 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h
> > > @@ -11,6 +11,7 @@ struct drm_printer;
> > >   int intel_guc_ads_create(struct intel_guc *guc);
> > >   void intel_guc_ads_destroy(struct intel_guc *guc);
> > > +void intel_guc_ads_init_late(struct intel_guc *guc);
> > >   void intel_guc_ads_reset(struct intel_guc *guc);
> > >   void intel_guc_ads_print_policy_info(struct intel_guc *guc,
> > >   				     struct drm_printer *p);
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > index 77c1fe2ed883..7a69c3c027e9 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > > @@ -120,6 +120,11 @@ void intel_uc_init_early(struct intel_uc *uc)
> > >   		uc->ops = &uc_ops_off;
> > >   }
> > > +void intel_uc_init_late(struct intel_uc *uc)
> > > +{
> > > +	intel_guc_init_late(&uc->guc);
> > > +}
> > > +
> > >   void intel_uc_driver_late_release(struct intel_uc *uc)
> > >   {
> > >   }
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > index 91315e3f1c58..e2da2b6e76e1 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > > @@ -35,6 +35,7 @@ struct intel_uc {
> > >   };
> > >   void intel_uc_init_early(struct intel_uc *uc);
> > > +void intel_uc_init_late(struct intel_uc *uc);
> > >   void intel_uc_driver_late_release(struct intel_uc *uc);
> > >   void intel_uc_driver_remove(struct intel_uc *uc);
> > >   void intel_uc_init_mmio(struct intel_uc *uc);
> > > -- 
> > > 2.28.0
> > > 
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
  2021-07-19 23:01     ` [Intel-gfx] " John Harrison
@ 2021-07-19 22:55       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-19 22:55 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 04:01:56PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Implement GuC submission tasklet for new interface. The new GuC
> > interface uses H2G to submit contexts to the GuC. Since H2G use a single
> > channel, a single tasklet submits is used for the submission path.
> This still needs fixing - 'a single tasklet submits is used' is not valid
> English.
> 

Will fix.
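Probably something like: "Since H2G uses a single channel, a single
tasklet is used for the submission path."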

> It also seems that the idea of splitting all the deletes of old code into a
> separate patch didn't happen. It really does obfuscate things significantly
> having completely unrelated deletes and adds interspersed :(.
>

I don't recall promising to do that.

Matt

> John.
> 
> 
> > 
> > Also the per engine interrupt handler has been updated to disable the
> > rescheduling of the physical engine tasklet, when using GuC scheduling,
> > as the physical engine tasklet is no longer used.
> > 
> > In this patch the field, guc_id, has been added to intel_context and is
> > not assigned. Patches later in the series will assign this value.
> > 
> > v2:
> >   (John Harrison)
> >    - Clean up some comments
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
> >   3 files changed, 127 insertions(+), 117 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 90026c177105..6d99631d19b9 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -137,6 +137,15 @@ struct intel_context {
> >   	struct intel_sseu sseu;
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +
> > +	/* GuC scheduling state flags that do not require a lock. */
> > +	atomic_t guc_sched_state_no_lock;
> > +
> > +	/*
> > +	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> > +	 * in the series will.
> > +	 */
> > +	u16 guc_id;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 35783558d261..8c7b92f699f1 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -30,6 +30,10 @@ struct intel_guc {
> >   	struct intel_guc_log log;
> >   	struct intel_guc_ct ct;
> > +	/* Global engine used to submit requests to GuC */
> > +	struct i915_sched_engine *sched_engine;
> > +	struct i915_request *stalled_request;
> > +
> >   	/* intel_guc_recv interrupt related state */
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 23a94a896a0b..ca0717166a27 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -60,6 +60,31 @@
> >   #define GUC_REQUEST_SIZE 64 /* bytes */
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which do
> > + * not require a lock as all state transitions are mutually exclusive. i.e. It
> > + * is not possible for the context pinning code and submission, for the same
> > + * context, to be executing simultaneously. We still need an atomic as it is
> > + * possible for some of the bits to change at the same time though.
> > + */
> > +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> > +static inline bool context_enabled(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_ENABLED);
> > +}
> > +
> > +static inline void set_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> > +}
> > +
> > +static inline void clr_context_enabled(struct intel_context *ce)
> > +{
> > +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> > +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> > -	/* Leaving stub as this function will be used in future patches */
> > -}
> > +	int err;
> > +	struct intel_context *ce = rq->context;
> > +	u32 action[3];
> > +	int len = 0;
> > +	bool enabled = context_enabled(ce);
> > -/*
> > - * When we're doing submissions using regular execlists backend, writing to
> > - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> > - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> > - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> > - * therefore, to ensure the flush, we're issuing a POSTING READ.
> > - */
> > -static void flush_ggtt_writes(struct i915_vma *vma)
> > -{
> > -	if (i915_vma_is_map_and_fenceable(vma))
> > -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> > -					     GUC_STATUS);
> > -}
> > +	if (!enabled) {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> > +		action[len++] = ce->guc_id;
> > +		action[len++] = GUC_CONTEXT_ENABLE;
> > +	} else {
> > +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> > +		action[len++] = ce->guc_id;
> > +	}
> > -static void guc_submit(struct intel_engine_cs *engine,
> > -		       struct i915_request **out,
> > -		       struct i915_request **end)
> > -{
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	err = intel_guc_send_nb(guc, action, len);
> > -	do {
> > -		struct i915_request *rq = *out++;
> > +	if (!enabled && !err)
> > +		set_context_enabled(ce);
> > -		flush_ggtt_writes(rq->ring->vma);
> > -		guc_add_request(guc, rq);
> > -	} while (out != end);
> > +	return err;
> >   }
> >   static inline int rq_prio(const struct i915_request *rq)
> > @@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
> >   	return rq->sched.attr.priority;
> >   }
> > -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> > +static int guc_dequeue_one_context(struct intel_guc *guc)
> >   {
> > -	trace_i915_request_in(rq, idx);
> > -
> > -	/*
> > -	 * Currently we are not tracking the rq->context being inflight
> > -	 * (ce->inflight = rq->engine). It is only used by the execlists
> > -	 * backend at the moment, a similar counting strategy would be
> > -	 * required if we generalise the inflight tracking.
> > -	 */
> > -
> > -	__intel_gt_pm_get(rq->engine->gt);
> > -	return i915_request_get(rq);
> > -}
> > -
> > -static void schedule_out(struct i915_request *rq)
> > -{
> > -	trace_i915_request_out(rq);
> > -
> > -	intel_gt_pm_put_async(rq->engine->gt);
> > -	i915_request_put(rq);
> > -}
> > -
> > -static void __guc_dequeue(struct intel_engine_cs *engine)
> > -{
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> > -	struct i915_request **first = execlists->inflight;
> > -	struct i915_request ** const last_port = first + execlists->port_mask;
> > -	struct i915_request *last = first[0];
> > -	struct i915_request **port;
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	struct i915_request *last = NULL;
> >   	bool submit = false;
> >   	struct rb_node *rb;
> > +	int ret;
> >   	lockdep_assert_held(&sched_engine->lock);
> > -	if (last) {
> > -		if (*++first)
> > -			return;
> > -
> > -		last = NULL;
> > +	if (guc->stalled_request) {
> > +		submit = true;
> > +		last = guc->stalled_request;
> > +		goto resubmit;
> >   	}
> > -	/*
> > -	 * We write directly into the execlists->inflight queue and don't use
> > -	 * the execlists->pending queue, as we don't have a distinct switch
> > -	 * event.
> > -	 */
> > -	port = first;
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >   		struct i915_priolist *p = to_priolist(rb);
> >   		struct i915_request *rq, *rn;
> >   		priolist_for_each_request_consume(rq, rn, p) {
> > -			if (last && rq->context != last->context) {
> > -				if (port == last_port)
> > -					goto done;
> > -
> > -				*port = schedule_in(last,
> > -						    port - execlists->inflight);
> > -				port++;
> > -			}
> > +			if (last && rq->context != last->context)
> > +				goto done;
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			submit = true;
> > +
> > +			trace_i915_request_in(rq, 0);
> >   			last = rq;
> > +			submit = true;
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> >   		i915_priolist_free(p);
> >   	}
> >   done:
> > -	sched_engine->queue_priority_hint =
> > -		rb ? to_priolist(rb)->priority : INT_MIN;
> >   	if (submit) {
> > -		*port = schedule_in(last, port - execlists->inflight);
> > -		*++port = NULL;
> > -		guc_submit(engine, first, port);
> > +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> > +			intel_ring_set_tail(last->ring, last->tail);
> > +resubmit:
> > +		/*
> > +		 * We only check for -EBUSY here even though it is possible for
> > +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > +		 * died and a full GT reset needs to be done. The hangcheck will
> > +		 * eventually detect that the GuC has died and trigger this
> > +		 * reset so no need to handle -EDEADLK here.
> > +		 */
> > +		ret = guc_add_request(guc, last);
> > +		if (ret == -EBUSY) {
> > +			tasklet_schedule(&sched_engine->tasklet);
> > +			guc->stalled_request = last;
> > +			return false;
> > +		}
> >   	}
> > -	execlists->active = execlists->inflight;
> > +
> > +	guc->stalled_request = NULL;
> > +	return submit;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> >   {
> >   	struct i915_sched_engine *sched_engine =
> >   		from_tasklet(sched_engine, t, tasklet);
> > -	struct intel_engine_cs * const engine = sched_engine->private_data;
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request **port, *rq;
> >   	unsigned long flags;
> > +	bool loop;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > -
> > -	for (port = execlists->inflight; (rq = *port); port++) {
> > -		if (!i915_request_completed(rq))
> > -			break;
> > -
> > -		schedule_out(rq);
> > -	}
> > -	if (port != execlists->inflight) {
> > -		int idx = port - execlists->inflight;
> > -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> > -		memmove(execlists->inflight, port, rem * sizeof(*port));
> > -	}
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	__guc_dequeue(engine);
> > +	do {
> > +		loop = guc_dequeue_one_context(sched_engine->private_data);
> > +	} while (loop);
> > -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> > +	i915_sched_engine_reset_on_empty(sched_engine);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> >   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   {
> > -	if (iir & GT_RENDER_USER_INTERRUPT) {
> > +	if (iir & GT_RENDER_USER_INTERRUPT)
> >   		intel_engine_signal_breadcrumbs(engine);
> > -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > -	}
> >   }
> >   static void guc_reset_prepare(struct intel_engine_cs *engine)
> > @@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> > +	/* Can be called during boot if GuC fails to load */
> > +	if (!engine->gt)
> > +		return;
> > +
> >   	ENGINE_TRACE(engine, "\n");
> >   	/*
> > @@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   void intel_guc_submission_fini(struct intel_guc *guc)
> >   {
> > -	if (guc->lrc_desc_pool)
> > -		guc_lrc_desc_pool_destroy(guc);
> > +	if (!guc->lrc_desc_pool)
> > +		return;
> > +
> > +	guc_lrc_desc_pool_destroy(guc);
> > +	i915_sched_engine_put(guc->sched_engine);
> >   }
> >   static int guc_context_alloc(struct intel_context *ce)
> > @@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
> >   	return 0;
> >   }
> > -static inline void queue_request(struct intel_engine_cs *engine,
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> >   				 struct i915_request *rq,
> >   				 int prio)
> >   {
> >   	GEM_BUG_ON(!list_empty(&rq->sched.link));
> >   	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> >   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >   }
> >   static void guc_submit_request(struct i915_request *rq)
> >   {
> > -	struct intel_engine_cs *engine = rq->engine;
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >   	unsigned long flags;
> >   	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	queue_request(engine, rq, rq_prio(rq));
> > +	queue_request(sched_engine, rq, rq_prio(rq));
> > -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> > +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> >   	GEM_BUG_ON(list_empty(&rq->sched.link));
> > -	tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > +	tasklet_hi_schedule(&sched_engine->tasklet);
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > -	tasklet_kill(&engine->sched_engine->tasklet);
> > -
> >   	intel_engine_cleanup_common(engine);
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > @@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
> >   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> >   	/*
> >   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> > @@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   	 */
> >   	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
> > -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> > +	if (!guc->sched_engine) {
> > +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> > +		if (!guc->sched_engine)
> > +			return -ENOMEM;
> > +
> > +		guc->sched_engine->schedule = i915_schedule;
> > +		guc->sched_engine->private_data = guc;
> > +		tasklet_setup(&guc->sched_engine->tasklet,
> > +			      guc_submission_tasklet);
> > +	}
> > +	i915_sched_engine_put(engine->sched_engine);
> > +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
> >   	guc_default_vfuncs(engine);
> >   	guc_default_irqs(engine);
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-19 23:01     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-19 23:01 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Implement GuC submission tasklet for new interface. The new GuC
> interface uses H2G to submit contexts to the GuC. Since H2G use a single
> channel, a single tasklet submits is used for the submission path.
This still needs fixing - 'a single tasklet submits is used' is not 
valid English.

It also seems that the idea of splitting all the deletes of old code 
into a separate patch didn't happen. It really does obfuscate things 
significantly having completely unrelated deletes and adds interspersed :(.

John.


>
> Also the per engine interrupt handler has been updated to disable the
> rescheduling of the physical engine tasklet, when using GuC scheduling,
> as the physical engine tasklet is no longer used.
>
> In this patch the field, guc_id, has been added to intel_context and is
> not assigned. Patches later in the series will assign this value.
>
> v2:
>   (John Harrison)
>    - Clean up some comments
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
>   3 files changed, 127 insertions(+), 117 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 90026c177105..6d99631d19b9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -137,6 +137,15 @@ struct intel_context {
>   	struct intel_sseu sseu;
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> +
> +	/* GuC scheduling state flags that do not require a lock. */
> +	atomic_t guc_sched_state_no_lock;
> +
> +	/*
> +	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> +	 * in the series will.
> +	 */
> +	u16 guc_id;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 35783558d261..8c7b92f699f1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -30,6 +30,10 @@ struct intel_guc {
>   	struct intel_guc_log log;
>   	struct intel_guc_ct ct;
>   
> +	/* Global engine used to submit requests to GuC */
> +	struct i915_sched_engine *sched_engine;
> +	struct i915_request *stalled_request;
> +
>   	/* intel_guc_recv interrupt related state */
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 23a94a896a0b..ca0717166a27 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,31 @@
>   
>   #define GUC_REQUEST_SIZE 64 /* bytes */
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which do
> + * not require a lock as all state transitions are mutually exclusive. i.e. It
> + * is not possible for the context pinning code and submission, for the same
> + * context, to be executing simultaneously. We still need an atomic as it is
> + * possible for some of the bits to change at the same time though.
> + */
> +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> +static inline bool context_enabled(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_ENABLED);
> +}
> +
> +static inline void set_context_enabled(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_enabled(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	/* Leaving stub as this function will be used in future patches */
> -}
> +	int err;
> +	struct intel_context *ce = rq->context;
> +	u32 action[3];
> +	int len = 0;
> +	bool enabled = context_enabled(ce);
>   
> -/*
> - * When we're doing submissions using regular execlists backend, writing to
> - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> - * therefore, to ensure the flush, we're issuing a POSTING READ.
> - */
> -static void flush_ggtt_writes(struct i915_vma *vma)
> -{
> -	if (i915_vma_is_map_and_fenceable(vma))
> -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> -					     GUC_STATUS);
> -}
> +	if (!enabled) {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> +		action[len++] = ce->guc_id;
> +		action[len++] = GUC_CONTEXT_ENABLE;
> +	} else {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> +		action[len++] = ce->guc_id;
> +	}
>   
> -static void guc_submit(struct intel_engine_cs *engine,
> -		       struct i915_request **out,
> -		       struct i915_request **end)
> -{
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> +	err = intel_guc_send_nb(guc, action, len);
>   
> -	do {
> -		struct i915_request *rq = *out++;
> +	if (!enabled && !err)
> +		set_context_enabled(ce);
>   
> -		flush_ggtt_writes(rq->ring->vma);
> -		guc_add_request(guc, rq);
> -	} while (out != end);
> +	return err;
>   }
>   
>   static inline int rq_prio(const struct i915_request *rq)
> @@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> +static int guc_dequeue_one_context(struct intel_guc *guc)
>   {
> -	trace_i915_request_in(rq, idx);
> -
> -	/*
> -	 * Currently we are not tracking the rq->context being inflight
> -	 * (ce->inflight = rq->engine). It is only used by the execlists
> -	 * backend at the moment, a similar counting strategy would be
> -	 * required if we generalise the inflight tracking.
> -	 */
> -
> -	__intel_gt_pm_get(rq->engine->gt);
> -	return i915_request_get(rq);
> -}
> -
> -static void schedule_out(struct i915_request *rq)
> -{
> -	trace_i915_request_out(rq);
> -
> -	intel_gt_pm_put_async(rq->engine->gt);
> -	i915_request_put(rq);
> -}
> -
> -static void __guc_dequeue(struct intel_engine_cs *engine)
> -{
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> -	struct i915_request **first = execlists->inflight;
> -	struct i915_request ** const last_port = first + execlists->port_mask;
> -	struct i915_request *last = first[0];
> -	struct i915_request **port;
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	struct i915_request *last = NULL;
>   	bool submit = false;
>   	struct rb_node *rb;
> +	int ret;
>   
>   	lockdep_assert_held(&sched_engine->lock);
>   
> -	if (last) {
> -		if (*++first)
> -			return;
> -
> -		last = NULL;
> +	if (guc->stalled_request) {
> +		submit = true;
> +		last = guc->stalled_request;
> +		goto resubmit;
>   	}
>   
> -	/*
> -	 * We write directly into the execlists->inflight queue and don't use
> -	 * the execlists->pending queue, as we don't have a distinct switch
> -	 * event.
> -	 */
> -	port = first;
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
> -			if (last && rq->context != last->context) {
> -				if (port == last_port)
> -					goto done;
> -
> -				*port = schedule_in(last,
> -						    port - execlists->inflight);
> -				port++;
> -			}
> +			if (last && rq->context != last->context)
> +				goto done;
>   
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			submit = true;
> +
> +			trace_i915_request_in(rq, 0);
>   			last = rq;
> +			submit = true;
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
>   		i915_priolist_free(p);
>   	}
>   done:
> -	sched_engine->queue_priority_hint =
> -		rb ? to_priolist(rb)->priority : INT_MIN;
>   	if (submit) {
> -		*port = schedule_in(last, port - execlists->inflight);
> -		*++port = NULL;
> -		guc_submit(engine, first, port);
> +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> +			intel_ring_set_tail(last->ring, last->tail);
> +resubmit:
> +		/*
> +		 * We only check for -EBUSY here even though it is possible for
> +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> +		 * died and a full GT reset needs to be done. The hangcheck will
> +		 * eventually detect that the GuC has died and trigger this
> +		 * reset so no need to handle -EDEADLK here.
> +		 */
> +		ret = guc_add_request(guc, last);
> +		if (ret == -EBUSY) {
> +			tasklet_schedule(&sched_engine->tasklet);
> +			guc->stalled_request = last;
> +			return false;
> +		}
>   	}
> -	execlists->active = execlists->inflight;
> +
> +	guc->stalled_request = NULL;
> +	return submit;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		from_tasklet(sched_engine, t, tasklet);
> -	struct intel_engine_cs * const engine = sched_engine->private_data;
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request **port, *rq;
>   	unsigned long flags;
> +	bool loop;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -
> -	for (port = execlists->inflight; (rq = *port); port++) {
> -		if (!i915_request_completed(rq))
> -			break;
> -
> -		schedule_out(rq);
> -	}
> -	if (port != execlists->inflight) {
> -		int idx = port - execlists->inflight;
> -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> -		memmove(execlists->inflight, port, rem * sizeof(*port));
> -	}
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	__guc_dequeue(engine);
> +	do {
> +		loop = guc_dequeue_one_context(sched_engine->private_data);
> +	} while (loop);
>   
> -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> +	i915_sched_engine_reset_on_empty(sched_engine);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   {
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> +	if (iir & GT_RENDER_USER_INTERRUPT)
>   		intel_engine_signal_breadcrumbs(engine);
> -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> -	}
>   }
>   
>   static void guc_reset_prepare(struct intel_engine_cs *engine)
> @@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
> +	/* Can be called during boot if GuC fails to load */
> +	if (!engine->gt)
> +		return;
> +
>   	ENGINE_TRACE(engine, "\n");
>   
>   	/*
> @@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   void intel_guc_submission_fini(struct intel_guc *guc)
>   {
> -	if (guc->lrc_desc_pool)
> -		guc_lrc_desc_pool_destroy(guc);
> +	if (!guc->lrc_desc_pool)
> +		return;
> +
> +	guc_lrc_desc_pool_destroy(guc);
> +	i915_sched_engine_put(guc->sched_engine);
>   }
>   
>   static int guc_context_alloc(struct intel_context *ce)
> @@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
>   	return 0;
>   }
>   
> -static inline void queue_request(struct intel_engine_cs *engine,
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>   				 struct i915_request *rq,
>   				 int prio)
>   {
>   	GEM_BUG_ON(!list_empty(&rq->sched.link));
>   	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> +		      i915_sched_lookup_priolist(sched_engine, prio));
>   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>   }
>   
>   static void guc_submit_request(struct i915_request *rq)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>   	unsigned long flags;
>   
>   	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	queue_request(engine, rq, rq_prio(rq));
> +	queue_request(sched_engine, rq, rq_prio(rq));
>   
> -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   	GEM_BUG_ON(list_empty(&rq->sched.link));
>   
> -	tasklet_hi_schedule(&engine->sched_engine->tasklet);
> +	tasklet_hi_schedule(&sched_engine->tasklet);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>   
> -	tasklet_kill(&engine->sched_engine->tasklet);
> -
>   	intel_engine_cleanup_common(engine);
>   	lrc_fini_wa_ctx(engine);
>   }
> @@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
>   
>   	/*
>   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> @@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   	 */
>   	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
>   
> -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> +	if (!guc->sched_engine) {
> +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> +		if (!guc->sched_engine)
> +			return -ENOMEM;
> +
> +		guc->sched_engine->schedule = i915_schedule;
> +		guc->sched_engine->private_data = guc;
> +		tasklet_setup(&guc->sched_engine->tasklet,
> +			      guc_submission_tasklet);
> +	}
> +	i915_sched_engine_put(engine->sched_engine);
> +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
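
For orientation, the backpressure contract the hunk above implements can be
reduced to a stand-alone sketch. This is illustrative only: guc_add_request()
and guc->stalled_request are the names from the patch, but the helper below is
hypothetical and omits locking, tracing and the ring-tail update.

/*
 * Illustrative sketch (not driver code): one H2G submission attempt,
 * parking the request on -EBUSY so the rescheduled tasklet retries it
 * from the "resubmit:" label above. -EDEADLK (dead GuC) is deliberately
 * not handled here; hangcheck notices the dead GuC and triggers a full
 * GT reset.
 */
static bool try_submit(struct intel_guc *guc, struct i915_request *rq)
{
	int ret = guc_add_request(guc, rq);

	if (ret == -EBUSY) {			/* H2G channel full: back off */
		guc->stalled_request = rq;	/* resubmit this one first */
		tasklet_schedule(&guc->sched_engine->tasklet);
		return false;			/* stop dequeuing for now */
	}

	guc->stalled_request = NULL;
	return true;				/* keep draining the queue */
}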


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
@ 2021-07-19 23:01     ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-19 23:01 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/16/2021 13:16, Matthew Brost wrote:
> Implement GuC submission tasklet for new interface. The new GuC
> interface uses H2G to submit contexts to the GuC. Since H2G use a single
> channel, a single tasklet submits is used for the submission path.
This still needs fixing - 'a single tasklet submits is used' is not 
valid English.

It also seems that the idea of splitting all the deletes of old code 
into a separate patch didn't happen. It really does obfuscate things 
significantly having completely unrelated deletes and adds interspersed :(.

John.


>
> Also the per engine interrupt handler has been updated to disable the
> rescheduling of the physical engine tasklet, when using GuC scheduling,
> as the physical engine tasklet is no longer used.
>
> In this patch the field, guc_id, has been added to intel_context and is
> not assigned. Patches later in the series will assign this value.
>
> v2:
>   (John Harrison)
>    - Clean up some comments
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 231 +++++++++---------
>   3 files changed, 127 insertions(+), 117 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 90026c177105..6d99631d19b9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -137,6 +137,15 @@ struct intel_context {
>   	struct intel_sseu sseu;
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> +
> +	/* GuC scheduling state flags that do not require a lock. */
> +	atomic_t guc_sched_state_no_lock;
> +
> +	/*
> +	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> +	 * in the series will.
> +	 */
> +	u16 guc_id;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 35783558d261..8c7b92f699f1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -30,6 +30,10 @@ struct intel_guc {
>   	struct intel_guc_log log;
>   	struct intel_guc_ct ct;
>   
> +	/* Global engine used to submit requests to GuC */
> +	struct i915_sched_engine *sched_engine;
> +	struct i915_request *stalled_request;
> +
>   	/* intel_guc_recv interrupt related state */
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 23a94a896a0b..ca0717166a27 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,31 @@
>   
>   #define GUC_REQUEST_SIZE 64 /* bytes */
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which do
> + * not require a lock as all state transitions are mutually exclusive. i.e. It
> + * is not possible for the context pinning code and submission, for the same
> + * context, to be executing simultaneously. We still need an atomic as it is
> + * possible for some of the bits to changing at the same time though.
> + */
> +#define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> +static inline bool context_enabled(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_ENABLED);
> +}
> +
> +static inline void set_context_enabled(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_ENABLED, &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_enabled(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_ENABLED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -122,37 +147,29 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> -static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> +static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	/* Leaving stub as this function will be used in future patches */
> -}
> +	int err;
> +	struct intel_context *ce = rq->context;
> +	u32 action[3];
> +	int len = 0;
> +	bool enabled = context_enabled(ce);
>   
> -/*
> - * When we're doing submissions using regular execlists backend, writing to
> - * ELSP from CPU side is enough to make sure that writes to ringbuffer pages
> - * pinned in mappable aperture portion of GGTT are visible to command streamer.
> - * Writes done by GuC on our behalf are not guaranteeing such ordering,
> - * therefore, to ensure the flush, we're issuing a POSTING READ.
> - */
> -static void flush_ggtt_writes(struct i915_vma *vma)
> -{
> -	if (i915_vma_is_map_and_fenceable(vma))
> -		intel_uncore_posting_read_fw(vma->vm->gt->uncore,
> -					     GUC_STATUS);
> -}
> +	if (!enabled) {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> +		action[len++] = ce->guc_id;
> +		action[len++] = GUC_CONTEXT_ENABLE;
> +	} else {
> +		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT;
> +		action[len++] = ce->guc_id;
> +	}
>   
> -static void guc_submit(struct intel_engine_cs *engine,
> -		       struct i915_request **out,
> -		       struct i915_request **end)
> -{
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> +	err = intel_guc_send_nb(guc, action, len);
>   
> -	do {
> -		struct i915_request *rq = *out++;
> +	if (!enabled && !err)
> +		set_context_enabled(ce);
>   
> -		flush_ggtt_writes(rq->ring->vma);
> -		guc_add_request(guc, rq);
> -	} while (out != end);
> +	return err;
>   }
>   
>   static inline int rq_prio(const struct i915_request *rq)
> @@ -160,125 +177,88 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> -static struct i915_request *schedule_in(struct i915_request *rq, int idx)
> +static int guc_dequeue_one_context(struct intel_guc *guc)
>   {
> -	trace_i915_request_in(rq, idx);
> -
> -	/*
> -	 * Currently we are not tracking the rq->context being inflight
> -	 * (ce->inflight = rq->engine). It is only used by the execlists
> -	 * backend at the moment, a similar counting strategy would be
> -	 * required if we generalise the inflight tracking.
> -	 */
> -
> -	__intel_gt_pm_get(rq->engine->gt);
> -	return i915_request_get(rq);
> -}
> -
> -static void schedule_out(struct i915_request *rq)
> -{
> -	trace_i915_request_out(rq);
> -
> -	intel_gt_pm_put_async(rq->engine->gt);
> -	i915_request_put(rq);
> -}
> -
> -static void __guc_dequeue(struct intel_engine_cs *engine)
> -{
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> -	struct i915_request **first = execlists->inflight;
> -	struct i915_request ** const last_port = first + execlists->port_mask;
> -	struct i915_request *last = first[0];
> -	struct i915_request **port;
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	struct i915_request *last = NULL;
>   	bool submit = false;
>   	struct rb_node *rb;
> +	int ret;
>   
>   	lockdep_assert_held(&sched_engine->lock);
>   
> -	if (last) {
> -		if (*++first)
> -			return;
> -
> -		last = NULL;
> +	if (guc->stalled_request) {
> +		submit = true;
> +		last = guc->stalled_request;
> +		goto resubmit;
>   	}
>   
> -	/*
> -	 * We write directly into the execlists->inflight queue and don't use
> -	 * the execlists->pending queue, as we don't have a distinct switch
> -	 * event.
> -	 */
> -	port = first;
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
> -			if (last && rq->context != last->context) {
> -				if (port == last_port)
> -					goto done;
> -
> -				*port = schedule_in(last,
> -						    port - execlists->inflight);
> -				port++;
> -			}
> +			if (last && rq->context != last->context)
> +				goto done;
>   
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			submit = true;
> +
> +			trace_i915_request_in(rq, 0);
>   			last = rq;
> +			submit = true;
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
>   		i915_priolist_free(p);
>   	}
>   done:
> -	sched_engine->queue_priority_hint =
> -		rb ? to_priolist(rb)->priority : INT_MIN;
>   	if (submit) {
> -		*port = schedule_in(last, port - execlists->inflight);
> -		*++port = NULL;
> -		guc_submit(engine, first, port);
> +		last->context->lrc_reg_state[CTX_RING_TAIL] =
> +			intel_ring_set_tail(last->ring, last->tail);
> +resubmit:
> +		/*
> +		 * We only check for -EBUSY here even though it is possible for
> +		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> +		 * died and a full GT reset needs to be done. The hangcheck will
> +		 * eventually detect that the GuC has died and trigger this
> +		 * reset so no need to handle -EDEADLK here.
> +		 */
> +		ret = guc_add_request(guc, last);
> +		if (ret == -EBUSY) {
> +			tasklet_schedule(&sched_engine->tasklet);
> +			guc->stalled_request = last;
> +			return false;
> +		}
>   	}
> -	execlists->active = execlists->inflight;
> +
> +	guc->stalled_request = NULL;
> +	return submit;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		from_tasklet(sched_engine, t, tasklet);
> -	struct intel_engine_cs * const engine = sched_engine->private_data;
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request **port, *rq;
>   	unsigned long flags;
> +	bool loop;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> -
> -	for (port = execlists->inflight; (rq = *port); port++) {
> -		if (!i915_request_completed(rq))
> -			break;
> -
> -		schedule_out(rq);
> -	}
> -	if (port != execlists->inflight) {
> -		int idx = port - execlists->inflight;
> -		int rem = ARRAY_SIZE(execlists->inflight) - idx;
> -		memmove(execlists->inflight, port, rem * sizeof(*port));
> -	}
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	__guc_dequeue(engine);
> +	do {
> +		loop = guc_dequeue_one_context(sched_engine->private_data);
> +	} while (loop);
>   
> -	i915_sched_engine_reset_on_empty(engine->sched_engine);
> +	i915_sched_engine_reset_on_empty(sched_engine);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   {
> -	if (iir & GT_RENDER_USER_INTERRUPT) {
> +	if (iir & GT_RENDER_USER_INTERRUPT)
>   		intel_engine_signal_breadcrumbs(engine);
> -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> -	}
>   }
>   
>   static void guc_reset_prepare(struct intel_engine_cs *engine)
> @@ -349,6 +329,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
> +	/* Can be called during boot if GuC fails to load */
> +	if (!engine->gt)
> +		return;
> +
>   	ENGINE_TRACE(engine, "\n");
>   
>   	/*
> @@ -433,8 +417,11 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   void intel_guc_submission_fini(struct intel_guc *guc)
>   {
> -	if (guc->lrc_desc_pool)
> -		guc_lrc_desc_pool_destroy(guc);
> +	if (!guc->lrc_desc_pool)
> +		return;
> +
> +	guc_lrc_desc_pool_destroy(guc);
> +	i915_sched_engine_put(guc->sched_engine);
>   }
>   
>   static int guc_context_alloc(struct intel_context *ce)
> @@ -499,32 +486,32 @@ static int guc_request_alloc(struct i915_request *request)
>   	return 0;
>   }
>   
> -static inline void queue_request(struct intel_engine_cs *engine,
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>   				 struct i915_request *rq,
>   				 int prio)
>   {
>   	GEM_BUG_ON(!list_empty(&rq->sched.link));
>   	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(engine->sched_engine, prio));
> +		      i915_sched_lookup_priolist(sched_engine, prio));
>   	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>   }
>   
>   static void guc_submit_request(struct i915_request *rq)
>   {
> -	struct intel_engine_cs *engine = rq->engine;
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>   	unsigned long flags;
>   
>   	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	queue_request(engine, rq, rq_prio(rq));
> +	queue_request(sched_engine, rq, rq_prio(rq));
>   
> -	GEM_BUG_ON(i915_sched_engine_is_empty(engine->sched_engine));
> +	GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   	GEM_BUG_ON(list_empty(&rq->sched.link));
>   
> -	tasklet_hi_schedule(&engine->sched_engine->tasklet);
> +	tasklet_hi_schedule(&sched_engine->tasklet);
>   
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -602,8 +589,6 @@ static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>   
> -	tasklet_kill(&engine->sched_engine->tasklet);
> -
>   	intel_engine_cleanup_common(engine);
>   	lrc_fini_wa_ctx(engine);
>   }
> @@ -674,6 +659,7 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
>   
>   	/*
>   	 * The setup relies on several assumptions (e.g. irqs always enabled)
> @@ -681,7 +667,18 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   	 */
>   	GEM_BUG_ON(GRAPHICS_VER(i915) < 11);
>   
> -	tasklet_setup(&engine->sched_engine->tasklet, guc_submission_tasklet);
> +	if (!guc->sched_engine) {
> +		guc->sched_engine = i915_sched_engine_create(ENGINE_VIRTUAL);
> +		if (!guc->sched_engine)
> +			return -ENOMEM;
> +
> +		guc->sched_engine->schedule = i915_schedule;
> +		guc->sched_engine->private_data = guc;
> +		tasklet_setup(&guc->sched_engine->tasklet,
> +			      guc_submission_tasklet);
> +	}
> +	i915_sched_engine_put(engine->sched_engine);
> +	engine->sched_engine = i915_sched_engine_get(guc->sched_engine);
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
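
Separately from the commit message wording, the lock-free state bits at the
top of the patch follow a simple pattern that is worth spelling out. A minimal
sketch of the intended usage is below; send_enable_action() is a hypothetical
stand-in for the H2G SCHED_CONTEXT_MODE_SET action, while the other helpers
are the ones the patch adds.

/*
 * Illustrative only: pinning and submission never run concurrently for
 * the same context, so no lock is taken, but individual bits may still
 * flip concurrently, hence the atomic helpers.
 */
static int example_first_submit(struct intel_guc *guc,
				struct intel_context *ce)
{
	int err;

	if (!context_enabled(ce)) {
		/* hypothetical wrapper around the enable H2G action */
		err = send_enable_action(guc, ce->guc_id);
		if (err)
			return err;
		set_context_enabled(ce);	/* atomic_or() underneath */
	}

	return 0;
}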

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 19/51] drm/i915/guc: GuC virtual engines
  2021-07-19 23:33     ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-19 23:27       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-19 23:27 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Mon, Jul 19, 2021 at 04:33:56PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:16 PM, Matthew Brost wrote:
> > Implement GuC virtual engines. Rather simple implementation, basically
> > just allocate an engine, setup context enter / exit function to virtual
> > engine specific functions, set all other variables / functions to guc
> > versions, and set the engine mask to that of all the siblings.
> > 
> > v2: Update to work with proto-ctx
> > 
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
> >   drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
> >   drivers/gpu/drm/i915/gt/intel_engine.h        |  27 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
> >   .../drm/i915/gt/intel_execlists_submission.c  |  29 ++-
> >   .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> >   drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 240 +++++++++++++++++-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
> >   10 files changed, 308 insertions(+), 35 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 64659802d4df..edefe299bd76 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -74,7 +74,6 @@
> >   #include "gt/intel_context_param.h"
> >   #include "gt/intel_engine_heartbeat.h"
> >   #include "gt/intel_engine_user.h"
> > -#include "gt/intel_execlists_submission.h" /* virtual_engine */
> >   #include "gt/intel_gpu_commands.h"
> >   #include "gt/intel_ring.h"
> > @@ -363,9 +362,6 @@ set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
> >   	if (!HAS_EXECLISTS(i915))
> >   		return -ENODEV;
> > -	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> > -		return -ENODEV; /* not implement yet */
> > -
> >   	if (get_user(idx, &ext->engine_index))
> >   		return -EFAULT;
> > @@ -950,8 +946,8 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
> >   			break;
> >   		case I915_GEM_ENGINE_TYPE_BALANCED:
> > -			ce = intel_execlists_create_virtual(pe[n].siblings,
> > -							    pe[n].num_siblings);
> > +			ce = intel_engine_create_virtual(pe[n].siblings,
> > +							 pe[n].num_siblings);
> >   			break;
> >   		case I915_GEM_ENGINE_TYPE_INVALID:
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > index 20411db84914..2639c719a7a6 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > @@ -10,6 +10,7 @@
> >   #include "i915_gem_context_types.h"
> >   #include "gt/intel_context.h"
> > +#include "gt/intel_engine.h"
> 
> Apologies for the late question, but why do you need this include in this
> header? nothing else is changing within the file.
> 

It likely doesn't need to be included. Let me see what the build / CI
says if I remove it. 

> >   #include "i915_drv.h"
> >   #include "i915_gem.h"
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 4a5518d295c2..542c98418771 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -47,6 +47,12 @@ struct intel_context_ops {
> >   	void (*reset)(struct intel_context *ce);
> >   	void (*destroy)(struct kref *kref);
> > +
> > +	/* virtual engine/context interface */
> > +	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
> > +						unsigned int count);
> > +	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
> > +					       unsigned int sibling);
> >   };
> >   struct intel_context {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index f911c1224ab2..9fec0aca5f4b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -273,13 +273,38 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
> >   	return intel_engine_has_preemption(engine);
> >   }
> > +struct intel_context *
> > +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> > +			    unsigned int count);
> > +
> > +static inline bool
> > +intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
> > +{
> > +	if (intel_engine_uses_guc(engine))
> > +		return intel_guc_virtual_engine_has_heartbeat(engine);
> > +	else
> > +		GEM_BUG_ON("Only should be called in GuC submission");
> 
> I insist that this needs a better explanation, as I've commented on the
> previous rev. Given that this is a shared file, it is not immediately
> evident why this call shouldn't be called with the execlists backend.
> 

Sure, can add a comment. Basically it comes down to the fact that it is
always called on an active request, whose engine in execlists mode is
always a physical engine.

Matt 
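
A sketch of the kind of call site being described may make this concrete. The
helper below is hypothetical, but it shows why only the GuC backend can reach
the virtual branch: execlists resolves rq->engine to a physical sibling at
submission time, so an active request never reports a virtual engine there.

/* Hypothetical call site (illustrative, not from the patch). */
static bool rq_engine_has_heartbeat(const struct i915_request *rq)
{
	const struct intel_engine_cs *engine = rq->engine;

	if (intel_engine_is_virtual(engine))	/* reachable in GuC mode only */
		return intel_virtual_engine_has_heartbeat(engine);

	return READ_ONCE(engine->props.heartbeat_interval_ms);
}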

> Daniele
> 
> > +
> > +	return false;
> > +}
> > +
> >   static inline bool
> >   intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
> >   {
> >   	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
> >   		return false;
> > -	return READ_ONCE(engine->props.heartbeat_interval_ms);
> > +	if (intel_engine_is_virtual(engine))
> > +		return intel_virtual_engine_has_heartbeat(engine);
> > +	else
> > +		return READ_ONCE(engine->props.heartbeat_interval_ms);
> > +}
> > +
> > +static inline struct intel_engine_cs *
> > +intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> > +{
> > +	GEM_BUG_ON(!intel_engine_is_virtual(engine));
> > +	return engine->cops->get_sibling(engine, sibling);
> >   }
> >   #endif /* _INTEL_RINGBUFFER_H_ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index d561573ed98c..b7292d5cb7da 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1737,6 +1737,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
> >   	return total;
> >   }
> > +struct intel_context *
> > +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> > +			    unsigned int count)
> > +{
> > +	if (count == 0)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	if (count == 1)
> > +		return intel_context_create(siblings[0]);
> > +
> > +	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
> > +	return siblings[0]->cops->create_virtual(siblings, count);
> > +}
> > +
> >   static bool match_ring(struct i915_request *rq)
> >   {
> >   	u32 ring = ENGINE_READ(rq->engine, RING_START);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 56e25090da67..28492cdce706 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -193,6 +193,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> >   	return container_of(engine, struct virtual_engine, base);
> >   }
> > +static struct intel_context *
> > +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> > +
> >   static struct i915_request *
> >   __active_request(const struct intel_timeline * const tl,
> >   		 struct i915_request *rq,
> > @@ -2548,6 +2551,8 @@ static const struct intel_context_ops execlists_context_ops = {
> >   	.reset = lrc_reset,
> >   	.destroy = lrc_destroy,
> > +
> > +	.create_virtual = execlists_create_virtual,
> >   };
> >   static int emit_pdps(struct i915_request *rq)
> > @@ -3493,6 +3498,17 @@ static void virtual_context_exit(struct intel_context *ce)
> >   		intel_engine_pm_put(ve->siblings[n]);
> >   }
> > +static struct intel_engine_cs *
> > +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> > +{
> > +	struct virtual_engine *ve = to_virtual_engine(engine);
> > +
> > +	if (sibling >= ve->num_siblings)
> > +		return NULL;
> > +
> > +	return ve->siblings[sibling];
> > +}
> > +
> >   static const struct intel_context_ops virtual_context_ops = {
> >   	.flags = COPS_HAS_INFLIGHT,
> > @@ -3507,6 +3523,8 @@ static const struct intel_context_ops virtual_context_ops = {
> >   	.exit = virtual_context_exit,
> >   	.destroy = virtual_context_destroy,
> > +
> > +	.get_sibling = virtual_get_sibling,
> >   };
> >   static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
> > @@ -3655,20 +3673,13 @@ static void virtual_submit_request(struct i915_request *rq)
> >   	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
> >   }
> > -struct intel_context *
> > -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > -			       unsigned int count)
> > +static struct intel_context *
> > +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   {
> >   	struct virtual_engine *ve;
> >   	unsigned int n;
> >   	int err;
> > -	if (count == 0)
> > -		return ERR_PTR(-EINVAL);
> > -
> > -	if (count == 1)
> > -		return intel_context_create(siblings[0]);
> > -
> >   	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> >   	if (!ve)
> >   		return ERR_PTR(-ENOMEM);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > index ad4f3e1a0fde..a1aa92c983a5 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > @@ -32,10 +32,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
> >   							int indent),
> >   				   unsigned int max);
> > -struct intel_context *
> > -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > -			       unsigned int count);
> > -
> >   bool
> >   intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > index 73ddc6e14730..59cf8afc6d6f 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
> >   	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
> >   	for (n = 0; n < nctx; n++) {
> > -		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
> > +		ve[n] = intel_engine_create_virtual(siblings, nsibling);
> >   		if (IS_ERR(ve[n])) {
> >   			err = PTR_ERR(ve[n]);
> >   			nctx = n;
> > @@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
> >   	 * restrict it to our desired engine within the virtual engine.
> >   	 */
> > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > +	ve = intel_engine_create_virtual(siblings, nsibling);
> >   	if (IS_ERR(ve)) {
> >   		err = PTR_ERR(ve);
> >   		goto out_close;
> > @@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
> >   		i915_request_add(rq);
> >   	}
> > -	ce = intel_execlists_create_virtual(siblings, nsibling);
> > +	ce = intel_engine_create_virtual(siblings, nsibling);
> >   	if (IS_ERR(ce)) {
> >   		err = PTR_ERR(ce);
> >   		goto out;
> > @@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
> >   	/* XXX We do not handle oversubscription and fairness with normal rq */
> >   	for (n = 0; n < nsibling; n++) {
> > -		ce = intel_execlists_create_virtual(siblings, nsibling);
> > +		ce = intel_engine_create_virtual(siblings, nsibling);
> >   		if (IS_ERR(ce)) {
> >   			err = PTR_ERR(ce);
> >   			goto out;
> > @@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
> >   	if (err)
> >   		goto out_scratch;
> > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > +	ve = intel_engine_create_virtual(siblings, nsibling);
> >   	if (IS_ERR(ve)) {
> >   		err = PTR_ERR(ve);
> >   		goto out_scratch;
> > @@ -4348,7 +4348,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
> >   	if (igt_spinner_init(&spin, gt))
> >   		return -ENOMEM;
> > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > +	ve = intel_engine_create_virtual(siblings, nsibling);
> >   	if (IS_ERR(ve)) {
> >   		err = PTR_ERR(ve);
> >   		goto out_spin;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 05958260e849..7b3e1c91e689 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -60,6 +60,15 @@
> >    *
> >    */
> > +/* GuC Virtual Engine */
> > +struct guc_virtual_engine {
> > +	struct intel_engine_cs base;
> > +	struct intel_context context;
> > +};
> > +
> > +static struct intel_context *
> > +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> > +
> >   #define GUC_REQUEST_SIZE 64 /* bytes */
> >   /*
> > @@ -925,20 +934,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> >   	return ret;
> >   }
> > -static int guc_context_pre_pin(struct intel_context *ce,
> > -			       struct i915_gem_ww_ctx *ww,
> > -			       void **vaddr)
> > +static int __guc_context_pre_pin(struct intel_context *ce,
> > +				 struct intel_engine_cs *engine,
> > +				 struct i915_gem_ww_ctx *ww,
> > +				 void **vaddr)
> >   {
> > -	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
> > +	return lrc_pre_pin(ce, engine, ww, vaddr);
> >   }
> > -static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > +static int __guc_context_pin(struct intel_context *ce,
> > +			     struct intel_engine_cs *engine,
> > +			     void *vaddr)
> >   {
> >   	if (i915_ggtt_offset(ce->state) !=
> >   	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> >   		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > -	return lrc_pin(ce, ce->engine, vaddr);
> > +	return lrc_pin(ce, engine, vaddr);
> > +}
> > +
> > +static int guc_context_pre_pin(struct intel_context *ce,
> > +			       struct i915_gem_ww_ctx *ww,
> > +			       void **vaddr)
> > +{
> > +	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
> > +}
> > +
> > +static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > +{
> > +	return __guc_context_pin(ce, ce->engine, vaddr);
> >   }
> >   static void guc_context_unpin(struct intel_context *ce)
> > @@ -1043,6 +1067,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   	deregister_context(ce, ce->guc_id);
> >   }
> > +static void __guc_context_destroy(struct intel_context *ce)
> > +{
> > +	lrc_fini(ce);
> > +	intel_context_fini(ce);
> > +
> > +	if (intel_engine_is_virtual(ce->engine)) {
> > +		struct guc_virtual_engine *ve =
> > +			container_of(ce, typeof(*ve), context);
> > +
> > +		kfree(ve);
> > +	} else {
> > +		intel_context_free(ce);
> > +	}
> > +}
> > +
> >   static void guc_context_destroy(struct kref *kref)
> >   {
> >   	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > @@ -1059,7 +1098,7 @@ static void guc_context_destroy(struct kref *kref)
> >   	if (context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		release_guc_id(guc, ce);
> > -		lrc_destroy(kref);
> > +		__guc_context_destroy(ce);
> >   		return;
> >   	}
> > @@ -1075,7 +1114,7 @@ static void guc_context_destroy(struct kref *kref)
> >   	if (context_guc_id_invalid(ce)) {
> >   		__release_guc_id(guc, ce);
> >   		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > -		lrc_destroy(kref);
> > +		__guc_context_destroy(ce);
> >   		return;
> >   	}
> > @@ -1120,6 +1159,8 @@ static const struct intel_context_ops guc_context_ops = {
> >   	.reset = lrc_reset,
> >   	.destroy = guc_context_destroy,
> > +
> > +	.create_virtual = guc_create_virtual,
> >   };
> >   static void __guc_signal_context_fence(struct intel_context *ce)
> > @@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	return 0;
> >   }
> > +static struct intel_engine_cs *
> > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +	unsigned int num_siblings = 0;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (num_siblings++ == sibling)
> > +			return engine;
> > +
> > +	return NULL;
> > +}
> > +
> > +static int guc_virtual_context_pre_pin(struct intel_context *ce,
> > +				       struct i915_gem_ww_ctx *ww,
> > +				       void **vaddr)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return __guc_context_pre_pin(ce, engine, ww, vaddr);
> > +}
> > +
> > +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return __guc_context_pin(ce, engine, vaddr);
> > +}
> > +
> > +static void guc_virtual_context_enter(struct intel_context *ce)
> > +{
> > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +	struct intel_engine_cs *engine;
> > +
> > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +		intel_engine_pm_get(engine);
> > +
> > +	intel_timeline_enter(ce->timeline);
> > +}
> > +
> > +static void guc_virtual_context_exit(struct intel_context *ce)
> > +{
> > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +	struct intel_engine_cs *engine;
> > +
> > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +		intel_engine_pm_put(engine);
> > +
> > +	intel_timeline_exit(ce->timeline);
> > +}
> > +
> > +static int guc_virtual_context_alloc(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return lrc_alloc(ce, engine);
> > +}
> > +
> > +static const struct intel_context_ops virtual_guc_context_ops = {
> > +	.alloc = guc_virtual_context_alloc,
> > +
> > +	.pre_pin = guc_virtual_context_pre_pin,
> > +	.pin = guc_virtual_context_pin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> > +
> > +	.enter = guc_virtual_context_enter,
> > +	.exit = guc_virtual_context_exit,
> > +
> > +	.sched_disable = guc_context_sched_disable,
> > +
> > +	.destroy = guc_context_destroy,
> > +
> > +	.get_sibling = guc_virtual_get_sibling,
> > +};
> > +
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> >   {
> >   	struct intel_timeline *tl;
> > @@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   	} else if (context_destroyed(ce)) {
> >   		/* Context has been destroyed */
> >   		release_guc_id(guc, ce);
> > -		lrc_destroy(&ce->ref);
> > +		__guc_context_destroy(ce);
> >   	}
> >   	decr_outstanding_submission_g2h(guc);
> > @@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   			   atomic_read(&ce->guc_sched_state_no_lock));
> >   	}
> >   }
> > +
> > +static struct intel_context *
> > +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > +{
> > +	struct guc_virtual_engine *ve;
> > +	struct intel_guc *guc;
> > +	unsigned int n;
> > +	int err;
> > +
> > +	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
> > +	if (!ve)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	guc = &siblings[0]->gt->uc.guc;
> > +
> > +	ve->base.i915 = siblings[0]->i915;
> > +	ve->base.gt = siblings[0]->gt;
> > +	ve->base.uncore = siblings[0]->uncore;
> > +	ve->base.id = -1;
> > +
> > +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> > +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > +	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > +	ve->base.saturated = ALL_ENGINES;
> > +	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> > +	if (!ve->base.breadcrumbs) {
> > +		kfree(ve);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> > +
> > +	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
> > +
> > +	ve->base.cops = &virtual_guc_context_ops;
> > +	ve->base.request_alloc = guc_request_alloc;
> > +
> > +	ve->base.submit_request = guc_submit_request;
> > +
> > +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> > +
> > +	intel_context_init(&ve->context, &ve->base);
> > +
> > +	for (n = 0; n < count; n++) {
> > +		struct intel_engine_cs *sibling = siblings[n];
> > +
> > +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > +		if (sibling->mask & ve->base.mask) {
> > +			DRM_DEBUG("duplicate %s entry in load balancer\n",
> > +				  sibling->name);
> > +			err = -EINVAL;
> > +			goto err_put;
> > +		}
> > +
> > +		ve->base.mask |= sibling->mask;
> > +
> > +		if (n != 0 && ve->base.class != sibling->class) {
> > +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
> > +				  sibling->class, ve->base.class);
> > +			err = -EINVAL;
> > +			goto err_put;
> > +		} else if (n == 0) {
> > +			ve->base.class = sibling->class;
> > +			ve->base.uabi_class = sibling->uabi_class;
> > +			snprintf(ve->base.name, sizeof(ve->base.name),
> > +				 "v%dx%d", ve->base.class, count);
> > +			ve->base.context_size = sibling->context_size;
> > +
> > +			ve->base.emit_bb_start = sibling->emit_bb_start;
> > +			ve->base.emit_flush = sibling->emit_flush;
> > +			ve->base.emit_init_breadcrumb =
> > +				sibling->emit_init_breadcrumb;
> > +			ve->base.emit_fini_breadcrumb =
> > +				sibling->emit_fini_breadcrumb;
> > +			ve->base.emit_fini_breadcrumb_dw =
> > +				sibling->emit_fini_breadcrumb_dw;
> > +
> > +			ve->base.flags |= sibling->flags;
> > +
> > +			ve->base.props.timeslice_duration_ms =
> > +				sibling->props.timeslice_duration_ms;
> > +		}
> > +	}
> > +
> > +	return &ve->context;
> > +
> > +err_put:
> > +	intel_context_put(&ve->context);
> > +	return ERR_PTR(err);
> > +}
> > +
> > +
> > +
> > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
> > +			return true;
> > +
> > +	return false;
> > +}
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > index 2b9470c90558..5f263ac4f46a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> >   void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   					     struct drm_printer *p);
> > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
> > +
> >   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
> >   {
> >   	/* XXX: GuC submission is unavailable for now */
> 
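
For completeness, a sketch of how the new entry point is meant to be consumed;
this is illustrative, with error handling and engine lookup simplified.

/* Illustrative caller: balance one context across two video engines. */
static struct intel_context *
make_balanced_context(struct intel_engine_cs *vcs0,
		      struct intel_engine_cs *vcs1)
{
	struct intel_engine_cs *siblings[] = { vcs0, vcs1 };

	/*
	 * count == 0 returns -EINVAL, count == 1 collapses to
	 * intel_context_create(siblings[0]), and count > 1 dispatches to
	 * the backend's cops->create_virtual() -- guc_create_virtual()
	 * when GuC submission is active.
	 */
	return intel_engine_create_virtual(siblings, ARRAY_SIZE(siblings));
}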

^ permalink raw reply	[flat|nested] 221+ messages in thread

> > @@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	return 0;
> >   }
> > +static struct intel_engine_cs *
> > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +	unsigned int num_siblings = 0;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (num_siblings++ == sibling)
> > +			return engine;
> > +
> > +	return NULL;
> > +}
> > +
> > +static int guc_virtual_context_pre_pin(struct intel_context *ce,
> > +				       struct i915_gem_ww_ctx *ww,
> > +				       void **vaddr)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return __guc_context_pre_pin(ce, engine, ww, vaddr);
> > +}
> > +
> > +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return __guc_context_pin(ce, engine, vaddr);
> > +}
> > +
> > +static void guc_virtual_context_enter(struct intel_context *ce)
> > +{
> > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +	struct intel_engine_cs *engine;
> > +
> > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +		intel_engine_pm_get(engine);
> > +
> > +	intel_timeline_enter(ce->timeline);
> > +}
> > +
> > +static void guc_virtual_context_exit(struct intel_context *ce)
> > +{
> > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > +	struct intel_engine_cs *engine;
> > +
> > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > +		intel_engine_pm_put(engine);
> > +
> > +	intel_timeline_exit(ce->timeline);
> > +}
> > +
> > +static int guc_virtual_context_alloc(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > +
> > +	return lrc_alloc(ce, engine);
> > +}
> > +
> > +static const struct intel_context_ops virtual_guc_context_ops = {
> > +	.alloc = guc_virtual_context_alloc,
> > +
> > +	.pre_pin = guc_virtual_context_pre_pin,
> > +	.pin = guc_virtual_context_pin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> > +
> > +	.enter = guc_virtual_context_enter,
> > +	.exit = guc_virtual_context_exit,
> > +
> > +	.sched_disable = guc_context_sched_disable,
> > +
> > +	.destroy = guc_context_destroy,
> > +
> > +	.get_sibling = guc_virtual_get_sibling,
> > +};
> > +
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> >   {
> >   	struct intel_timeline *tl;
> > @@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   	} else if (context_destroyed(ce)) {
> >   		/* Context has been destroyed */
> >   		release_guc_id(guc, ce);
> > -		lrc_destroy(&ce->ref);
> > +		__guc_context_destroy(ce);
> >   	}
> >   	decr_outstanding_submission_g2h(guc);
> > @@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   			   atomic_read(&ce->guc_sched_state_no_lock));
> >   	}
> >   }
> > +
> > +static struct intel_context *
> > +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > +{
> > +	struct guc_virtual_engine *ve;
> > +	struct intel_guc *guc;
> > +	unsigned int n;
> > +	int err;
> > +
> > +	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
> > +	if (!ve)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	guc = &siblings[0]->gt->uc.guc;
> > +
> > +	ve->base.i915 = siblings[0]->i915;
> > +	ve->base.gt = siblings[0]->gt;
> > +	ve->base.uncore = siblings[0]->uncore;
> > +	ve->base.id = -1;
> > +
> > +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> > +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > +	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > +	ve->base.saturated = ALL_ENGINES;
> > +	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> > +	if (!ve->base.breadcrumbs) {
> > +		kfree(ve);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> > +
> > +	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
> > +
> > +	ve->base.cops = &virtual_guc_context_ops;
> > +	ve->base.request_alloc = guc_request_alloc;
> > +
> > +	ve->base.submit_request = guc_submit_request;
> > +
> > +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> > +
> > +	intel_context_init(&ve->context, &ve->base);
> > +
> > +	for (n = 0; n < count; n++) {
> > +		struct intel_engine_cs *sibling = siblings[n];
> > +
> > +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > +		if (sibling->mask & ve->base.mask) {
> > +			DRM_DEBUG("duplicate %s entry in load balancer\n",
> > +				  sibling->name);
> > +			err = -EINVAL;
> > +			goto err_put;
> > +		}
> > +
> > +		ve->base.mask |= sibling->mask;
> > +
> > +		if (n != 0 && ve->base.class != sibling->class) {
> > +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
> > +				  sibling->class, ve->base.class);
> > +			err = -EINVAL;
> > +			goto err_put;
> > +		} else if (n == 0) {
> > +			ve->base.class = sibling->class;
> > +			ve->base.uabi_class = sibling->uabi_class;
> > +			snprintf(ve->base.name, sizeof(ve->base.name),
> > +				 "v%dx%d", ve->base.class, count);
> > +			ve->base.context_size = sibling->context_size;
> > +
> > +			ve->base.emit_bb_start = sibling->emit_bb_start;
> > +			ve->base.emit_flush = sibling->emit_flush;
> > +			ve->base.emit_init_breadcrumb =
> > +				sibling->emit_init_breadcrumb;
> > +			ve->base.emit_fini_breadcrumb =
> > +				sibling->emit_fini_breadcrumb;
> > +			ve->base.emit_fini_breadcrumb_dw =
> > +				sibling->emit_fini_breadcrumb_dw;
> > +
> > +			ve->base.flags |= sibling->flags;
> > +
> > +			ve->base.props.timeslice_duration_ms =
> > +				sibling->props.timeslice_duration_ms;
> > +		}
> > +	}
> > +
> > +	return &ve->context;
> > +
> > +err_put:
> > +	intel_context_put(&ve->context);
> > +	return ERR_PTR(err);
> > +}
> > +
> > +
> > +
> > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
> > +			return true;
> > +
> > +	return false;
> > +}
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > index 2b9470c90558..5f263ac4f46a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> >   void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   					     struct drm_printer *p);
> > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
> > +
> >   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
> >   {
> >   	/* XXX: GuC submission is unavailable for now */
> 
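
To make the new interface concrete, here is a minimal usage sketch
(illustrative only, not part of the patch; vcs0/vcs1 stand in for two
physical engines of the same class, and error handling is trimmed):

struct intel_engine_cs *siblings[] = { vcs0, vcs1 };
struct intel_engine_cs *engine;
struct intel_context *ce;

/*
 * Dispatches through siblings[0]->cops->create_virtual, i.e. to
 * guc_create_virtual() or execlists_create_virtual() depending on
 * the backend; a count of 1 short-circuits to a plain
 * intel_context_create(siblings[0]) on the single physical engine.
 */
ce = intel_engine_create_virtual(siblings, ARRAY_SIZE(siblings));
if (IS_ERR(ce))
        return PTR_ERR(ce);

/* The backing engine is a virtual one spanning both siblings */
GEM_BUG_ON(!intel_engine_is_virtual(ce->engine));

/* Individual siblings can be queried back through the new hook */
engine = intel_engine_get_sibling(ce->engine, 0);

intel_context_put(ce);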
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 19/51] drm/i915/guc: GuC virtual engines
  2021-07-19 23:42         ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-19 23:32           ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-19 23:32 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Mon, Jul 19, 2021 at 04:42:50PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/19/2021 4:27 PM, Matthew Brost wrote:
> > On Mon, Jul 19, 2021 at 04:33:56PM -0700, Daniele Ceraolo Spurio wrote:
> > > 
> > > On 7/16/2021 1:16 PM, Matthew Brost wrote:
> > > > Implement GuC virtual engines. Rather simple implementation: basically
> > > > just allocate an engine, point the context enter / exit functions at the
> > > > virtual engine specific ones, set all other variables / functions to the
> > > > GuC versions, and set the engine mask to that of all the siblings.
> > > > 
> > > > v2: Update to work with proto-ctx
> > > > 
> > > > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > ---
> > > >    drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
> > > >    drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
> > > >    drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
> > > >    drivers/gpu/drm/i915/gt/intel_engine.h        |  27 +-
> > > >    drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
> > > >    .../drm/i915/gt/intel_execlists_submission.c  |  29 ++-
> > > >    .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> > > >    drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
> > > >    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 240 +++++++++++++++++-
> > > >    .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
> > > >    10 files changed, 308 insertions(+), 35 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > index 64659802d4df..edefe299bd76 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > > @@ -74,7 +74,6 @@
> > > >    #include "gt/intel_context_param.h"
> > > >    #include "gt/intel_engine_heartbeat.h"
> > > >    #include "gt/intel_engine_user.h"
> > > > -#include "gt/intel_execlists_submission.h" /* virtual_engine */
> > > >    #include "gt/intel_gpu_commands.h"
> > > >    #include "gt/intel_ring.h"
> > > > @@ -363,9 +362,6 @@ set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
> > > >    	if (!HAS_EXECLISTS(i915))
> > > >    		return -ENODEV;
> > > > -	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> > > > -		return -ENODEV; /* not implement yet */
> > > > -
> > > >    	if (get_user(idx, &ext->engine_index))
> > > >    		return -EFAULT;
> > > > @@ -950,8 +946,8 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
> > > >    			break;
> > > >    		case I915_GEM_ENGINE_TYPE_BALANCED:
> > > > -			ce = intel_execlists_create_virtual(pe[n].siblings,
> > > > -							    pe[n].num_siblings);
> > > > +			ce = intel_engine_create_virtual(pe[n].siblings,
> > > > +							 pe[n].num_siblings);
> > > >    			break;
> > > >    		case I915_GEM_ENGINE_TYPE_INVALID:
> > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > > > index 20411db84914..2639c719a7a6 100644
> > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> > > > @@ -10,6 +10,7 @@
> > > >    #include "i915_gem_context_types.h"
> > > >    #include "gt/intel_context.h"
> > > > +#include "gt/intel_engine.h"
> > > Apologies for the late question, but why do you need this include in this
> > > header? Nothing else is changing within the file.
> > > 
> > It likely doesn't need to be included. Let me see what the build / CI
> > says if I remove it.
> > 
> > > >    #include "i915_drv.h"
> > > >    #include "i915_gem.h"
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > index 4a5518d295c2..542c98418771 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > > > @@ -47,6 +47,12 @@ struct intel_context_ops {
> > > >    	void (*reset)(struct intel_context *ce);
> > > >    	void (*destroy)(struct kref *kref);
> > > > +
> > > > +	/* virtual engine/context interface */
> > > > +	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
> > > > +						unsigned int count);
> > > > +	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
> > > > +					       unsigned int sibling);
> > > >    };
> > > >    struct intel_context {
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > > index f911c1224ab2..9fec0aca5f4b 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > > > @@ -273,13 +273,38 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
> > > >    	return intel_engine_has_preemption(engine);
> > > >    }
> > > > +struct intel_context *
> > > > +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> > > > +			    unsigned int count);
> > > > +
> > > > +static inline bool
> > > > +intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
> > > > +{
> > > > +	if (intel_engine_uses_guc(engine))
> > > > +		return intel_guc_virtual_engine_has_heartbeat(engine);
> > > > +	else
> > > > +		GEM_BUG_ON("Only should be called in GuC submission");
> > > I insist that this needs a better explanation, as I've commented on the
> > > previous rev. Given that this is a shared file, it is not immediately
> > > evident why this function shouldn't be called with the execlists backend.
> > > 
> > Sure, can add a comment. Basically it comes down to this only ever being
> > called for an active request, and in execlists mode an active request is
> > always on a physical engine.
> 
> I had added a sample comment in the previous review in case you want to copy
> it. I remember you explaining this to me a while back, but it always takes
> me a moment to recall it when I look at this code, and I expect someone not
> familiar with the backend implementation would be even more confused.
> 

Let me see if I can dig up that comment.
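
In the meantime, something along these lines is what I'd add (a sketch
from memory, not necessarily the wording from your earlier review):

/*
 * Only the GuC backend can get here: a heartbeat is only ever queried
 * for the engine of an active request, and with execlists an active
 * request has already been moved off the virtual engine onto one of
 * its physical siblings. The GuC backend keeps active requests on the
 * virtual engine, hence the GuC-only implementation.
 */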

> Daniele
> 
> > 
> > Matt
> > 
> > > Daniele
> > > 
> > > > +
> > > > +	return false;
> > > > +}
> > > > +
> > > >    static inline bool
> > > >    intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
> > > >    {
> > > >    	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
> > > >    		return false;
> > > > -	return READ_ONCE(engine->props.heartbeat_interval_ms);
> > > > +	if (intel_engine_is_virtual(engine))
> > > > +		return intel_virtual_engine_has_heartbeat(engine);
> > > > +	else
> > > > +		return READ_ONCE(engine->props.heartbeat_interval_ms);
> > > > +}
> > > > +
> > > > +static inline struct intel_engine_cs *
> > > > +intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> > > > +{
> > > > +	GEM_BUG_ON(!intel_engine_is_virtual(engine));
> > > > +	return engine->cops->get_sibling(engine, sibling);
> > > >    }
> > > >    #endif /* _INTEL_RINGBUFFER_H_ */
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > > index d561573ed98c..b7292d5cb7da 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > > > @@ -1737,6 +1737,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
> > > >    	return total;
> > > >    }
> > > > +struct intel_context *
> > > > +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> > > > +			    unsigned int count)
> > > > +{
> > > > +	if (count == 0)
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > > > +	if (count == 1)
> > > > +		return intel_context_create(siblings[0]);
> > > > +
> > > > +	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
> > > > +	return siblings[0]->cops->create_virtual(siblings, count);
> > > > +}
> > > > +
> > > >    static bool match_ring(struct i915_request *rq)
> > > >    {
> > > >    	u32 ring = ENGINE_READ(rq->engine, RING_START);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > index 56e25090da67..28492cdce706 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > > @@ -193,6 +193,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> > > >    	return container_of(engine, struct virtual_engine, base);
> > > >    }
> > > > +static struct intel_context *
> > > > +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> > > > +
> > > >    static struct i915_request *
> > > >    __active_request(const struct intel_timeline * const tl,
> > > >    		 struct i915_request *rq,
> > > > @@ -2548,6 +2551,8 @@ static const struct intel_context_ops execlists_context_ops = {
> > > >    	.reset = lrc_reset,
> > > >    	.destroy = lrc_destroy,
> > > > +
> > > > +	.create_virtual = execlists_create_virtual,
> > > >    };
> > > >    static int emit_pdps(struct i915_request *rq)
> > > > @@ -3493,6 +3498,17 @@ static void virtual_context_exit(struct intel_context *ce)
> > > >    		intel_engine_pm_put(ve->siblings[n]);
> > > >    }
> > > > +static struct intel_engine_cs *
> > > > +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> > > > +{
> > > > +	struct virtual_engine *ve = to_virtual_engine(engine);
> > > > +
> > > > +	if (sibling >= ve->num_siblings)
> > > > +		return NULL;
> > > > +
> > > > +	return ve->siblings[sibling];
> > > > +}
> > > > +
> > > >    static const struct intel_context_ops virtual_context_ops = {
> > > >    	.flags = COPS_HAS_INFLIGHT,
> > > > @@ -3507,6 +3523,8 @@ static const struct intel_context_ops virtual_context_ops = {
> > > >    	.exit = virtual_context_exit,
> > > >    	.destroy = virtual_context_destroy,
> > > > +
> > > > +	.get_sibling = virtual_get_sibling,
> > > >    };
> > > >    static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
> > > > @@ -3655,20 +3673,13 @@ static void virtual_submit_request(struct i915_request *rq)
> > > >    	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
> > > >    }
> > > > -struct intel_context *
> > > > -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > -			       unsigned int count)
> > > > +static struct intel_context *
> > > > +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > > >    {
> > > >    	struct virtual_engine *ve;
> > > >    	unsigned int n;
> > > >    	int err;
> > > > -	if (count == 0)
> > > > -		return ERR_PTR(-EINVAL);
> > > > -
> > > > -	if (count == 1)
> > > > -		return intel_context_create(siblings[0]);
> > > > -
> > > >    	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> > > >    	if (!ve)
> > > >    		return ERR_PTR(-ENOMEM);
> > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > index ad4f3e1a0fde..a1aa92c983a5 100644
> > > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> > > > @@ -32,10 +32,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
> > > >    							int indent),
> > > >    				   unsigned int max);
> > > > -struct intel_context *
> > > > -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> > > > -			       unsigned int count);
> > > > -
> > > >    bool
> > > >    intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
> > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > index 73ddc6e14730..59cf8afc6d6f 100644
> > > > --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> > > > @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
> > > >    	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
> > > >    	for (n = 0; n < nctx; n++) {
> > > > -		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
> > > > +		ve[n] = intel_engine_create_virtual(siblings, nsibling);
> > > >    		if (IS_ERR(ve[n])) {
> > > >    			err = PTR_ERR(ve[n]);
> > > >    			nctx = n;
> > > > @@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
> > > >    	 * restrict it to our desired engine within the virtual engine.
> > > >    	 */
> > > > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > +	ve = intel_engine_create_virtual(siblings, nsibling);
> > > >    	if (IS_ERR(ve)) {
> > > >    		err = PTR_ERR(ve);
> > > >    		goto out_close;
> > > > @@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
> > > >    		i915_request_add(rq);
> > > >    	}
> > > > -	ce = intel_execlists_create_virtual(siblings, nsibling);
> > > > +	ce = intel_engine_create_virtual(siblings, nsibling);
> > > >    	if (IS_ERR(ce)) {
> > > >    		err = PTR_ERR(ce);
> > > >    		goto out;
> > > > @@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
> > > >    	/* XXX We do not handle oversubscription and fairness with normal rq */
> > > >    	for (n = 0; n < nsibling; n++) {
> > > > -		ce = intel_execlists_create_virtual(siblings, nsibling);
> > > > +		ce = intel_engine_create_virtual(siblings, nsibling);
> > > >    		if (IS_ERR(ce)) {
> > > >    			err = PTR_ERR(ce);
> > > >    			goto out;
> > > > @@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
> > > >    	if (err)
> > > >    		goto out_scratch;
> > > > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > +	ve = intel_engine_create_virtual(siblings, nsibling);
> > > >    	if (IS_ERR(ve)) {
> > > >    		err = PTR_ERR(ve);
> > > >    		goto out_scratch;
> > > > @@ -4348,7 +4348,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
> > > >    	if (igt_spinner_init(&spin, gt))
> > > >    		return -ENOMEM;
> > > > -	ve = intel_execlists_create_virtual(siblings, nsibling);
> > > > +	ve = intel_engine_create_virtual(siblings, nsibling);
> > > >    	if (IS_ERR(ve)) {
> > > >    		err = PTR_ERR(ve);
> > > >    		goto out_spin;
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > index 05958260e849..7b3e1c91e689 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > > @@ -60,6 +60,15 @@
> > > >     *
> > > >     */
> > > > +/* GuC Virtual Engine */
> > > > +struct guc_virtual_engine {
> > > > +	struct intel_engine_cs base;
> > > > +	struct intel_context context;
> > > > +};
> > > > +
> > > > +static struct intel_context *
> > > > +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> > > > +
> > > >    #define GUC_REQUEST_SIZE 64 /* bytes */
> > > >    /*
> > > > @@ -925,20 +934,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> > > >    	return ret;
> > > >    }
> > > > -static int guc_context_pre_pin(struct intel_context *ce,
> > > > -			       struct i915_gem_ww_ctx *ww,
> > > > -			       void **vaddr)
> > > > +static int __guc_context_pre_pin(struct intel_context *ce,
> > > > +				 struct intel_engine_cs *engine,
> > > > +				 struct i915_gem_ww_ctx *ww,
> > > > +				 void **vaddr)
> > > >    {
> > > > -	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
> > > > +	return lrc_pre_pin(ce, engine, ww, vaddr);
> > > >    }
> > > > -static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > > > +static int __guc_context_pin(struct intel_context *ce,
> > > > +			     struct intel_engine_cs *engine,
> > > > +			     void *vaddr)
> > > >    {
> > > >    	if (i915_ggtt_offset(ce->state) !=
> > > >    	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > > >    		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > > > -	return lrc_pin(ce, ce->engine, vaddr);
> > > > +	return lrc_pin(ce, engine, vaddr);
> > > > +}
> > > > +
> > > > +static int guc_context_pre_pin(struct intel_context *ce,
> > > > +			       struct i915_gem_ww_ctx *ww,
> > > > +			       void **vaddr)
> > > > +{
> > > > +	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
> > > > +}
> > > > +
> > > > +static int guc_context_pin(struct intel_context *ce, void *vaddr)
> > > > +{
> > > > +	return __guc_context_pin(ce, ce->engine, vaddr);
> > > >    }
> > > >    static void guc_context_unpin(struct intel_context *ce)
> > > > @@ -1043,6 +1067,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > > >    	deregister_context(ce, ce->guc_id);
> > > >    }
> > > > +static void __guc_context_destroy(struct intel_context *ce)
> > > > +{
> > > > +	lrc_fini(ce);
> > > > +	intel_context_fini(ce);
> > > > +
> > > > +	if (intel_engine_is_virtual(ce->engine)) {
> > > > +		struct guc_virtual_engine *ve =
> > > > +			container_of(ce, typeof(*ve), context);
> > > > +
> > > > +		kfree(ve);
> > > > +	} else {
> > > > +		intel_context_free(ce);
> > > > +	}
> > > > +}
> > > > +
> > > >    static void guc_context_destroy(struct kref *kref)
> > > >    {
> > > >    	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > > > @@ -1059,7 +1098,7 @@ static void guc_context_destroy(struct kref *kref)
> > > >    	if (context_guc_id_invalid(ce) ||
> > > >    	    !lrc_desc_registered(guc, ce->guc_id)) {
> > > >    		release_guc_id(guc, ce);
> > > > -		lrc_destroy(kref);
> > > > +		__guc_context_destroy(ce);
> > > >    		return;
> > > >    	}
> > > > @@ -1075,7 +1114,7 @@ static void guc_context_destroy(struct kref *kref)
> > > >    	if (context_guc_id_invalid(ce)) {
> > > >    		__release_guc_id(guc, ce);
> > > >    		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > > > -		lrc_destroy(kref);
> > > > +		__guc_context_destroy(ce);
> > > >    		return;
> > > >    	}
> > > > @@ -1120,6 +1159,8 @@ static const struct intel_context_ops guc_context_ops = {
> > > >    	.reset = lrc_reset,
> > > >    	.destroy = guc_context_destroy,
> > > > +
> > > > +	.create_virtual = guc_create_virtual,
> > > >    };
> > > >    static void __guc_signal_context_fence(struct intel_context *ce)
> > > > @@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
> > > >    	return 0;
> > > >    }
> > > > +static struct intel_engine_cs *
> > > > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > > > +{
> > > > +	struct intel_engine_cs *engine;
> > > > +	intel_engine_mask_t tmp, mask = ve->mask;
> > > > +	unsigned int num_siblings = 0;
> > > > +
> > > > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > > +		if (num_siblings++ == sibling)
> > > > +			return engine;
> > > > +
> > > > +	return NULL;
> > > > +}
> > > > +
> > > > +static int guc_virtual_context_pre_pin(struct intel_context *ce,
> > > > +				       struct i915_gem_ww_ctx *ww,
> > > > +				       void **vaddr)
> > > > +{
> > > > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > > > +
> > > > +	return __guc_context_pre_pin(ce, engine, ww, vaddr);
> > > > +}
> > > > +
> > > > +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> > > > +{
> > > > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > > > +
> > > > +	return __guc_context_pin(ce, engine, vaddr);
> > > > +}
> > > > +
> > > > +static void guc_virtual_context_enter(struct intel_context *ce)
> > > > +{
> > > > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > > > +	struct intel_engine_cs *engine;
> > > > +
> > > > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > > > +		intel_engine_pm_get(engine);
> > > > +
> > > > +	intel_timeline_enter(ce->timeline);
> > > > +}
> > > > +
> > > > +static void guc_virtual_context_exit(struct intel_context *ce)
> > > > +{
> > > > +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> > > > +	struct intel_engine_cs *engine;
> > > > +
> > > > +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> > > > +		intel_engine_pm_put(engine);
> > > > +
> > > > +	intel_timeline_exit(ce->timeline);
> > > > +}
> > > > +
> > > > +static int guc_virtual_context_alloc(struct intel_context *ce)
> > > > +{
> > > > +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> > > > +
> > > > +	return lrc_alloc(ce, engine);
> > > > +}
> > > > +
> > > > +static const struct intel_context_ops virtual_guc_context_ops = {
> > > > +	.alloc = guc_virtual_context_alloc,
> > > > +
> > > > +	.pre_pin = guc_virtual_context_pre_pin,
> > > > +	.pin = guc_virtual_context_pin,
> > > > +	.unpin = guc_context_unpin,
> > > > +	.post_unpin = guc_context_post_unpin,
> > > > +
> > > > +	.enter = guc_virtual_context_enter,
> > > > +	.exit = guc_virtual_context_exit,
> > > > +
> > > > +	.sched_disable = guc_context_sched_disable,
> > > > +
> > > > +	.destroy = guc_context_destroy,
> > > > +
> > > > +	.get_sibling = guc_virtual_get_sibling,
> > > > +};
> > > > +
> > > >    static void sanitize_hwsp(struct intel_engine_cs *engine)
> > > >    {
> > > >    	struct intel_timeline *tl;
> > > > @@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > > >    	} else if (context_destroyed(ce)) {
> > > >    		/* Context has been destroyed */
> > > >    		release_guc_id(guc, ce);
> > > > -		lrc_destroy(&ce->ref);
> > > > +		__guc_context_destroy(ce);
> > > >    	}
> > > >    	decr_outstanding_submission_g2h(guc);
> > > > @@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > >    			   atomic_read(&ce->guc_sched_state_no_lock));
> > > >    	}
> > > >    }
> > > > +
> > > > +static struct intel_context *
> > > > +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > > > +{
> > > > +	struct guc_virtual_engine *ve;
> > > > +	struct intel_guc *guc;
> > > > +	unsigned int n;
> > > > +	int err;
> > > > +
> > > > +	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
> > > > +	if (!ve)
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +
> > > > +	guc = &siblings[0]->gt->uc.guc;
> > > > +
> > > > +	ve->base.i915 = siblings[0]->i915;
> > > > +	ve->base.gt = siblings[0]->gt;
> > > > +	ve->base.uncore = siblings[0]->uncore;
> > > > +	ve->base.id = -1;
> > > > +
> > > > +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> > > > +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > > > +	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > > > +	ve->base.saturated = ALL_ENGINES;
> > > > +	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> > > > +	if (!ve->base.breadcrumbs) {
> > > > +		kfree(ve);
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +	}
> > > > +
> > > > +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> > > > +
> > > > +	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
> > > > +
> > > > +	ve->base.cops = &virtual_guc_context_ops;
> > > > +	ve->base.request_alloc = guc_request_alloc;
> > > > +
> > > > +	ve->base.submit_request = guc_submit_request;
> > > > +
> > > > +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> > > > +
> > > > +	intel_context_init(&ve->context, &ve->base);
> > > > +
> > > > +	for (n = 0; n < count; n++) {
> > > > +		struct intel_engine_cs *sibling = siblings[n];
> > > > +
> > > > +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > > > +		if (sibling->mask & ve->base.mask) {
> > > > +			DRM_DEBUG("duplicate %s entry in load balancer\n",
> > > > +				  sibling->name);
> > > > +			err = -EINVAL;
> > > > +			goto err_put;
> > > > +		}
> > > > +
> > > > +		ve->base.mask |= sibling->mask;
> > > > +
> > > > +		if (n != 0 && ve->base.class != sibling->class) {
> > > > +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
> > > > +				  sibling->class, ve->base.class);
> > > > +			err = -EINVAL;
> > > > +			goto err_put;
> > > > +		} else if (n == 0) {
> > > > +			ve->base.class = sibling->class;
> > > > +			ve->base.uabi_class = sibling->uabi_class;
> > > > +			snprintf(ve->base.name, sizeof(ve->base.name),
> > > > +				 "v%dx%d", ve->base.class, count);
> > > > +			ve->base.context_size = sibling->context_size;
> > > > +
> > > > +			ve->base.emit_bb_start = sibling->emit_bb_start;
> > > > +			ve->base.emit_flush = sibling->emit_flush;
> > > > +			ve->base.emit_init_breadcrumb =
> > > > +				sibling->emit_init_breadcrumb;
> > > > +			ve->base.emit_fini_breadcrumb =
> > > > +				sibling->emit_fini_breadcrumb;
> > > > +			ve->base.emit_fini_breadcrumb_dw =
> > > > +				sibling->emit_fini_breadcrumb_dw;
> > > > +
> > > > +			ve->base.flags |= sibling->flags;
> > > > +
> > > > +			ve->base.props.timeslice_duration_ms =
> > > > +				sibling->props.timeslice_duration_ms;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return &ve->context;
> > > > +
> > > > +err_put:
> > > > +	intel_context_put(&ve->context);
> > > > +	return ERR_PTR(err);
> > > > +}
> > > > +
> > > > +
> > > > +
> > > > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
> > > > +{
> > > > +	struct intel_engine_cs *engine;
> > > > +	intel_engine_mask_t tmp, mask = ve->mask;
> > > > +
> > > > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > > +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
> > > > +			return true;
> > > > +
> > > > +	return false;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > index 2b9470c90558..5f263ac4f46a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > >    void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > >    					     struct drm_printer *p);
> > > > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
> > > > +
> > > >    static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
> > > >    {
> > > >    	/* XXX: GuC submission is unavailable for now */
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

> > > > +
> > > > +	ve->base.cops = &virtual_guc_context_ops;
> > > > +	ve->base.request_alloc = guc_request_alloc;
> > > > +
> > > > +	ve->base.submit_request = guc_submit_request;
> > > > +
> > > > +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> > > > +
> > > > +	intel_context_init(&ve->context, &ve->base);
> > > > +
> > > > +	for (n = 0; n < count; n++) {
> > > > +		struct intel_engine_cs *sibling = siblings[n];
> > > > +
> > > > +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > > > +		if (sibling->mask & ve->base.mask) {
> > > > +			DRM_DEBUG("duplicate %s entry in load balancer\n",
> > > > +				  sibling->name);
> > > > +			err = -EINVAL;
> > > > +			goto err_put;
> > > > +		}
> > > > +
> > > > +		ve->base.mask |= sibling->mask;
> > > > +
> > > > +		if (n != 0 && ve->base.class != sibling->class) {
> > > > +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
> > > > +				  sibling->class, ve->base.class);
> > > > +			err = -EINVAL;
> > > > +			goto err_put;
> > > > +		} else if (n == 0) {
> > > > +			ve->base.class = sibling->class;
> > > > +			ve->base.uabi_class = sibling->uabi_class;
> > > > +			snprintf(ve->base.name, sizeof(ve->base.name),
> > > > +				 "v%dx%d", ve->base.class, count);
> > > > +			ve->base.context_size = sibling->context_size;
> > > > +
> > > > +			ve->base.emit_bb_start = sibling->emit_bb_start;
> > > > +			ve->base.emit_flush = sibling->emit_flush;
> > > > +			ve->base.emit_init_breadcrumb =
> > > > +				sibling->emit_init_breadcrumb;
> > > > +			ve->base.emit_fini_breadcrumb =
> > > > +				sibling->emit_fini_breadcrumb;
> > > > +			ve->base.emit_fini_breadcrumb_dw =
> > > > +				sibling->emit_fini_breadcrumb_dw;
> > > > +
> > > > +			ve->base.flags |= sibling->flags;
> > > > +
> > > > +			ve->base.props.timeslice_duration_ms =
> > > > +				sibling->props.timeslice_duration_ms;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return &ve->context;
> > > > +
> > > > +err_put:
> > > > +	intel_context_put(&ve->context);
> > > > +	return ERR_PTR(err);
> > > > +}
> > > > +
> > > > +
> > > > +
> > > > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
> > > > +{
> > > > +	struct intel_engine_cs *engine;
> > > > +	intel_engine_mask_t tmp, mask = ve->mask;
> > > > +
> > > > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > > > +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
> > > > +			return true;
> > > > +
> > > > +	return false;
> > > > +}
> > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > index 2b9470c90558..5f263ac4f46a 100644
> > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> > > > @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> > > >    void intel_guc_submission_print_context_info(struct intel_guc *guc,
> > > >    					     struct drm_printer *p);
> > > > +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
> > > > +
> > > >    static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
> > > >    {
> > > >    	/* XXX: GuC submission is unavailable for now */
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 19/51] drm/i915/guc: GuC virtual engines
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-19 23:33     ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-19 23:33 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 7/16/2021 1:16 PM, Matthew Brost wrote:
> Implement GuC virtual engines. This is a rather simple implementation:
> allocate an engine, point the context enter / exit functions at virtual
> engine specific versions, set all other variables / functions to the GuC
> versions, and set the engine mask to the union of all the siblings' masks.
>
> v2: Update to work with proto-ctx
>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
>   drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
>   drivers/gpu/drm/i915/gt/intel_engine.h        |  27 +-
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
>   .../drm/i915/gt/intel_execlists_submission.c  |  29 ++-
>   .../drm/i915/gt/intel_execlists_submission.h  |   4 -
>   drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 240 +++++++++++++++++-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
>   10 files changed, 308 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 64659802d4df..edefe299bd76 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -74,7 +74,6 @@
>   #include "gt/intel_context_param.h"
>   #include "gt/intel_engine_heartbeat.h"
>   #include "gt/intel_engine_user.h"
> -#include "gt/intel_execlists_submission.h" /* virtual_engine */
>   #include "gt/intel_gpu_commands.h"
>   #include "gt/intel_ring.h"
>   
> @@ -363,9 +362,6 @@ set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
>   	if (!HAS_EXECLISTS(i915))
>   		return -ENODEV;
>   
> -	if (intel_uc_uses_guc_submission(&i915->gt.uc))
> -		return -ENODEV; /* not implement yet */
> -
>   	if (get_user(idx, &ext->engine_index))
>   		return -EFAULT;
>   
> @@ -950,8 +946,8 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
>   			break;
>   
>   		case I915_GEM_ENGINE_TYPE_BALANCED:
> -			ce = intel_execlists_create_virtual(pe[n].siblings,
> -							    pe[n].num_siblings);
> +			ce = intel_engine_create_virtual(pe[n].siblings,
> +							 pe[n].num_siblings);
>   			break;
>   
>   		case I915_GEM_ENGINE_TYPE_INVALID:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index 20411db84914..2639c719a7a6 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -10,6 +10,7 @@
>   #include "i915_gem_context_types.h"
>   
>   #include "gt/intel_context.h"
> +#include "gt/intel_engine.h"

Apologies for the late question, but why do you need this include in 
this header? Nothing else is changing within the file.
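
As an aside, if nothing in the header itself needs the full definitions,
a forward declaration is usually enough wherever only pointer types
appear -- a sketch of that pattern, not a claim about what this
particular header requires:

	/* forward declaration instead of pulling in gt/intel_engine.h */
	struct intel_engine_cs;

	/*
	 * hypothetical prototype: a pointer parameter only needs the
	 * forward declaration, not the full struct definition
	 */
	void example_fn(struct intel_engine_cs *engine);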

>   
>   #include "i915_drv.h"
>   #include "i915_gem.h"
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 4a5518d295c2..542c98418771 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -47,6 +47,12 @@ struct intel_context_ops {
>   
>   	void (*reset)(struct intel_context *ce);
>   	void (*destroy)(struct kref *kref);
> +
> +	/* virtual engine/context interface */
> +	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
> +						unsigned int count);
> +	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
> +					       unsigned int sibling);
>   };
>   
>   struct intel_context {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index f911c1224ab2..9fec0aca5f4b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -273,13 +273,38 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>   	return intel_engine_has_preemption(engine);
>   }
>   
> +struct intel_context *
> +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> +			    unsigned int count);
> +
> +static inline bool
> +intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
> +{
> +	if (intel_engine_uses_guc(engine))
> +		return intel_guc_virtual_engine_has_heartbeat(engine);
> +	else
> +		GEM_BUG_ON("Only should be called in GuC submission");

I insist that this needs a better explanation, as I've commented on the 
previous rev. Given that this is a shared file, it is not immediately 
evident why this helper shouldn't be called with the execlists backend.

Daniele
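
For illustration, a comment along these lines would capture the reasoning
discussed later in this thread -- a sketch of one possible wording only,
not the author's final code:

	static inline bool
	intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
	{
		/*
		 * For non-GuC submission the caller is expected to look at
		 * the heartbeat of the physical engine an active request is
		 * (or was) scheduled on, so a virtual engine should only be
		 * asked about heartbeats when GuC submission is in use.
		 */
		if (intel_engine_uses_guc(engine))
			return intel_guc_virtual_engine_has_heartbeat(engine);

		GEM_BUG_ON("Only should be called in GuC submission");
		return false;
	}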

> +
> +	return false;
> +}
> +
>   static inline bool
>   intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
>   {
>   	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
>   		return false;
>   
> -	return READ_ONCE(engine->props.heartbeat_interval_ms);
> +	if (intel_engine_is_virtual(engine))
> +		return intel_virtual_engine_has_heartbeat(engine);
> +	else
> +		return READ_ONCE(engine->props.heartbeat_interval_ms);
> +}
> +
> +static inline struct intel_engine_cs *
> +intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> +{
> +	GEM_BUG_ON(!intel_engine_is_virtual(engine));
> +	return engine->cops->get_sibling(engine, sibling);
>   }
>   
>   #endif /* _INTEL_RINGBUFFER_H_ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index d561573ed98c..b7292d5cb7da 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1737,6 +1737,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
>   	return total;
>   }
>   
> +struct intel_context *
> +intel_engine_create_virtual(struct intel_engine_cs **siblings,
> +			    unsigned int count)
> +{
> +	if (count == 0)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (count == 1)
> +		return intel_context_create(siblings[0]);
> +
> +	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
> +	return siblings[0]->cops->create_virtual(siblings, count);
> +}
> +
>   static bool match_ring(struct i915_request *rq)
>   {
>   	u32 ring = ENGINE_READ(rq->engine, RING_START);
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 56e25090da67..28492cdce706 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -193,6 +193,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
>   	return container_of(engine, struct virtual_engine, base);
>   }
>   
> +static struct intel_context *
> +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> +
>   static struct i915_request *
>   __active_request(const struct intel_timeline * const tl,
>   		 struct i915_request *rq,
> @@ -2548,6 +2551,8 @@ static const struct intel_context_ops execlists_context_ops = {
>   
>   	.reset = lrc_reset,
>   	.destroy = lrc_destroy,
> +
> +	.create_virtual = execlists_create_virtual,
>   };
>   
>   static int emit_pdps(struct i915_request *rq)
> @@ -3493,6 +3498,17 @@ static void virtual_context_exit(struct intel_context *ce)
>   		intel_engine_pm_put(ve->siblings[n]);
>   }
>   
> +static struct intel_engine_cs *
> +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(engine);
> +
> +	if (sibling >= ve->num_siblings)
> +		return NULL;
> +
> +	return ve->siblings[sibling];
> +}
> +
>   static const struct intel_context_ops virtual_context_ops = {
>   	.flags = COPS_HAS_INFLIGHT,
>   
> @@ -3507,6 +3523,8 @@ static const struct intel_context_ops virtual_context_ops = {
>   	.exit = virtual_context_exit,
>   
>   	.destroy = virtual_context_destroy,
> +
> +	.get_sibling = virtual_get_sibling,
>   };
>   
>   static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
> @@ -3655,20 +3673,13 @@ static void virtual_submit_request(struct i915_request *rq)
>   	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
>   }
>   
> -struct intel_context *
> -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> -			       unsigned int count)
> +static struct intel_context *
> +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   {
>   	struct virtual_engine *ve;
>   	unsigned int n;
>   	int err;
>   
> -	if (count == 0)
> -		return ERR_PTR(-EINVAL);
> -
> -	if (count == 1)
> -		return intel_context_create(siblings[0]);
> -
>   	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
>   	if (!ve)
>   		return ERR_PTR(-ENOMEM);
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> index ad4f3e1a0fde..a1aa92c983a5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
> @@ -32,10 +32,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   							int indent),
>   				   unsigned int max);
>   
> -struct intel_context *
> -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
> -			       unsigned int count);
> -
>   bool
>   intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 73ddc6e14730..59cf8afc6d6f 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
>   	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
>   
>   	for (n = 0; n < nctx; n++) {
> -		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
> +		ve[n] = intel_engine_create_virtual(siblings, nsibling);
>   		if (IS_ERR(ve[n])) {
>   			err = PTR_ERR(ve[n]);
>   			nctx = n;
> @@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
>   	 * restrict it to our desired engine within the virtual engine.
>   	 */
>   
> -	ve = intel_execlists_create_virtual(siblings, nsibling);
> +	ve = intel_engine_create_virtual(siblings, nsibling);
>   	if (IS_ERR(ve)) {
>   		err = PTR_ERR(ve);
>   		goto out_close;
> @@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
>   		i915_request_add(rq);
>   	}
>   
> -	ce = intel_execlists_create_virtual(siblings, nsibling);
> +	ce = intel_engine_create_virtual(siblings, nsibling);
>   	if (IS_ERR(ce)) {
>   		err = PTR_ERR(ce);
>   		goto out;
> @@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
>   
>   	/* XXX We do not handle oversubscription and fairness with normal rq */
>   	for (n = 0; n < nsibling; n++) {
> -		ce = intel_execlists_create_virtual(siblings, nsibling);
> +		ce = intel_engine_create_virtual(siblings, nsibling);
>   		if (IS_ERR(ce)) {
>   			err = PTR_ERR(ce);
>   			goto out;
> @@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
>   	if (err)
>   		goto out_scratch;
>   
> -	ve = intel_execlists_create_virtual(siblings, nsibling);
> +	ve = intel_engine_create_virtual(siblings, nsibling);
>   	if (IS_ERR(ve)) {
>   		err = PTR_ERR(ve);
>   		goto out_scratch;
> @@ -4348,7 +4348,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
>   	if (igt_spinner_init(&spin, gt))
>   		return -ENOMEM;
>   
> -	ve = intel_execlists_create_virtual(siblings, nsibling);
> +	ve = intel_engine_create_virtual(siblings, nsibling);
>   	if (IS_ERR(ve)) {
>   		err = PTR_ERR(ve);
>   		goto out_spin;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 05958260e849..7b3e1c91e689 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -60,6 +60,15 @@
>    *
>    */
>   
> +/* GuC Virtual Engine */
> +struct guc_virtual_engine {
> +	struct intel_engine_cs base;
> +	struct intel_context context;
> +};
> +
> +static struct intel_context *
> +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> +
>   #define GUC_REQUEST_SIZE 64 /* bytes */
>   
>   /*
> @@ -925,20 +934,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
>   	return ret;
>   }
>   
> -static int guc_context_pre_pin(struct intel_context *ce,
> -			       struct i915_gem_ww_ctx *ww,
> -			       void **vaddr)
> +static int __guc_context_pre_pin(struct intel_context *ce,
> +				 struct intel_engine_cs *engine,
> +				 struct i915_gem_ww_ctx *ww,
> +				 void **vaddr)
>   {
> -	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
> +	return lrc_pre_pin(ce, engine, ww, vaddr);
>   }
>   
> -static int guc_context_pin(struct intel_context *ce, void *vaddr)
> +static int __guc_context_pin(struct intel_context *ce,
> +			     struct intel_engine_cs *engine,
> +			     void *vaddr)
>   {
>   	if (i915_ggtt_offset(ce->state) !=
>   	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>   		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>   
> -	return lrc_pin(ce, ce->engine, vaddr);
> +	return lrc_pin(ce, engine, vaddr);
> +}
> +
> +static int guc_context_pre_pin(struct intel_context *ce,
> +			       struct i915_gem_ww_ctx *ww,
> +			       void **vaddr)
> +{
> +	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
> +}
> +
> +static int guc_context_pin(struct intel_context *ce, void *vaddr)
> +{
> +	return __guc_context_pin(ce, ce->engine, vaddr);
>   }
>   
>   static void guc_context_unpin(struct intel_context *ce)
> @@ -1043,6 +1067,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   	deregister_context(ce, ce->guc_id);
>   }
>   
> +static void __guc_context_destroy(struct intel_context *ce)
> +{
> +	lrc_fini(ce);
> +	intel_context_fini(ce);
> +
> +	if (intel_engine_is_virtual(ce->engine)) {
> +		struct guc_virtual_engine *ve =
> +			container_of(ce, typeof(*ve), context);
> +
> +		kfree(ve);
> +	} else {
> +		intel_context_free(ce);
> +	}
> +}
> +
>   static void guc_context_destroy(struct kref *kref)
>   {
>   	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> @@ -1059,7 +1098,7 @@ static void guc_context_destroy(struct kref *kref)
>   	if (context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		release_guc_id(guc, ce);
> -		lrc_destroy(kref);
> +		__guc_context_destroy(ce);
>   		return;
>   	}
>   
> @@ -1075,7 +1114,7 @@ static void guc_context_destroy(struct kref *kref)
>   	if (context_guc_id_invalid(ce)) {
>   		__release_guc_id(guc, ce);
>   		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> -		lrc_destroy(kref);
> +		__guc_context_destroy(ce);
>   		return;
>   	}
>   
> @@ -1120,6 +1159,8 @@ static const struct intel_context_ops guc_context_ops = {
>   
>   	.reset = lrc_reset,
>   	.destroy = guc_context_destroy,
> +
> +	.create_virtual = guc_create_virtual,
>   };
>   
>   static void __guc_signal_context_fence(struct intel_context *ce)
> @@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
>   	return 0;
>   }
>   
> +static struct intel_engine_cs *
> +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> +{
> +	struct intel_engine_cs *engine;
> +	intel_engine_mask_t tmp, mask = ve->mask;
> +	unsigned int num_siblings = 0;
> +
> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> +		if (num_siblings++ == sibling)
> +			return engine;
> +
> +	return NULL;
> +}
> +
> +static int guc_virtual_context_pre_pin(struct intel_context *ce,
> +				       struct i915_gem_ww_ctx *ww,
> +				       void **vaddr)
> +{
> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> +
> +	return __guc_context_pre_pin(ce, engine, ww, vaddr);
> +}
> +
> +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
> +{
> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> +
> +	return __guc_context_pin(ce, engine, vaddr);
> +}
> +
> +static void guc_virtual_context_enter(struct intel_context *ce)
> +{
> +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> +	struct intel_engine_cs *engine;
> +
> +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> +		intel_engine_pm_get(engine);
> +
> +	intel_timeline_enter(ce->timeline);
> +}
> +
> +static void guc_virtual_context_exit(struct intel_context *ce)
> +{
> +	intel_engine_mask_t tmp, mask = ce->engine->mask;
> +	struct intel_engine_cs *engine;
> +
> +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
> +		intel_engine_pm_put(engine);
> +
> +	intel_timeline_exit(ce->timeline);
> +}
> +
> +static int guc_virtual_context_alloc(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
> +
> +	return lrc_alloc(ce, engine);
> +}
> +
> +static const struct intel_context_ops virtual_guc_context_ops = {
> +	.alloc = guc_virtual_context_alloc,
> +
> +	.pre_pin = guc_virtual_context_pre_pin,
> +	.pin = guc_virtual_context_pin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
> +
> +	.enter = guc_virtual_context_enter,
> +	.exit = guc_virtual_context_exit,
> +
> +	.sched_disable = guc_context_sched_disable,
> +
> +	.destroy = guc_context_destroy,
> +
> +	.get_sibling = guc_virtual_get_sibling,
> +};
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   	} else if (context_destroyed(ce)) {
>   		/* Context has been destroyed */
>   		release_guc_id(guc, ce);
> -		lrc_destroy(&ce->ref);
> +		__guc_context_destroy(ce);
>   	}
>   
>   	decr_outstanding_submission_g2h(guc);
> @@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   			   atomic_read(&ce->guc_sched_state_no_lock));
>   	}
>   }
> +
> +static struct intel_context *
> +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> +{
> +	struct guc_virtual_engine *ve;
> +	struct intel_guc *guc;
> +	unsigned int n;
> +	int err;
> +
> +	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	guc = &siblings[0]->gt->uc.guc;
> +
> +	ve->base.i915 = siblings[0]->i915;
> +	ve->base.gt = siblings[0]->gt;
> +	ve->base.uncore = siblings[0]->uncore;
> +	ve->base.id = -1;
> +
> +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> +	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> +	ve->base.saturated = ALL_ENGINES;
> +	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> +	if (!ve->base.breadcrumbs) {
> +		kfree(ve);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> +
> +	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
> +
> +	ve->base.cops = &virtual_guc_context_ops;
> +	ve->base.request_alloc = guc_request_alloc;
> +
> +	ve->base.submit_request = guc_submit_request;
> +
> +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> +
> +	intel_context_init(&ve->context, &ve->base);
> +
> +	for (n = 0; n < count; n++) {
> +		struct intel_engine_cs *sibling = siblings[n];
> +
> +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> +		if (sibling->mask & ve->base.mask) {
> +			DRM_DEBUG("duplicate %s entry in load balancer\n",
> +				  sibling->name);
> +			err = -EINVAL;
> +			goto err_put;
> +		}
> +
> +		ve->base.mask |= sibling->mask;
> +
> +		if (n != 0 && ve->base.class != sibling->class) {
> +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
> +				  sibling->class, ve->base.class);
> +			err = -EINVAL;
> +			goto err_put;
> +		} else if (n == 0) {
> +			ve->base.class = sibling->class;
> +			ve->base.uabi_class = sibling->uabi_class;
> +			snprintf(ve->base.name, sizeof(ve->base.name),
> +				 "v%dx%d", ve->base.class, count);
> +			ve->base.context_size = sibling->context_size;
> +
> +			ve->base.emit_bb_start = sibling->emit_bb_start;
> +			ve->base.emit_flush = sibling->emit_flush;
> +			ve->base.emit_init_breadcrumb =
> +				sibling->emit_init_breadcrumb;
> +			ve->base.emit_fini_breadcrumb =
> +				sibling->emit_fini_breadcrumb;
> +			ve->base.emit_fini_breadcrumb_dw =
> +				sibling->emit_fini_breadcrumb_dw;
> +
> +			ve->base.flags |= sibling->flags;
> +
> +			ve->base.props.timeslice_duration_ms =
> +				sibling->props.timeslice_duration_ms;
> +		}
> +	}
> +
> +	return &ve->context;
> +
> +err_put:
> +	intel_context_put(&ve->context);
> +	return ERR_PTR(err);
> +}
> +
> +
> +
> +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
> +{
> +	struct intel_engine_cs *engine;
> +	intel_engine_mask_t tmp, mask = ve->mask;
> +
> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
> +			return true;
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 2b9470c90558..5f263ac4f46a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
>   void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   					     struct drm_printer *p);
>   
> +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
> +
>   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>   {
>   	/* XXX: GuC submission is unavailable for now */


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 19/51] drm/i915/guc: GuC virtual engines
  2021-07-19 23:27       ` [Intel-gfx] " Matthew Brost
@ 2021-07-19 23:42         ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-19 23:42 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, john.c.harrison, dri-devel



On 7/19/2021 4:27 PM, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 04:33:56PM -0700, Daniele Ceraolo Spurio wrote:
>>
>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
>>> Implement GuC virtual engines. This is a rather simple implementation:
>>> allocate an engine, point the context enter / exit functions at virtual
>>> engine specific versions, set all other variables / functions to the GuC
>>> versions, and set the engine mask to the union of all the siblings' masks.
>>>
>>> v2: Update to work with proto-ctx
>>>
>>> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c   |   8 +-
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.h   |   1 +
>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |   6 +
>>>    drivers/gpu/drm/i915/gt/intel_engine.h        |  27 +-
>>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  14 +
>>>    .../drm/i915/gt/intel_execlists_submission.c  |  29 ++-
>>>    .../drm/i915/gt/intel_execlists_submission.h  |   4 -
>>>    drivers/gpu/drm/i915/gt/selftest_execlists.c  |  12 +-
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 240 +++++++++++++++++-
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.h |   2 +
>>>    10 files changed, 308 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 64659802d4df..edefe299bd76 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -74,7 +74,6 @@
>>>    #include "gt/intel_context_param.h"
>>>    #include "gt/intel_engine_heartbeat.h"
>>>    #include "gt/intel_engine_user.h"
>>> -#include "gt/intel_execlists_submission.h" /* virtual_engine */
>>>    #include "gt/intel_gpu_commands.h"
>>>    #include "gt/intel_ring.h"
>>> @@ -363,9 +362,6 @@ set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
>>>    	if (!HAS_EXECLISTS(i915))
>>>    		return -ENODEV;
>>> -	if (intel_uc_uses_guc_submission(&i915->gt.uc))
>>> -		return -ENODEV; /* not implement yet */
>>> -
>>>    	if (get_user(idx, &ext->engine_index))
>>>    		return -EFAULT;
>>> @@ -950,8 +946,8 @@ static struct i915_gem_engines *user_engines(struct i915_gem_context *ctx,
>>>    			break;
>>>    		case I915_GEM_ENGINE_TYPE_BALANCED:
>>> -			ce = intel_execlists_create_virtual(pe[n].siblings,
>>> -							    pe[n].num_siblings);
>>> +			ce = intel_engine_create_virtual(pe[n].siblings,
>>> +							 pe[n].num_siblings);
>>>    			break;
>>>    		case I915_GEM_ENGINE_TYPE_INVALID:
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> index 20411db84914..2639c719a7a6 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
>>> @@ -10,6 +10,7 @@
>>>    #include "i915_gem_context_types.h"
>>>    #include "gt/intel_context.h"
>>> +#include "gt/intel_engine.h"
>> Apologies for the late question, but why do you need this include in this
>> header? Nothing else is changing within the file.
>>
> It likely doesn't need to be included. Let me see what the build / CI
> says if I remove it.
>
>>>    #include "i915_drv.h"
>>>    #include "i915_gem.h"
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> index 4a5518d295c2..542c98418771 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> @@ -47,6 +47,12 @@ struct intel_context_ops {
>>>    	void (*reset)(struct intel_context *ce);
>>>    	void (*destroy)(struct kref *kref);
>>> +
>>> +	/* virtual engine/context interface */
>>> +	struct intel_context *(*create_virtual)(struct intel_engine_cs **engine,
>>> +						unsigned int count);
>>> +	struct intel_engine_cs *(*get_sibling)(struct intel_engine_cs *engine,
>>> +					       unsigned int sibling);
>>>    };
>>>    struct intel_context {
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
>>> index f911c1224ab2..9fec0aca5f4b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
>>> @@ -273,13 +273,38 @@ intel_engine_has_preempt_reset(const struct intel_engine_cs *engine)
>>>    	return intel_engine_has_preemption(engine);
>>>    }
>>> +struct intel_context *
>>> +intel_engine_create_virtual(struct intel_engine_cs **siblings,
>>> +			    unsigned int count);
>>> +
>>> +static inline bool
>>> +intel_virtual_engine_has_heartbeat(const struct intel_engine_cs *engine)
>>> +{
>>> +	if (intel_engine_uses_guc(engine))
>>> +		return intel_guc_virtual_engine_has_heartbeat(engine);
>>> +	else
>>> +		GEM_BUG_ON("Only should be called in GuC submission");
>> I insist that this needs a better explanation, as I've commented on the
>> previous rev. Given that this is a shared file, it is not immediately
>> evident why this helper shouldn't be called with the execlists backend.
>>
> Sure, can add a comment. Basically it comes down to: it is always called
> on an active request, which in execlists mode always runs on a physical
> engine.

I had added a sample comment in the previous review in case you want to 
copy it. I remember you explaining this to me a while back, but it 
always takes me a moment to recall the reasoning when I look at this 
code, and I expect someone not familiar with the backend implementation 
would be even more confused.

Daniele
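
For reference, a comment along these lines would capture the rationale
(illustrative wording only, not the exact text from the earlier review):

	/*
	 * Only meaningful for the GuC backend: under GuC submission a
	 * virtual engine is a first-class intel_engine_cs, whereas with
	 * execlists an active request always executes on a physical
	 * engine, so this path should be unreachable there.
	 */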

>
> Matt
>
>> Daniele
>>
>>> +
>>> +	return false;
>>> +}
>>> +
>>>    static inline bool
>>>    intel_engine_has_heartbeat(const struct intel_engine_cs *engine)
>>>    {
>>>    	if (!IS_ACTIVE(CONFIG_DRM_I915_HEARTBEAT_INTERVAL))
>>>    		return false;
>>> -	return READ_ONCE(engine->props.heartbeat_interval_ms);
>>> +	if (intel_engine_is_virtual(engine))
>>> +		return intel_virtual_engine_has_heartbeat(engine);
>>> +	else
>>> +		return READ_ONCE(engine->props.heartbeat_interval_ms);
>>> +}
>>> +
>>> +static inline struct intel_engine_cs *
>>> +intel_engine_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
>>> +{
>>> +	GEM_BUG_ON(!intel_engine_is_virtual(engine));
>>> +	return engine->cops->get_sibling(engine, sibling);
>>>    }
>>>    #endif /* _INTEL_RINGBUFFER_H_ */
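
As a usage sketch of the new accessor (hypothetical caller; it works for
either backend since both implementations return NULL past the last
sibling):

	/* count the siblings behind a virtual engine */
	unsigned int n = 0;

	while (intel_engine_get_sibling(engine, n))
		n++;
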
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> index d561573ed98c..b7292d5cb7da 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
>>> @@ -1737,6 +1737,20 @@ ktime_t intel_engine_get_busy_time(struct intel_engine_cs *engine, ktime_t *now)
>>>    	return total;
>>>    }
>>> +struct intel_context *
>>> +intel_engine_create_virtual(struct intel_engine_cs **siblings,
>>> +			    unsigned int count)
>>> +{
>>> +	if (count == 0)
>>> +		return ERR_PTR(-EINVAL);
>>> +
>>> +	if (count == 1)
>>> +		return intel_context_create(siblings[0]);
>>> +
>>> +	GEM_BUG_ON(!siblings[0]->cops->create_virtual);
>>> +	return siblings[0]->cops->create_virtual(siblings, count);
>>> +}
>>> +
>>>    static bool match_ring(struct i915_request *rq)
>>>    {
>>>    	u32 ring = ENGINE_READ(rq->engine, RING_START);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index 56e25090da67..28492cdce706 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -193,6 +193,9 @@ static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
>>>    	return container_of(engine, struct virtual_engine, base);
>>>    }
>>> +static struct intel_context *
>>> +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
>>> +
>>>    static struct i915_request *
>>>    __active_request(const struct intel_timeline * const tl,
>>>    		 struct i915_request *rq,
>>> @@ -2548,6 +2551,8 @@ static const struct intel_context_ops execlists_context_ops = {
>>>    	.reset = lrc_reset,
>>>    	.destroy = lrc_destroy,
>>> +
>>> +	.create_virtual = execlists_create_virtual,
>>>    };
>>>    static int emit_pdps(struct i915_request *rq)
>>> @@ -3493,6 +3498,17 @@ static void virtual_context_exit(struct intel_context *ce)
>>>    		intel_engine_pm_put(ve->siblings[n]);
>>>    }
>>> +static struct intel_engine_cs *
>>> +virtual_get_sibling(struct intel_engine_cs *engine, unsigned int sibling)
>>> +{
>>> +	struct virtual_engine *ve = to_virtual_engine(engine);
>>> +
>>> +	if (sibling >= ve->num_siblings)
>>> +		return NULL;
>>> +
>>> +	return ve->siblings[sibling];
>>> +}
>>> +
>>>    static const struct intel_context_ops virtual_context_ops = {
>>>    	.flags = COPS_HAS_INFLIGHT,
>>> @@ -3507,6 +3523,8 @@ static const struct intel_context_ops virtual_context_ops = {
>>>    	.exit = virtual_context_exit,
>>>    	.destroy = virtual_context_destroy,
>>> +
>>> +	.get_sibling = virtual_get_sibling,
>>>    };
>>>    static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
>>> @@ -3655,20 +3673,13 @@ static void virtual_submit_request(struct i915_request *rq)
>>>    	spin_unlock_irqrestore(&ve->base.sched_engine->lock, flags);
>>>    }
>>> -struct intel_context *
>>> -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>>> -			       unsigned int count)
>>> +static struct intel_context *
>>> +execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>>>    {
>>>    	struct virtual_engine *ve;
>>>    	unsigned int n;
>>>    	int err;
>>> -	if (count == 0)
>>> -		return ERR_PTR(-EINVAL);
>>> -
>>> -	if (count == 1)
>>> -		return intel_context_create(siblings[0]);
>>> -
>>>    	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
>>>    	if (!ve)
>>>    		return ERR_PTR(-ENOMEM);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
>>> index ad4f3e1a0fde..a1aa92c983a5 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.h
>>> @@ -32,10 +32,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>>>    							int indent),
>>>    				   unsigned int max);
>>> -struct intel_context *
>>> -intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>>> -			       unsigned int count);
>>> -
>>>    bool
>>>    intel_engine_in_execlists_submission_mode(const struct intel_engine_cs *engine);
>>> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> index 73ddc6e14730..59cf8afc6d6f 100644
>>> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
>>> @@ -3727,7 +3727,7 @@ static int nop_virtual_engine(struct intel_gt *gt,
>>>    	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ve));
>>>    	for (n = 0; n < nctx; n++) {
>>> -		ve[n] = intel_execlists_create_virtual(siblings, nsibling);
>>> +		ve[n] = intel_engine_create_virtual(siblings, nsibling);
>>>    		if (IS_ERR(ve[n])) {
>>>    			err = PTR_ERR(ve[n]);
>>>    			nctx = n;
>>> @@ -3923,7 +3923,7 @@ static int mask_virtual_engine(struct intel_gt *gt,
>>>    	 * restrict it to our desired engine within the virtual engine.
>>>    	 */
>>> -	ve = intel_execlists_create_virtual(siblings, nsibling);
>>> +	ve = intel_engine_create_virtual(siblings, nsibling);
>>>    	if (IS_ERR(ve)) {
>>>    		err = PTR_ERR(ve);
>>>    		goto out_close;
>>> @@ -4054,7 +4054,7 @@ static int slicein_virtual_engine(struct intel_gt *gt,
>>>    		i915_request_add(rq);
>>>    	}
>>> -	ce = intel_execlists_create_virtual(siblings, nsibling);
>>> +	ce = intel_engine_create_virtual(siblings, nsibling);
>>>    	if (IS_ERR(ce)) {
>>>    		err = PTR_ERR(ce);
>>>    		goto out;
>>> @@ -4106,7 +4106,7 @@ static int sliceout_virtual_engine(struct intel_gt *gt,
>>>    	/* XXX We do not handle oversubscription and fairness with normal rq */
>>>    	for (n = 0; n < nsibling; n++) {
>>> -		ce = intel_execlists_create_virtual(siblings, nsibling);
>>> +		ce = intel_engine_create_virtual(siblings, nsibling);
>>>    		if (IS_ERR(ce)) {
>>>    			err = PTR_ERR(ce);
>>>    			goto out;
>>> @@ -4208,7 +4208,7 @@ static int preserved_virtual_engine(struct intel_gt *gt,
>>>    	if (err)
>>>    		goto out_scratch;
>>> -	ve = intel_execlists_create_virtual(siblings, nsibling);
>>> +	ve = intel_engine_create_virtual(siblings, nsibling);
>>>    	if (IS_ERR(ve)) {
>>>    		err = PTR_ERR(ve);
>>>    		goto out_scratch;
>>> @@ -4348,7 +4348,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
>>>    	if (igt_spinner_init(&spin, gt))
>>>    		return -ENOMEM;
>>> -	ve = intel_execlists_create_virtual(siblings, nsibling);
>>> +	ve = intel_engine_create_virtual(siblings, nsibling);
>>>    	if (IS_ERR(ve)) {
>>>    		err = PTR_ERR(ve);
>>>    		goto out_spin;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 05958260e849..7b3e1c91e689 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -60,6 +60,15 @@
>>>     *
>>>     */
>>> +/* GuC Virtual Engine */
>>> +struct guc_virtual_engine {
>>> +	struct intel_engine_cs base;
>>> +	struct intel_context context;
>>> +};
>>> +
>>> +static struct intel_context *
>>> +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
>>> +
>>>    #define GUC_REQUEST_SIZE 64 /* bytes */
>>>    /*
>>> @@ -925,20 +934,35 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
>>>    	return ret;
>>>    }
>>> -static int guc_context_pre_pin(struct intel_context *ce,
>>> -			       struct i915_gem_ww_ctx *ww,
>>> -			       void **vaddr)
>>> +static int __guc_context_pre_pin(struct intel_context *ce,
>>> +				 struct intel_engine_cs *engine,
>>> +				 struct i915_gem_ww_ctx *ww,
>>> +				 void **vaddr)
>>>    {
>>> -	return lrc_pre_pin(ce, ce->engine, ww, vaddr);
>>> +	return lrc_pre_pin(ce, engine, ww, vaddr);
>>>    }
>>> -static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>> +static int __guc_context_pin(struct intel_context *ce,
>>> +			     struct intel_engine_cs *engine,
>>> +			     void *vaddr)
>>>    {
>>>    	if (i915_ggtt_offset(ce->state) !=
>>>    	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>>>    		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>> -	return lrc_pin(ce, ce->engine, vaddr);
>>> +	return lrc_pin(ce, engine, vaddr);
>>> +}
>>> +
>>> +static int guc_context_pre_pin(struct intel_context *ce,
>>> +			       struct i915_gem_ww_ctx *ww,
>>> +			       void **vaddr)
>>> +{
>>> +	return __guc_context_pre_pin(ce, ce->engine, ww, vaddr);
>>> +}
>>> +
>>> +static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>> +{
>>> +	return __guc_context_pin(ce, ce->engine, vaddr);
>>>    }
>>>    static void guc_context_unpin(struct intel_context *ce)
>>> @@ -1043,6 +1067,21 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>>>    	deregister_context(ce, ce->guc_id);
>>>    }
>>> +static void __guc_context_destroy(struct intel_context *ce)
>>> +{
>>> +	lrc_fini(ce);
>>> +	intel_context_fini(ce);
>>> +
>>> +	if (intel_engine_is_virtual(ce->engine)) {
>>> +		struct guc_virtual_engine *ve =
>>> +			container_of(ce, typeof(*ve), context);
>>> +
>>> +		kfree(ve);
>>> +	} else {
>>> +		intel_context_free(ce);
>>> +	}
>>> +}
>>> +
>>>    static void guc_context_destroy(struct kref *kref)
>>>    {
>>>    	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
>>> @@ -1059,7 +1098,7 @@ static void guc_context_destroy(struct kref *kref)
>>>    	if (context_guc_id_invalid(ce) ||
>>>    	    !lrc_desc_registered(guc, ce->guc_id)) {
>>>    		release_guc_id(guc, ce);
>>> -		lrc_destroy(kref);
>>> +		__guc_context_destroy(ce);
>>>    		return;
>>>    	}
>>> @@ -1075,7 +1114,7 @@ static void guc_context_destroy(struct kref *kref)
>>>    	if (context_guc_id_invalid(ce)) {
>>>    		__release_guc_id(guc, ce);
>>>    		spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> -		lrc_destroy(kref);
>>> +		__guc_context_destroy(ce);
>>>    		return;
>>>    	}
>>> @@ -1120,6 +1159,8 @@ static const struct intel_context_ops guc_context_ops = {
>>>    	.reset = lrc_reset,
>>>    	.destroy = guc_context_destroy,
>>> +
>>> +	.create_virtual = guc_create_virtual,
>>>    };
>>>    static void __guc_signal_context_fence(struct intel_context *ce)
>>> @@ -1248,6 +1289,83 @@ static int guc_request_alloc(struct i915_request *rq)
>>>    	return 0;
>>>    }
>>> +static struct intel_engine_cs *
>>> +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
>>> +{
>>> +	struct intel_engine_cs *engine;
>>> +	intel_engine_mask_t tmp, mask = ve->mask;
>>> +	unsigned int num_siblings = 0;
>>> +
>>> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
>>> +		if (num_siblings++ == sibling)
>>> +			return engine;
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +static int guc_virtual_context_pre_pin(struct intel_context *ce,
>>> +				       struct i915_gem_ww_ctx *ww,
>>> +				       void **vaddr)
>>> +{
>>> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
>>> +
>>> +	return __guc_context_pre_pin(ce, engine, ww, vaddr);
>>> +}
>>> +
>>> +static int guc_virtual_context_pin(struct intel_context *ce, void *vaddr)
>>> +{
>>> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
>>> +
>>> +	return __guc_context_pin(ce, engine, vaddr);
>>> +}
>>> +
>>> +static void guc_virtual_context_enter(struct intel_context *ce)
>>> +{
>>> +	intel_engine_mask_t tmp, mask = ce->engine->mask;
>>> +	struct intel_engine_cs *engine;
>>> +
>>> +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
>>> +		intel_engine_pm_get(engine);
>>> +
>>> +	intel_timeline_enter(ce->timeline);
>>> +}
>>> +
>>> +static void guc_virtual_context_exit(struct intel_context *ce)
>>> +{
>>> +	intel_engine_mask_t tmp, mask = ce->engine->mask;
>>> +	struct intel_engine_cs *engine;
>>> +
>>> +	for_each_engine_masked(engine, ce->engine->gt, mask, tmp)
>>> +		intel_engine_pm_put(engine);
>>> +
>>> +	intel_timeline_exit(ce->timeline);
>>> +}
>>> +
>>> +static int guc_virtual_context_alloc(struct intel_context *ce)
>>> +{
>>> +	struct intel_engine_cs *engine = guc_virtual_get_sibling(ce->engine, 0);
>>> +
>>> +	return lrc_alloc(ce, engine);
>>> +}
>>> +
>>> +static const struct intel_context_ops virtual_guc_context_ops = {
>>> +	.alloc = guc_virtual_context_alloc,
>>> +
>>> +	.pre_pin = guc_virtual_context_pre_pin,
>>> +	.pin = guc_virtual_context_pin,
>>> +	.unpin = guc_context_unpin,
>>> +	.post_unpin = guc_context_post_unpin,
>>> +
>>> +	.enter = guc_virtual_context_enter,
>>> +	.exit = guc_virtual_context_exit,
>>> +
>>> +	.sched_disable = guc_context_sched_disable,
>>> +
>>> +	.destroy = guc_context_destroy,
>>> +
>>> +	.get_sibling = guc_virtual_get_sibling,
>>> +};
>>> +
>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
>>>    {
>>>    	struct intel_timeline *tl;
>>> @@ -1559,7 +1677,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    	} else if (context_destroyed(ce)) {
>>>    		/* Context has been destroyed */
>>>    		release_guc_id(guc, ce);
>>> -		lrc_destroy(&ce->ref);
>>> +		__guc_context_destroy(ce);
>>>    	}
>>>    	decr_outstanding_submission_g2h(guc);
>>> @@ -1674,3 +1792,107 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
>>>    			   atomic_read(&ce->guc_sched_state_no_lock));
>>>    	}
>>>    }
>>> +
>>> +static struct intel_context *
>>> +guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>>> +{
>>> +	struct guc_virtual_engine *ve;
>>> +	struct intel_guc *guc;
>>> +	unsigned int n;
>>> +	int err;
>>> +
>>> +	ve = kzalloc(sizeof(*ve), GFP_KERNEL);
>>> +	if (!ve)
>>> +		return ERR_PTR(-ENOMEM);
>>> +
>>> +	guc = &siblings[0]->gt->uc.guc;
>>> +
>>> +	ve->base.i915 = siblings[0]->i915;
>>> +	ve->base.gt = siblings[0]->gt;
>>> +	ve->base.uncore = siblings[0]->uncore;
>>> +	ve->base.id = -1;
>>> +
>>> +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
>>> +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>>> +	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>>> +	ve->base.saturated = ALL_ENGINES;
>>> +	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
>>> +	if (!ve->base.breadcrumbs) {
>>> +		kfree(ve);
>>> +		return ERR_PTR(-ENOMEM);
>>> +	}
>>> +
>>> +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>>> +
>>> +	ve->base.sched_engine = i915_sched_engine_get(guc->sched_engine);
>>> +
>>> +	ve->base.cops = &virtual_guc_context_ops;
>>> +	ve->base.request_alloc = guc_request_alloc;
>>> +
>>> +	ve->base.submit_request = guc_submit_request;
>>> +
>>> +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
>>> +
>>> +	intel_context_init(&ve->context, &ve->base);
>>> +
>>> +	for (n = 0; n < count; n++) {
>>> +		struct intel_engine_cs *sibling = siblings[n];
>>> +
>>> +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
>>> +		if (sibling->mask & ve->base.mask) {
>>> +			DRM_DEBUG("duplicate %s entry in load balancer\n",
>>> +				  sibling->name);
>>> +			err = -EINVAL;
>>> +			goto err_put;
>>> +		}
>>> +
>>> +		ve->base.mask |= sibling->mask;
>>> +
>>> +		if (n != 0 && ve->base.class != sibling->class) {
>>> +			DRM_DEBUG("invalid mixing of engine class, sibling %d, already %d\n",
>>> +				  sibling->class, ve->base.class);
>>> +			err = -EINVAL;
>>> +			goto err_put;
>>> +		} else if (n == 0) {
>>> +			ve->base.class = sibling->class;
>>> +			ve->base.uabi_class = sibling->uabi_class;
>>> +			snprintf(ve->base.name, sizeof(ve->base.name),
>>> +				 "v%dx%d", ve->base.class, count);
>>> +			ve->base.context_size = sibling->context_size;
>>> +
>>> +			ve->base.emit_bb_start = sibling->emit_bb_start;
>>> +			ve->base.emit_flush = sibling->emit_flush;
>>> +			ve->base.emit_init_breadcrumb =
>>> +				sibling->emit_init_breadcrumb;
>>> +			ve->base.emit_fini_breadcrumb =
>>> +				sibling->emit_fini_breadcrumb;
>>> +			ve->base.emit_fini_breadcrumb_dw =
>>> +				sibling->emit_fini_breadcrumb_dw;
>>> +
>>> +			ve->base.flags |= sibling->flags;
>>> +
>>> +			ve->base.props.timeslice_duration_ms =
>>> +				sibling->props.timeslice_duration_ms;
>>> +		}
>>> +	}
>>> +
>>> +	return &ve->context;
>>> +
>>> +err_put:
>>> +	intel_context_put(&ve->context);
>>> +	return ERR_PTR(err);
>>> +}
>>> +
>>> +
>>> +
>>> +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve)
>>> +{
>>> +	struct intel_engine_cs *engine;
>>> +	intel_engine_mask_t tmp, mask = ve->mask;
>>> +
>>> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
>>> +		if (READ_ONCE(engine->props.heartbeat_interval_ms))
>>> +			return true;
>>> +
>>> +	return false;
>>> +}
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>>> index 2b9470c90558..5f263ac4f46a 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
>>> @@ -26,6 +26,8 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
>>>    void intel_guc_submission_print_context_info(struct intel_guc *guc,
>>>    					     struct drm_printer *p);
>>> +bool intel_guc_virtual_engine_has_heartbeat(const struct intel_engine_cs *ve);
>>> +
>>>    static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>>>    {
>>>    	/* XXX: GuC submission is unavailable for now */


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 12/51] drm/i915/guc: Ensure request ordering via completion fences
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-19 23:46     ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-19 23:46 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 7/16/2021 1:16 PM, Matthew Brost wrote:
> If two requests are on the same ring, they are explicitly ordered by the
> HW. So, a submission fence is sufficient to ensure ordering when using
> the new GuC submission interface. Conversely, if two requests share a
> timeline and are on the same physical engine but in different contexts,
> the new GuC submission interface does not guarantee ordering. So, a
> completion fence needs to be used to ensure ordering.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  1 -
>   drivers/gpu/drm/i915/i915_request.c               | 12 ++++++++++--
>   2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9dc1a256e185..4443cc6f5320 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -933,7 +933,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	 * a request before we set the 'context_pending_disable' flag here.
>   	 */
>   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> -		spin_unlock_irqrestore(&ce->guc_state.lock, flags);

The incorrect spinlock drop is still here. Everything else looks ok (my 
suggestion to use an engine flag stands, but that can be addressed as a 
follow-up).

Daniele
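
For context, guc_context_sched_disable() takes ce->guc_state.lock before
this check, so with the unlock deleted the early-return path would exit
with the lock still held. The expected shape, abbreviated and
reconstructed from this series:

	spin_lock_irqsave(&ce->guc_state.lock, flags);
	...
	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
		spin_unlock_irqrestore(&ce->guc_state.lock, flags); /* keep */
		return;
	}
	guc_id = prep_context_pending_disable(ce);
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);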

>   		return;
>   	}
>   	guc_id = prep_context_pending_disable(ce);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index b48c4905d3fc..2b2b63cba06c 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -432,6 +432,7 @@ void i915_request_retire_upto(struct i915_request *rq)
>   
>   	do {
>   		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
> +		GEM_BUG_ON(!i915_request_completed(tmp));
>   	} while (i915_request_retire(tmp) && tmp != rq);
>   }
>   
> @@ -1380,6 +1381,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
>   	return err;
>   }
>   
> +static int
> +i915_request_await_request(struct i915_request *to, struct i915_request *from);
> +
>   int
>   i915_request_await_execution(struct i915_request *rq,
>   			     struct dma_fence *fence)
> @@ -1465,7 +1469,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   			return ret;
>   	}
>   
> -	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
> +	if (!intel_engine_uses_guc(to->engine) &&
> +	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
>   		ret = await_request_submit(to, from);
>   	else
>   		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
> @@ -1626,6 +1631,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
>   	prev = to_request(__i915_active_fence_set(&timeline->last_request,
>   						  &rq->fence));
>   	if (prev && !__i915_request_is_complete(prev)) {
> +		bool uses_guc = intel_engine_uses_guc(rq->engine);
> +
>   		/*
>   		 * The requests are supposed to be kept in order. However,
>   		 * we need to be wary in case the timeline->last_request
> @@ -1636,7 +1643,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
>   			   i915_seqno_passed(prev->fence.seqno,
>   					     rq->fence.seqno));
>   
> -		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
> +		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
> +		    (uses_guc && prev->context == rq->context))
>   			i915_sw_fence_await_sw_fence(&rq->submit,
>   						     &prev->submit,
>   						     &rq->submitq);

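To summarize the rule the __i915_request_add_to_timeline() hunk above
implements (a distilled sketch with a hypothetical helper name, not code
from the patch):

	/* Is a submission fence alone enough to order prev -> rq? */
	static bool submit_fence_is_sufficient(const struct i915_request *prev,
					       const struct i915_request *rq)
	{
		if (intel_engine_uses_guc(rq->engine))
			/* GuC: only same-context requests share a ring */
			return prev->context == rq->context;

		/* execlists: both pinned to one physical engine */
		return is_power_of_2(READ_ONCE(prev->engine)->mask |
				     rq->engine->mask);
	}

When this is false, the request instead waits on the completion fence of
the previous request to preserve ordering.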

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  0:23     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  0:23 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Implement GuC context operations, which include the GuC-specific
> alloc, pin, unpin, and destroy operations.
>
> v2:
>   (Daniel Vetter)
>    - Use msleep_interruptible rather than cond_resched in busy loop
>   (Michal)
>    - Remove C++ style comment
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_reg.h               |   1 +
>   drivers/gpu/drm/i915/i915_request.c           |   1 +
>   8 files changed, 685 insertions(+), 55 deletions(-)
>
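
At a glance, the shape of what the patch wires up (a condensed sketch,
not a literal hunk from the diff below; member set abbreviated):

	static const struct intel_context_ops guc_context_ops = {
		.alloc = guc_context_alloc,
		.pre_pin = guc_context_pre_pin,
		.pin = guc_context_pin,
		.unpin = guc_context_unpin,
		.post_unpin = guc_context_post_unpin,
		.sched_disable = guc_context_sched_disable,
		.destroy = guc_context_destroy,
	};
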
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..32fd6647154b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   
>   	mutex_init(&ce->pin_mutex);
>   
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>   	i915_active_init(&ce->active,
>   			 __intel_context_active, __intel_context_retire, 0);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6d99631d19b9..606c480aec26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -96,6 +96,7 @@ struct intel_context {
>   #define CONTEXT_BANNED			6
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>   #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>   
>   	struct {
>   		u64 timeout_us;
> @@ -138,14 +139,29 @@ struct intel_context {
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>   
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> +	/* GuC LRC descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC LRC descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>   	/*
> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>   	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>   
>   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>   /* in Gen12 ID 0x7FF is reserved to indicate idle */
>   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 8c7b92f699f1..30773cd699f5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -7,6 +7,7 @@
>   #define _INTEL_GUC_H_
>   
>   #include <linux/xarray.h>
> +#include <linux/delay.h>
>   
>   #include "intel_uncore.h"
>   #include "intel_guc_fw.h"
> @@ -44,6 +45,14 @@ struct intel_guc {
>   		void (*disable)(struct intel_guc *guc);
>   	} interrupts;
>   
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolen
> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>   	bool submission_selected;
>   
>   	struct i915_vma *ads_vma;
> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>   				 response_buf, response_buf_size, 0);
>   }
>   
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +	unsigned int sleep_period_ms = 1;
> +	bool not_atomic = !in_atomic() && !irqs_disabled();
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && not_atomic);
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(not_atomic)) {
> +			if (msleep_interruptible(sleep_period_ms))
> +				return -EINTR;
> +			sleep_period_ms = sleep_period_ms << 1;
> +		} else {
> +			cpu_relax();
> +		}
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
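
A note on the helper above: on -EBUSY it retries with the sleep period
doubling each pass (1 ms, 2 ms, 4 ms, ...) where sleeping is legal, and
falls back to cpu_relax() spinning in atomic context. An illustrative
caller (the payload mirrors the context-deregister flow in this series;
treat it as a sketch rather than the patch's literal code):

	u32 action[] = {
		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
		guc_id,
	};

	/* keep retrying while the CT channel is backed up */
	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
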
>   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>   {
>   	intel_guc_ct_event_handler(&guc->ct);
> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 83ec60ea3f89..28ff82c5be45 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>   	case INTEL_GUC_ACTION_DEFAULT:
>   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>   		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		break;
>   	default:
>   		ret = -EOPNOTSUPP;
>   		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 53b4a5eb4a85..a47b3813b4d0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>   #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>   #include "gt/intel_mocs.h"
>   #include "gt/intel_ring.h"
>   
> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */
> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	int len = 0;
>   	bool enabled = context_enabled(ce);
>   
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>   
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>   	return 0;
>   }
>   
> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>   	i915_sched_engine_put(guc->sched_engine);
>   }
>   
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>   {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
Reserved for what?

> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;
> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	/* Indicates newly assigned guc_id */
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
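To make the backoff comment concrete: with PIN_GUC_ID_TRIES of 4 the first
retry does not sleep at all, and the later retries sleep the timeslice
duration shifted left by 0 then 1, clamped to the [1, 100] ms range. A
standalone sketch of just that arithmetic (plain C; the 5 ms timeslice is
an assumed example value):

	#include <stdio.h>

	#define PIN_GUC_ID_TRIES	4

	int main(void)
	{
		unsigned int timeslice_ms = 5;	/* assumed example timeslice */
		unsigned long tries = PIN_GUC_ID_TRIES;

		while (--tries) {
			unsigned int pass = PIN_GUC_ID_TRIES - tries;
			unsigned int sleep_ms = 0;

			if (pass > 1) {
				unsigned int shifted = timeslice_ms << (pass - 2);
				unsigned int max = shifted < 100 ? shifted : 100;

				sleep_ms = max > 1 ? max : 1;
			}
			printf("retry %u: sleep %u ms\n", pass, sleep_ms);
		}
		return 0;
	}

For a 5 ms timeslice this prints sleeps of 0, 5 and 10 ms across the three
retries before pin_guc_id() finally gives up with -EAGAIN.
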
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");
> +		return 0;
> +	}
> +}
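The shift converts the global engine mask into a class-relative one, since
the descriptor wants engine instances numbered from zero within the class.
A standalone illustration (plain C; the engine ID values below are assumed
for the example and simplified from the real enum):

	#include <stdio.h>

	/* Assumed global engine bit positions, for illustration only. */
	enum { RCS0 = 0, BCS0 = 1, VCS0 = 2, VCS1 = 3 };

	int main(void)
	{
		unsigned int mask = 1u << VCS1;		/* global mask: bit 3 */
		unsigned int class_rel = mask >> VCS0;	/* class mask: bit 1 */

		printf("global 0x%x -> class-relative 0x%x\n", mask, class_rel);
		return 0;
	}

i.e. VCS1 ends up as instance bit 1 within the video decode class.
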
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are in the same region, as the write barrier
> +	 * is done based on the CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from
'regisgered' should be 'registered' and 'stole from from' should be 'stolen from'

> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whos guc_id was stole.
whose guc_id was stolen

> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>   }
>   
>   static int guc_context_pre_pin(struct intel_context *ce,
> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>   
>   static int guc_context_pin(struct intel_context *ce, void *vaddr)
>   {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> +
>   	return lrc_pin(ce, ce->engine, vaddr);
>   }
>   
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	unpin_guc_id(guc, ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);
> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
Use ce_to_guc?

> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
Also ce_to_guc?

> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
stole -> steal

> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);
> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
FIXME rather than XXX?

> +	 * from context_free_worker when not runtime wakeref is held.
when runtime wakeref is not held

> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
>   	.pre_pin = guc_context_pre_pin,
>   	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>   
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
>   	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>   };
>   
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   	int ret;
>   
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
>   	 * have to repeat work.
>   	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>   
>   	/*
>   	 * Note that after this point, we have committed to using
> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>   	 */
>   
>   	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
>   		return ret;
>   
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>   
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> -
> -	__i915_request_submit(rq);
> -
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> -
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired. Thus, sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>   
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;
;;

> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		tasklet_hi_schedule(&sched_engine->tasklet);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>   
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>   }
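For reference, the lockless fast path above hinges on the semantics of
atomic_add_unless(&ce->guc_id_ref, 1, 0): the increment only happens when
the counter is already non-zero, so the locked slow path is only taken
when no requests are in flight. A userspace sketch of the same primitive
using C11 atomics (illustrative, not the kernel implementation):

	#include <stdatomic.h>
	#include <stdbool.h>

	/* Increment *v only if it is non-zero; returns true on success. */
	static bool inc_unless_zero(atomic_int *v)
	{
		int old = atomic_load(v);

		while (old != 0) {
			if (atomic_compare_exchange_weak(v, &old, old + 1))
				return true;
		}
		return false;
	}

atomic_compare_exchange_weak() reloads old on failure, so the loop retries
until it either observes zero or successfully publishes the increment.
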
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookkeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
reset of the GuC

> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
LRCs

> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
check the return of guc_lrc_desc_pin for errors
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
LRCs and same on line below

> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
GPU -> GT

> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>   static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   void intel_guc_submission_enable(struct intel_guc *guc)
>   {
> +	guc_init_lrc_mapping(guc);
>   }
>   
>   void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);
Agree these should not be BUG_ONs or similar, but a drm_err seems 
appropriate. If either this or the error below occurs then something has 
gone quite wrong and it should be reported.

> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
As above, should be a drm_err I would say.

> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
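Stepping back, the deregister handshake spans several of the functions
above; a comment-style sketch of the two flows that end up here, derived
from the code in this patch:

	/*
	 * guc_id stolen / LRCA changed:
	 *   guc_lrc_desc_pin()
	 *     -> set_context_wait_for_deregister_to_register()
	 *     -> H2G DEREGISTER_CONTEXT
	 *   G2H DEREGISTER_CONTEXT_DONE
	 *     -> intel_guc_deregister_done_process_msg()
	 *        -> register_context() for the new owner of the desc slot
	 *
	 * final destroy:
	 *   guc_context_destroy()
	 *     -> guc_lrc_desc_unpin()
	 *        -> set_context_destroyed()
	 *        -> H2G DEREGISTER_CONTEXT
	 *   G2H DEREGISTER_CONTEXT_DONE
	 *     -> release_guc_id() + lrc_destroy()
	 */
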
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 943fe485c662..204c95c39353 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4142,6 +4142,7 @@ enum {
>   	FAULT_AND_CONTINUE /* Unsupported */
>   };
>   
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>   #define GEN8_CTX_VALID (1 << 0)
>   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 86b4c9f2613d..b48c4905d3fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 */
>   	if (!list_empty(&rq->sched.link))
>   		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);
>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-20  0:23     ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  0:23 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/16/2021 13:16, Matthew Brost wrote:
> Implement GuC context operations which includes GuC specific operations
> alloc, pin, unpin, and destroy.
>
> v2:
>   (Daniel Vetter)
>    - Use msleep_interruptible rather than cond_resched in busy loop
>   (Michal)
>    - Remove C++ style comment
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_reg.h               |   1 +
>   drivers/gpu/drm/i915/i915_request.c           |   1 +
>   8 files changed, 685 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..32fd6647154b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   
>   	mutex_init(&ce->pin_mutex);
>   
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>   	i915_active_init(&ce->active,
>   			 __intel_context_active, __intel_context_retire, 0);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6d99631d19b9..606c480aec26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -96,6 +96,7 @@ struct intel_context {
>   #define CONTEXT_BANNED			6
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>   #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>   
>   	struct {
>   		u64 timeout_us;
> @@ -138,14 +139,29 @@ struct intel_context {
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>   
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> +	/* GuC LRC descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC LRC descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>   	/*
> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>   	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>   
>   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>   /* in Gen12 ID 0x7FF is reserved to indicate idle */
>   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 8c7b92f699f1..30773cd699f5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -7,6 +7,7 @@
>   #define _INTEL_GUC_H_
>   
>   #include <linux/xarray.h>
> +#include <linux/delay.h>
>   
>   #include "intel_uncore.h"
>   #include "intel_guc_fw.h"
> @@ -44,6 +45,14 @@ struct intel_guc {
>   		void (*disable)(struct intel_guc *guc);
>   	} interrupts;
>   
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolen
> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>   	bool submission_selected;
>   
>   	struct i915_vma *ads_vma;
> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>   				 response_buf, response_buf_size, 0);
>   }
>   
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +	unsigned int sleep_period_ms = 1;
> +	bool not_atomic = !in_atomic() && !irqs_disabled();
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && not_atomic);
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(not_atomic)) {
> +			if (msleep_interruptible(sleep_period_ms))
> +				return -EINTR;
> +			sleep_period_ms = sleep_period_ms << 1;
> +		} else {
> +			cpu_relax();
> +		}
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
>   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>   {
>   	intel_guc_ct_event_handler(&guc->ct);
> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 83ec60ea3f89..28ff82c5be45 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>   	case INTEL_GUC_ACTION_DEFAULT:
>   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>   		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		break;
>   	default:
>   		ret = -EOPNOTSUPP;
>   		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 53b4a5eb4a85..a47b3813b4d0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>   #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>   #include "gt/intel_mocs.h"
>   #include "gt/intel_ring.h"
>   
> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */
> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	int len = 0;
>   	bool enabled = context_enabled(ce);
>   
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>   
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>   	return 0;
>   }
>   
> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>   	i915_sched_engine_put(guc->sched_engine);
>   }
>   
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>   {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
Reserved for what?

> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;
> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	/* Indidcates newly assigned guc_id */
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");
> +		return 0;
> +	}
> +}
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> +	 * based on CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from
'regisgered' and 'stole' should be 'stolen'

> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whos guc_id was stole.
whose guc_id was stolen

> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>   }
>   
>   static int guc_context_pre_pin(struct intel_context *ce,
> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>   
>   static int guc_context_pin(struct intel_context *ce, void *vaddr)
>   {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> +
>   	return lrc_pin(ce, ce->engine, vaddr);
>   }
>   
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	unpin_guc_id(guc, ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);
> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
Use ce_to_guc?

> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
Also ce_to_guc?

> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
stole -> steal

> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);
> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
FIXME rather than XXX?

> +	 * from context_free_worker when not runtime wakeref is held.
when runtime wakeref is not held

> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
>   	.pre_pin = guc_context_pre_pin,
>   	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>   
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
>   	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>   };
>   
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   	int ret;
>   
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
>   	 * have to repeat work.
>   	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>   
>   	/*
>   	 * Note that after this point, we have committed to using
> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>   	 */
>   
>   	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
>   		return ret;
>   
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>   
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> -
> -	__i915_request_submit(rq);
> -
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> -
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired. Thus, sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>   
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;
;;

> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		tasklet_hi_schedule(&sched_engine->tasklet);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>   
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
reset of the GuC

> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
LRCs

> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
check the return of gldp for errors
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
LRCs and same on line below

> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
GPU -> GT

> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>   static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   void intel_guc_submission_enable(struct intel_guc *guc)
>   {
> +	guc_init_lrc_mapping(guc);
>   }
>   
>   void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);
Agree these should not be BUG_ONs or similar, but a drm_err seems 
appropriate. If either this or the error below occurs then something has 
gone quite wrong and it should be reported.
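Something along these lines would do it (just a sketch of the suggestion, 
reusing the existing message):

	drm_err(&guc_to_gt(guc)->i915->drm,
		"Invalid desc_idx %u", desc_idx);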

> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
As above, should be a drm_err I would say.

> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 943fe485c662..204c95c39353 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4142,6 +4142,7 @@ enum {
>   	FAULT_AND_CONTINUE /* Unsupported */
>   };
>   
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>   #define GEN8_CTX_VALID (1 << 0)
>   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 86b4c9f2613d..b48c4905d3fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 */
>   	if (!list_empty(&rq->sched.link))
>   		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);
>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet
  2021-07-19 22:55       ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  0:26         ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  0:26 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On 7/19/2021 15:55, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 04:01:56PM -0700, John Harrison wrote:
>> On 7/16/2021 13:16, Matthew Brost wrote:
>>> Implement GuC submission tasklet for new interface. The new GuC
>>> interface uses H2G to submit contexts to the GuC. Since H2G use a single
>>> channel, a single tasklet submits is used for the submission path.
>> This still needs fixing - 'a single tasklet submits is used' is not valid
>> English.
>>
> Will fix.
>
>> It also seems that the idea of splitting all the deletes of old code into a
>> separate patch didn't happen. It really does obfuscate things significantly
>> having completely unrelated deletes and adds interspersed :(.
>>
> I don't recall promising to do that.
>
> Matt
"No promises but perhaps I'll do this in the next rev."

Well, this is the next rev. So I am expressing my disappointment that it 
didn't happen. Reviewability of patches is important.

John.


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 13/51] drm/i915/guc: Disable semaphores when using GuC scheduling
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  0:33     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  0:33 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Semaphores are an optimization and not required for basic GuC submission
> to work properly. Disable until we have time to do the implementation to
> enable semaphores and tune them for performance. Also, the long-term
> direction is to delete semaphores from the i915, so that's another reason
> not to enable these for GuC submission.
>
> This patch fixes an existing bug where I915_ENGINE_HAS_SEMAPHORES was
> not honored correctly.
Bugs plural. Otherwise:
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

>
> v2: Reword commit message
> v3:
>   (John H)
>    - Add text to commit indicating this also fixes an existing bug
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7d6f52d8a801..64659802d4df 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -799,7 +799,8 @@ static int intel_context_set_gem(struct intel_context *ce,
>   	}
>   
>   	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> -	    intel_engine_has_timeslices(ce->engine))
> +	    intel_engine_has_timeslices(ce->engine) &&
> +	    intel_engine_has_semaphores(ce->engine))
>   		__set_bit(CONTEXT_USE_SEMAPHORES, &ce->flags);
>   
>   	if (IS_ACTIVE(CONFIG_DRM_I915_REQUEST_TIMEOUT) &&
> @@ -1778,7 +1779,8 @@ static void __apply_priority(struct intel_context *ce, void *arg)
>   	if (!intel_engine_has_timeslices(ce->engine))
>   		return;
>   
> -	if (ctx->sched.priority >= I915_PRIORITY_NORMAL)
> +	if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> +	    intel_engine_has_semaphores(ce->engine))
>   		intel_context_set_use_semaphores(ce);
>   	else
>   		intel_context_clear_use_semaphores(ce);


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  0:51     ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-20  0:51 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 7/16/2021 1:16 PM, Matthew Brost wrote:
> Implement GuC context operations which includes GuC specific operations
> alloc, pin, unpin, and destroy.
>
> v2:
>   (Daniel Vetter)
>    - Use msleep_interruptible rather than cond_resched in busy loop
>   (Michal)
>    - Remove C++ style comment
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_reg.h               |   1 +
>   drivers/gpu/drm/i915/i915_request.c           |   1 +
>   8 files changed, 685 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..32fd6647154b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   
>   	mutex_init(&ce->pin_mutex);
>   
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>   	i915_active_init(&ce->active,
>   			 __intel_context_active, __intel_context_retire, 0);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6d99631d19b9..606c480aec26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -96,6 +96,7 @@ struct intel_context {
>   #define CONTEXT_BANNED			6
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>   #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>   
>   	struct {
>   		u64 timeout_us;
> @@ -138,14 +139,29 @@ struct intel_context {
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>   
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> +	/* GuC LRC descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC LRC descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>   	/*
> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>   	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>   
>   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>   /* in Gen12 ID 0x7FF is reserved to indicate idle */
>   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 8c7b92f699f1..30773cd699f5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -7,6 +7,7 @@
>   #define _INTEL_GUC_H_
>   
>   #include <linux/xarray.h>
> +#include <linux/delay.h>
>   
>   #include "intel_uncore.h"
>   #include "intel_guc_fw.h"
> @@ -44,6 +45,14 @@ struct intel_guc {
>   		void (*disable)(struct intel_guc *guc);
>   	} interrupts;
>   
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolen
> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>   	bool submission_selected;
>   
>   	struct i915_vma *ads_vma;
> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>   				 response_buf, response_buf_size, 0);
>   }
>   
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +	unsigned int sleep_period_ms = 1;
> +	bool not_atomic = !in_atomic() && !irqs_disabled();
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && not_atomic);
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(not_atomic)) {
> +			if (msleep_interruptible(sleep_period_ms))
> +				return -EINTR;
> +			sleep_period_ms = sleep_period_ms << 1;
> +		} else {
> +			cpu_relax();

Potentially something we can change later, but if we're in atomic 
context we can't keep looping without a timeout while we get -EBUSY; we 
need to bail out early.
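Something like this, say (sketch only, the retry cap is made up for 
illustration):

	unsigned int atomic_retries = 100;	/* hypothetical bound */
	...
	} else {
		if (!--atomic_retries)
			return -EBUSY;	/* bail instead of spinning forever */
		cpu_relax();
	}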

> +		}
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
>   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>   {
>   	intel_guc_ct_event_handler(&guc->ct);
> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 83ec60ea3f89..28ff82c5be45 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>   	case INTEL_GUC_ACTION_DEFAULT:
>   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>   		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		break;
>   	default:
>   		ret = -EOPNOTSUPP;
>   		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 53b4a5eb4a85..a47b3813b4d0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>   #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>   #include "gt/intel_mocs.h"
>   #include "gt/intel_ring.h"
>   
> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */

Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if 
it isn't strictly needed? I'd prefer to avoid the asymmetry in the 
locking scheme if possible, as that might cause trouble if things change 
in the future.
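I.e. even on the pin path, something like this untested sketch (with a 
local flags) would keep every writer of guc_state.sched_state under the 
same lock:

	spin_lock_irqsave(&ce->guc_state.lock, flags);
	init_sched_state(ce);
	if (context_registered)
		set_context_wait_for_deregister_to_register(ce);
	spin_unlock_irqrestore(&ce->guc_state.lock, flags);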

> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);

No need for (). Below as well.

> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);

nit: can also use

ce->guc_state.sched_state &=
	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER


> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	int len = 0;
>   	bool enabled = context_enabled(ce);
>   
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>   
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>   	return 0;
>   }
>   
> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>   	i915_sched_engine_put(guc->sched_engine);
>   }
>   
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>   {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;

Is it worth adding spinlock_held asserts for guc->contexts_lock in these 
ID functions? It doubles up as documentation of the locking we expect.
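E.g. a sketch for assign_guc_id, and similarly for new_guc_id, 
steal_guc_id and __release_guc_id:

	static int assign_guc_id(struct intel_guc *guc, u16 *out)
	{
		int ret;

		lockdep_assert_held(&guc->contexts_lock);

		ret = new_guc_id(guc);
		...
	}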

> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	/* Indicates newly assigned guc_id */
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");

We usually use MISSING_CASE for this type of error.
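I.e. a sketch of the suggestion:

	default:
		MISSING_CASE(class);
		return 0;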

> +		return 0;
> +	}
> +}
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;

If you move this after the engine var below you can skip the ce->engine 
jump. Also, you can shorten the pointer chasing by using 
engine->uncore->rpm.
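Sketch of what I mean:

	struct intel_engine_cs *engine = ce->engine;
	struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;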

> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are in the same region, as the write barrier
> +	 * is done based on the CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from

Typo: "regisgered" -> "registered". Also "has been stole from from" -> 
"has been stolen from" on the same line.

> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whose guc_id was stolen.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>   }
>   
>   static int guc_context_pre_pin(struct intel_context *ce,
> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>   
>   static int guc_context_pin(struct intel_context *ce, void *vaddr)
>   {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being 
re-assigned in there; it looks like it can only change if the context is 
new, but for future-proofing IMO better to re-order here.

> +
>   	return lrc_pin(ce, ce->engine, vaddr);

Could use a comment to say that the GuC context pinning call is delayed 
until request alloc and to look at the comment in the latter function 
for details (or move the explanation here and refer here from request 
alloc).
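E.g. something like (sketch):

	static int guc_context_pin(struct intel_context *ce, void *vaddr)
	{
		/*
		 * GuC context registration is deferred until request alloc
		 * time; see the comment in guc_request_alloc() for details.
		 */
		if (i915_ggtt_offset(ce->state) !=
		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return lrc_pin(ce, ce->engine, vaddr);
	}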

>   }
>   
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	unpin_guc_id(guc, ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);

Why do we need this function? We can just pass lrc_post_unpin to the ops 
(like we already do).
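I.e. just (sketch):

	static const struct intel_context_ops guc_context_ops = {
		...
		.unpin = guc_context_unpin,
		.post_unpin = lrc_post_unpin,
		...
	};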

> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;

same as above, going through engine->uncore->rpm is shorter

> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);

It feels a bit weird that we call release_guc_id in the case where the 
id is invalid. The code handles it fine, but still the flow doesn't feel 
clean. Not a blocker.

> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);

But here the call to __release_guc_id is unneeded, right? The ce doesn't 
own the ID anymore and both the steal and the release functions already 
clean ce->guc_id_link, so there should be nothing left to clean.
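I.e. the branch could simply be (sketch):

	spin_lock_irqsave(&guc->contexts_lock, flags);
	if (context_guc_id_invalid(ce)) {
		spin_unlock_irqrestore(&guc->contexts_lock, flags);
		lrc_destroy(kref);
		return;
	}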

> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
> +	 * from context_free_worker when no runtime wakeref is held.
> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
>   	.pre_pin = guc_context_pre_pin,
>   	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>   
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
>   	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>   };
>   
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   	int ret;
>   
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
>   	 * have to repeat work.
>   	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>   
>   	/*
>   	 * Note that after this point, we have committed to using
> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>   	 */
>   
>   	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
>   		return ret;
>   
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>   
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> -
> -	__i915_request_submit(rq);
> -
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> -
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned thrashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired, thus sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>   
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;

typo ";;"

> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		tasklet_hi_schedule(&sched_engine->tasklet);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);


Might be worth moving the pinning to a separate function to keep the 
request_alloc focused on the request. Can be done as a follow up.
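Sketch of what that could look like (helper name made up), pulling 
everything from the guc_id_ref fast path down to the clear_bit out of 
guc_request_alloc():

	/* hypothetical helper, all logic lifted from guc_request_alloc() */
	static int guc_context_ensure_guc_id(struct intel_context *ce)
	{
		struct intel_guc *guc = ce_to_guc(ce);
		int ret;

		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
		if (unlikely(ret < 0))
			return ret;

		if (context_needs_register(ce, !!ret)) {
			ret = guc_lrc_desc_pin(ce);
			if (unlikely(ret)) {	/* unwind */
				atomic_dec(&ce->guc_id_ref);
				unpin_guc_id(guc, ce);
				return ret;
			}
		}

		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return 0;
	}

guc_request_alloc() would then just call this after the flush/reserve 
dance.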

>   
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookkeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>   static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   void intel_guc_submission_enable(struct intel_guc *guc)
>   {
> +	guc_init_lrc_mapping(guc);
>   }
>   
>   void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 943fe485c662..204c95c39353 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4142,6 +4142,7 @@ enum {
>   	FAULT_AND_CONTINUE /* Unsupported */
>   };
>   
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>   #define GEN8_CTX_VALID (1 << 0)
>   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 86b4c9f2613d..b48c4905d3fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 */
>   	if (!list_empty(&rq->sched.link))
>   		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);

Does this work/make sense if GuC is disabled?
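If not, a guard might be needed, e.g. (sketch, assuming a helper like 
intel_engine_uses_guc() is available):

	if (intel_engine_uses_guc(rq->engine))
		atomic_dec(&rq->context->guc_id_ref);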

Daniele

>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-20  0:51     ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-20  0:51 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 7/16/2021 1:16 PM, Matthew Brost wrote:
> Implement GuC context operations which includes GuC specific operations
> alloc, pin, unpin, and destroy.
>
> v2:
>   (Daniel Vetter)
>    - Use msleep_interruptible rather than cond_resched in busy loop
>   (Michal)
>    - Remove C++ style comment
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_reg.h               |   1 +
>   drivers/gpu/drm/i915/i915_request.c           |   1 +
>   8 files changed, 685 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..32fd6647154b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   
>   	mutex_init(&ce->pin_mutex);
>   
> +	spin_lock_init(&ce->guc_state.lock);
> +
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +	INIT_LIST_HEAD(&ce->guc_id_link);
> +
>   	i915_active_init(&ce->active,
>   			 __intel_context_active, __intel_context_retire, 0);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 6d99631d19b9..606c480aec26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -96,6 +96,7 @@ struct intel_context {
>   #define CONTEXT_BANNED			6
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>   #define CONTEXT_NOPREEMPT		8
> +#define CONTEXT_LRCA_DIRTY		9
>   
>   	struct {
>   		u64 timeout_us;
> @@ -138,14 +139,29 @@ struct intel_context {
>   
>   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>   
> +	struct {
> +		/** lock: protects everything in guc_state */
> +		spinlock_t lock;
> +		/**
> +		 * sched_state: scheduling state of this context using GuC
> +		 * submission
> +		 */
> +		u8 sched_state;
> +	} guc_state;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> +	/* GuC LRC descriptor ID */
> +	u16 guc_id;
> +
> +	/* GuC LRC descriptor reference count */
> +	atomic_t guc_id_ref;
> +
>   	/*
> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> -	 * in the series will.
> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>   	 */
> -	u16 guc_id;
> +	struct list_head guc_id_link;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> index 41e5350a7a05..49d4857ad9b7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> @@ -87,7 +87,6 @@
>   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>   
>   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>   /* in Gen12 ID 0x7FF is reserved to indicate idle */
>   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 8c7b92f699f1..30773cd699f5 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -7,6 +7,7 @@
>   #define _INTEL_GUC_H_
>   
>   #include <linux/xarray.h>
> +#include <linux/delay.h>
>   
>   #include "intel_uncore.h"
>   #include "intel_guc_fw.h"
> @@ -44,6 +45,14 @@ struct intel_guc {
>   		void (*disable)(struct intel_guc *guc);
>   	} interrupts;
>   
> +	/*
> +	 * contexts_lock protects the pool of free guc ids and a linked list of
> +	 * guc ids available to be stolen
> +	 */
> +	spinlock_t contexts_lock;
> +	struct ida guc_ids;
> +	struct list_head guc_id_list;
> +
>   	bool submission_selected;
>   
>   	struct i915_vma *ads_vma;
> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>   				 response_buf, response_buf_size, 0);
>   }
>   
> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> +					   const u32 *action,
> +					   u32 len,
> +					   bool loop)
> +{
> +	int err;
> +	unsigned int sleep_period_ms = 1;
> +	bool not_atomic = !in_atomic() && !irqs_disabled();
> +
> +	/* No sleeping with spin locks, just busy loop */
> +	might_sleep_if(loop && not_atomic);
> +
> +retry:
> +	err = intel_guc_send_nb(guc, action, len);
> +	if (unlikely(err == -EBUSY && loop)) {
> +		if (likely(not_atomic)) {
> +			if (msleep_interruptible(sleep_period_ms))
> +				return -EINTR;
> +			sleep_period_ms = sleep_period_ms << 1;
> +		} else {
> +			cpu_relax();

Potentially something we can change later, but if we're in atomic 
context we can't keep looping without a timeout while we get -EBUSY, we 
need to bail out early.

> +		}
> +		goto retry;
> +	}
> +
> +	return err;
> +}
> +
>   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>   {
>   	intel_guc_ct_event_handler(&guc->ct);
> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg, u32 len);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 83ec60ea3f89..28ff82c5be45 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>   	case INTEL_GUC_ACTION_DEFAULT:
>   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>   		break;
> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> +							    len);
> +		break;
>   	default:
>   		ret = -EOPNOTSUPP;
>   		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 53b4a5eb4a85..a47b3813b4d0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -13,7 +13,9 @@
>   #include "gt/intel_gt.h"
>   #include "gt/intel_gt_irq.h"
>   #include "gt/intel_gt_pm.h"
> +#include "gt/intel_gt_requests.h"
>   #include "gt/intel_lrc.h"
> +#include "gt/intel_lrc_reg.h"
>   #include "gt/intel_mocs.h"
>   #include "gt/intel_ring.h"
>   
> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +/*
> + * Below is a set of functions which control the GuC scheduling state which
> + * require a lock, aside from the special case where the functions are called
> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> + * path to be executing on the context.
> + */

Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if 
it isn't strictly needed? I'd prefer to avoid the asymmetry in the 
locking scheme if possible, as that might case trouble if thing change 
in the future.

> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> +#define SCHED_STATE_DESTROYED				BIT(1)
> +static inline void init_sched_state(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> +	ce->guc_state.sched_state = 0;
> +}
> +
> +static inline bool
> +context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state &
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);

No need for (). Below as well.

> +}
> +
> +static inline void
> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	/* Only should be called from guc_lrc_desc_pin() */
> +	ce->guc_state.sched_state |=
> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> +}
> +
> +static inline void
> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state =
> +		(ce->guc_state.sched_state &
> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);

nit: can also use

ce->guc_state.sched_state &=
	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER


> +}
> +
> +static inline bool
> +context_destroyed(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> +}
> +
> +static inline void
> +set_context_destroyed(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> +}
> +
> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> +{
> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> +}
> +
> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> +{
> +	ce->guc_id = GUC_INVALID_LRC_ID;
> +}
> +
> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> +{
> +	return &ce->engine->gt->uc.guc;
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	int len = 0;
>   	bool enabled = context_enabled(ce);
>   
> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> +	GEM_BUG_ON(context_guc_id_invalid(ce));
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>   
>   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>   
> +	spin_lock_init(&guc->contexts_lock);
> +	INIT_LIST_HEAD(&guc->guc_id_list);
> +	ida_init(&guc->guc_ids);
> +
>   	return 0;
>   }
>   
> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>   	i915_sched_engine_put(guc->sched_engine);
>   }
>   
> -static int guc_context_alloc(struct intel_context *ce)
> +static inline void queue_request(struct i915_sched_engine *sched_engine,
> +				 struct i915_request *rq,
> +				 int prio)
>   {
> -	return lrc_alloc(ce, ce->engine);
> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> +	list_add_tail(&rq->sched.link,
> +		      i915_sched_lookup_priolist(sched_engine, prio));
> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +}
> +
> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> +				     struct i915_request *rq)
> +{
> +	int ret;
> +
> +	__i915_request_submit(rq);
> +
> +	trace_i915_request_in(rq, 0);
> +
> +	guc_set_lrc_tail(rq);
> +	ret = guc_add_request(guc, rq);
> +	if (ret == -EBUSY)
> +		guc->stalled_request = rq;
> +
> +	return ret;
> +}
> +
> +static void guc_submit_request(struct i915_request *rq)
> +{
> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	/* Will be called from irq-context when using foreign fences. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +
> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +		queue_request(sched_engine, rq, rq_prio(rq));
> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> +static int new_guc_id(struct intel_guc *guc)
> +{
> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> +}
> +
> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	if (!context_guc_id_invalid(ce)) {
> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> +		reset_lrc_desc(guc, ce->guc_id);
> +		set_context_guc_id_invalid(ce);
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +}
> +
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	__release_guc_id(guc, ce);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int steal_guc_id(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	int guc_id;
> +
> +	if (!list_empty(&guc->guc_id_list)) {
> +		ce = list_first_entry(&guc->guc_id_list,
> +				      struct intel_context,
> +				      guc_id_link);
> +
> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +		GEM_BUG_ON(context_guc_id_invalid(ce));
> +
> +		list_del_init(&ce->guc_id_link);
> +		guc_id = ce->guc_id;
> +		set_context_guc_id_invalid(ce);
> +		return guc_id;
> +	} else {
> +		return -EAGAIN;
> +	}
> +}
> +
> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> +{
> +	int ret;
> +
> +	ret = new_guc_id(guc);
> +	if (unlikely(ret < 0)) {
> +		ret = steal_guc_id(guc);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	*out = ret;
> +	return 0;

Is it worth adding spinlock_held asserts for guc->contexts_lock in these
ID functions? Doubles up as documentation of what locking we expect.
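
E.g. a sketch of what I mean, here and in new_guc_id()/steal_guc_id():

	static int assign_guc_id(struct intel_guc *guc, u16 *out)
	{
		int ret;

		lockdep_assert_held(&guc->contexts_lock);

		ret = new_guc_id(guc);
		...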

> +}
> +
> +#define PIN_GUC_ID_TRIES	4
> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	int ret = 0;
> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> +
> +try_again:
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +
> +	if (context_guc_id_invalid(ce)) {
> +		ret = assign_guc_id(guc, &ce->guc_id);
> +		if (ret)
> +			goto out_unlock;
> +		ret = 1;	/* Indidcates newly assigned guc_id */
> +	}
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	atomic_inc(&ce->guc_id_ref);
> +
> +out_unlock:
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> +	 * outstanding requests to see if that frees up a guc_id. If the first
> +	 * retire didn't help, insert a sleep with the timeslice duration before
> +	 * attempting to retire more requests. Double the sleep period each
> +	 * subsequent pass before finally giving up. The sleep period has max of
> +	 * 100ms and minimum of 1ms.
> +	 */
> +	if (ret == -EAGAIN && --tries) {
> +		if (PIN_GUC_ID_TRIES - tries > 1) {
> +			unsigned int timeslice_shifted =
> +				ce->engine->props.timeslice_duration_ms <<
> +				(PIN_GUC_ID_TRIES - tries - 2);
> +			unsigned int max = min_t(unsigned int, 100,
> +						 timeslice_shifted);
> +
> +			msleep(max_t(unsigned int, max, 1));
> +		}
> +		intel_gt_retire_requests(guc_to_gt(guc));
> +		goto try_again;
> +	}
> +
> +	return ret;
> +}
> +
> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> +{
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> +
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> +	    !atomic_read(&ce->guc_id_ref))
> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +}
> +
> +static int __guc_action_register_context(struct intel_guc *guc,
> +					 u32 guc_id,
> +					 u32 offset)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> +		guc_id,
> +		offset,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int register_context(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> +		ce->guc_id * sizeof(struct guc_lrc_desc);
> +
> +	return __guc_action_register_context(guc, ce->guc_id, offset);
> +}
> +
> +static int __guc_action_deregister_context(struct intel_guc *guc,
> +					   u32 guc_id)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> +		guc_id,
> +	};
> +
> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> +}
> +
> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	return __guc_action_deregister_context(guc, guc_id);
> +}
> +
> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> +{
> +	switch (class) {
> +	case RENDER_CLASS:
> +		return mask >> RCS0;
> +	case VIDEO_ENHANCEMENT_CLASS:
> +		return mask >> VECS0;
> +	case VIDEO_DECODE_CLASS:
> +		return mask >> VCS0;
> +	case COPY_ENGINE_CLASS:
> +		return mask >> BCS0;
> +	default:
> +		GEM_BUG_ON("Invalid Class");

We usually use MISSING_CASE for this type of error.
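
i.e. something like:

	default:
		MISSING_CASE(class);
		return 0;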

> +		return 0;
> +	}
> +}
> +
> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> +				    struct guc_lrc_desc *desc)
> +{
> +	desc->policy_flags = 0;
> +
> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> +}
> +
> +static int guc_lrc_desc_pin(struct intel_context *ce)
> +{
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;

If you move this after the engine var below you can skip the ce->engine 
jump. Also, you can shorten the pointer chasing by using 
engine->uncore->rpm.
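
i.e. (sketch):

	struct intel_engine_cs *engine = ce->engine;
	struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;
	struct intel_guc *guc = &engine->gt->uc.guc;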

> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	u32 desc_idx = ce->guc_id;
> +	struct guc_lrc_desc *desc;
> +	bool context_registered;
> +	intel_wakeref_t wakeref;
> +	int ret = 0;
> +
> +	GEM_BUG_ON(!engine->mask);
> +
> +	/*
> +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> +	 * based on CT vma region.
> +	 */
> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> +
> +	context_registered = lrc_desc_registered(guc, desc_idx);
> +
> +	reset_lrc_desc(guc, desc_idx);
> +	set_lrc_desc_registered(guc, desc_idx, ce);
> +
> +	desc = __get_lrc_desc(guc, desc_idx);
> +	desc->engine_class = engine_class_to_guc_class(engine->class);
> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> +						      engine->mask);
> +	desc->hw_context_desc = ce->lrc.lrca;
> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> +	guc_context_policy_init(engine, desc);
> +	init_sched_state(ce);
> +
> +	/*
> +	 * The context_lookup xarray is used to determine if the hardware
> +	 * context is currently registered. There are two cases in which it
> +	 * could be regisgered either the guc_id has been stole from from

typo regisgered

> +	 * another context or the lrc descriptor address of this context has
> +	 * changed. In either case the context needs to be deregistered with the
> +	 * GuC before registering this context.
> +	 */
> +	if (context_registered) {
> +		set_context_wait_for_deregister_to_register(ce);
> +		intel_context_get(ce);
> +
> +		/*
> +		 * If stealing the guc_id, this ce has the same guc_id as the
> +		 * context whos guc_id was stole.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = deregister_context(ce, ce->guc_id);
> +	} else {
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			ret = register_context(ce);
> +	}
> +
> +	return ret;
>   }
>   
>   static int guc_context_pre_pin(struct intel_context *ce,
> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>   
>   static int guc_context_pin(struct intel_context *ce, void *vaddr)
>   {
> +	if (i915_ggtt_offset(ce->state) !=
> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being 
re-assigned in there; it looks like it can only change if the context is 
new, but for future-proofing IMO better to re-order here.
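
Roughly (untested sketch):

	static int guc_context_pin(struct intel_context *ce, void *vaddr)
	{
		int ret = lrc_pin(ce, ce->engine, vaddr);

		if (!ret && i915_ggtt_offset(ce->state) !=
		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return ret;
	}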

> +
>   	return lrc_pin(ce, ce->engine, vaddr);

Could use a comment to say that the GuC context pinning call is delayed 
until request alloc and to look at the comment in the latter function 
for details (or move the explanation here and refer here from request 
alloc).
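
Something along these lines would do:

	/*
	 * The actual GuC context pinning (guc_id + lrc descriptor) is
	 * deferred to guc_request_alloc(); see the comment there for why.
	 */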

>   }
>   
> +static void guc_context_unpin(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +
> +	unpin_guc_id(guc, ce);
> +	lrc_unpin(ce);
> +}
> +
> +static void guc_context_post_unpin(struct intel_context *ce)
> +{
> +	lrc_post_unpin(ce);

Why do we need this function? We can just pass lrc_post_unpin to the ops
(like we already do).
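
i.e. just:

	.post_unpin = lrc_post_unpin,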

> +}
> +
> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +	struct intel_guc *guc = &engine->gt->uc.guc;
> +	unsigned long flags;
> +
> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	deregister_context(ce, ce->guc_id);
> +}
> +
> +static void guc_context_destroy(struct kref *kref)
> +{
> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;

same as above, going through engine->uncore->rpm is shorter

> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	/*
> +	 * If the guc_id is invalid this context has been stolen and we can free
> +	 * it immediately. Also can be freed immediately if the context is not
> +	 * registered with the GuC.
> +	 */
> +	if (context_guc_id_invalid(ce) ||
> +	    !lrc_desc_registered(guc, ce->guc_id)) {
> +		release_guc_id(guc, ce);

it feels a bit weird that we call release_guc_id in the case where the 
id is invalid. The code handles it fine, but still the flow doesn't feel 
clean. Not a blocker.

> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	/*
> +	 * We have to acquire the context spinlock and check guc_id again, if it
> +	 * is valid it hasn't been stolen and needs to be deregistered. We
> +	 * delete this context from the list of unpinned guc_ids available to
> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> +	 * returns indicating this context has been deregistered the guc_id is
> +	 * returned to the pool of available guc_ids.
> +	 */
> +	spin_lock_irqsave(&guc->contexts_lock, flags);
> +	if (context_guc_id_invalid(ce)) {
> +		__release_guc_id(guc, ce);

But here the call to __release_guc_id is unneeded, right? The ce doesn't
own the ID anymore and both the steal and the release functions already
clean ce->guc_id_link, so there should be nothing left to clean.
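
i.e. this branch could simply be (sketch):

	spin_lock_irqsave(&guc->contexts_lock, flags);
	if (context_guc_id_invalid(ce)) {
		spin_unlock_irqrestore(&guc->contexts_lock, flags);
		lrc_destroy(kref);
		return;
	}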

> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +		lrc_destroy(kref);
> +		return;
> +	}
> +
> +	if (!list_empty(&ce->guc_id_link))
> +		list_del_init(&ce->guc_id_link);
> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> +
> +	/*
> +	 * We defer GuC context deregistration until the context is destroyed
> +	 * in order to save on CTBs. With this optimization ideally we only need
> +	 * 1 CTB to register the context during the first pin and 1 CTB to
> +	 * deregister the context when the context is destroyed. Without this
> +	 * optimization, a CTB would be needed every pin & unpin.
> +	 *
> +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
> +	 * from context_free_worker when not runtime wakeref is held.
> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> +	 * in H2G CTB to deregister the context. A future patch may defer this
> +	 * H2G CTB if the runtime wakeref is zero.
> +	 */
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		guc_lrc_desc_unpin(ce);
> +}
> +
> +static int guc_context_alloc(struct intel_context *ce)
> +{
> +	return lrc_alloc(ce, ce->engine);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
>   	.pre_pin = guc_context_pre_pin,
>   	.pin = guc_context_pin,
> -	.unpin = lrc_unpin,
> -	.post_unpin = lrc_post_unpin,
> +	.unpin = guc_context_unpin,
> +	.post_unpin = guc_context_post_unpin,
>   
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
>   	.reset = lrc_reset,
> -	.destroy = lrc_destroy,
> +	.destroy = guc_context_destroy,
>   };
>   
> -static int guc_request_alloc(struct i915_request *request)
> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +}
> +
> +static int guc_request_alloc(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   	int ret;
>   
> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
>   	 * we start building the request - in which case we will just
>   	 * have to repeat work.
>   	 */
> -	request->reserved_space += GUC_REQUEST_SIZE;
> +	rq->reserved_space += GUC_REQUEST_SIZE;
>   
>   	/*
>   	 * Note that after this point, we have committed to using
> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>   	 */
>   
>   	/* Unconditionally invalidate GPU caches and TLBs. */
> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>   	if (ret)
>   		return ret;
>   
> -	request->reserved_space -= GUC_REQUEST_SIZE;
> -	return 0;
> -}
> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>   
> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> -				 struct i915_request *rq,
> -				 int prio)
> -{
> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> -	list_add_tail(&rq->sched.link,
> -		      i915_sched_lookup_priolist(sched_engine, prio));
> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -}
> -
> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> -				     struct i915_request *rq)
> -{
> -	int ret;
> -
> -	__i915_request_submit(rq);
> -
> -	trace_i915_request_in(rq, 0);
> -
> -	guc_set_lrc_tail(rq);
> -	ret = guc_add_request(guc, rq);
> -	if (ret == -EBUSY)
> -		guc->stalled_request = rq;
> -
> -	return ret;
> -}
> -
> -static void guc_submit_request(struct i915_request *rq)
> -{
> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> -	unsigned long flags;
> +	/*
> +	 * Call pin_guc_id here rather than in the pinning step as with
> +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> +	 * guc_ids and creating horrible race conditions. This is especially bad
> +	 * when guc_ids are being stolen due to over subscription. By the time
> +	 * this function is reached, it is guaranteed that the guc_id will be
> +	 * persistent until the generated request is retired. Thus, sealing these
> +	 * race conditions. It is still safe to fail here if guc_ids are
> +	 * exhausted and return -EAGAIN to the user indicating that they can try
> +	 * again in the future.
> +	 *
> +	 * There is no need for a lock here as the timeline mutex ensures at
> +	 * most one context can be executing this code path at once. The
> +	 * guc_id_ref is incremented once for every request in flight and
> +	 * decremented on each retire. When it is zero, a lock around the
> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> +	 */
> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> +		return 0;
>   
> -	/* Will be called from irq-context when using foreign fences. */
> -	spin_lock_irqsave(&sched_engine->lock, flags);
> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> +	if (unlikely(ret < 0))
> +		return ret;;

typo ";;"

> +	if (context_needs_register(ce, !!ret)) {
> +		ret = guc_lrc_desc_pin(ce);
> +		if (unlikely(ret)) {	/* unwind */
> +			atomic_dec(&ce->guc_id_ref);
> +			unpin_guc_id(guc, ce);
> +			return ret;
> +		}
> +	}
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> -		queue_request(sched_engine, rq, rq_prio(rq));
> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> -		tasklet_hi_schedule(&sched_engine->tasklet);
> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);


Might be worth moving the pinning to a separate function to keep
request_alloc focused on the request; see the sketch below. Can be done
as a follow-up.
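
Something like this (helper name hypothetical, untested), with
guc_request_alloc() calling it after the flush/reserve dance:

	static int guc_request_pin_context(struct i915_request *rq)
	{
		struct intel_context *ce = rq->context;
		struct intel_guc *guc = ce_to_guc(ce);
		int ret;

		/* Fast path: guc_id already pinned by an in-flight request */
		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
		if (unlikely(ret < 0))
			return ret;

		if (context_needs_register(ce, !!ret)) {
			ret = guc_lrc_desc_pin(ce);
			if (unlikely(ret)) {	/* unwind */
				atomic_dec(&ce->guc_id_ref);
				unpin_guc_id(guc, ce);
				return ret;
			}
		}

		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return 0;
	}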

>   
> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	return 0;
>   }
>   
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>   	engine->submit_request = guc_submit_request;
>   }
>   
> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> +					  struct intel_context *ce)
> +{
> +	if (context_guc_id_invalid(ce))
> +		pin_guc_id(guc, ce);
> +	guc_lrc_desc_pin(ce);
> +}
> +
> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	/* make sure all descriptors are clean... */
> +	xa_destroy(&guc->context_lookup);
> +
> +	/*
> +	 * Some contexts might have been pinned before we enabled GuC
> +	 * submission, so we need to add them to the GuC bookeeping.
> +	 * Also, after a reset the GuC we want to make sure that the information
> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> +	 * to the gem_context, so they need to be added separately.
> +	 *
> +	 * Note: we purposely do not check the error return of
> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> +	 * enough here (we're either only pinning a handful of lrc on first boot
> +	 * or we're re-pinning lrcs that were already pinned before the reset).
> +	 * Two, if the GuC has died and CTBs can't make forward progress.
> +	 * Presumably, the GuC should be alive as this function is called on
> +	 * driver load or after a reset. Even if it is dead, another full GPU
> +	 * reset will be triggered and this function would be called again.
> +	 */
> +
> +	for_each_engine(engine, gt, id)
> +		if (engine->kernel_context)
> +			guc_kernel_context_pin(guc, engine->kernel_context);
> +}
> +
>   static void guc_release(struct intel_engine_cs *engine)
>   {
>   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   void intel_guc_submission_enable(struct intel_guc *guc)
>   {
> +	guc_init_lrc_mapping(guc);
>   }
>   
>   void intel_guc_submission_disable(struct intel_guc *guc)
> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>   {
>   	guc->submission_selected = __guc_submission_selected(guc);
>   }
> +
> +static inline struct intel_context *
> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> +{
> +	struct intel_context *ce;
> +
> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Invalid desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	ce = __get_context(guc, desc_idx);
> +	if (unlikely(!ce)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> +			"Context is NULL, desc_idx %u", desc_idx);
> +		return NULL;
> +	}
> +
> +	return ce;
> +}
> +
> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> +					  const u32 *msg,
> +					  u32 len)
> +{
> +	struct intel_context *ce;
> +	u32 desc_idx = msg[0];
> +
> +	if (unlikely(len < 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> +		return -EPROTO;
> +	}
> +
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	if (context_wait_for_deregister_to_register(ce)) {
> +		struct intel_runtime_pm *runtime_pm =
> +			&ce->engine->gt->i915->runtime_pm;
> +		intel_wakeref_t wakeref;
> +
> +		/*
> +		 * Previous owner of this guc_id has been deregistered, now safe
> +		 * register this context.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref)
> +			register_context(ce);
> +		clr_context_wait_for_deregister_to_register(ce);
> +		intel_context_put(ce);
> +	} else if (context_destroyed(ce)) {
> +		/* Context has been destroyed */
> +		release_guc_id(guc, ce);
> +		lrc_destroy(&ce->ref);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 943fe485c662..204c95c39353 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -4142,6 +4142,7 @@ enum {
>   	FAULT_AND_CONTINUE /* Unsupported */
>   };
>   
> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>   #define GEN8_CTX_VALID (1 << 0)
>   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 86b4c9f2613d..b48c4905d3fc 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 */
>   	if (!list_empty(&rq->sched.link))
>   		remove_from_engine(rq);
> +	atomic_dec(&rq->context->guc_id_ref);

Does this work/make sense if GuC is disabled?
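
E.g. (sketch; assuming a helper like intel_engine_uses_guc() is available
here):

	if (intel_engine_uses_guc(rq->engine))
		atomic_dec(&rq->context->guc_id_ref);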

Daniele

>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  1:03     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  1:03 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> When running the GuC the GPU can't be considered idle if the GuC still
> has contexts pinned. As such, a call has been added in
> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> the number of unpinned contexts to go to zero.
>
> v2: rtimeout -> remaining_timeout
> v3: Drop unnecessary includes, guc_submission_busy_loop ->
> guc_submission_send_busy_loop, drop negatie timeout trick, move a
> refactor of guc_context_unpin to earlier path (John H)
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>   drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
>   drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
>   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
>   drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>   .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>   13 files changed, 129 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> index a90f796e85c0..6fffd4d377c2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>   		goto insert;
>   
>   	/* Attempt to reap some mmap space from dead objects */
> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> +					       NULL);
>   	if (err)
>   		goto err;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index e714e21c0a4d..acfdd53b2678 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
>   	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>   }
>   
> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> +{
> +	long remaining_timeout;
> +
> +	/* If the device is asleep, we have no requests outstanding */
> +	if (!intel_gt_pm_is_awake(gt))
> +		return 0;
> +
> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> +							   &remaining_timeout)) > 0) {
> +		cond_resched();
> +		if (signal_pending(current))
> +			return -EINTR;
> +	}
> +
> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> +							  remaining_timeout);
> +}
> +
>   int intel_gt_init(struct intel_gt *gt)
>   {
>   	int err;
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> index e7aabe0cc5bf..74e771871a9b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>   
>   void intel_gt_driver_late_release(struct intel_gt *gt);
>   
> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> +
>   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>   void intel_gt_clear_error_registers(struct intel_gt *gt,
>   				    intel_engine_mask_t engine_mask);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 647eca9d867a..edb881d75630 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(engine->retire);
>   }
>   
> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> +				      long *remaining_timeout)
>   {
>   	struct intel_gt_timelines *timelines = &gt->timelines;
>   	struct intel_timeline *tl, *tn;
> @@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
>   	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>   		active_count++;
>   
> -	return active_count ? timeout : 0;
> -}
> -
> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> -{
> -	/* If the device is asleep, we have no requests outstanding */
> -	if (!intel_gt_pm_is_awake(gt))
> -		return 0;
> -
> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> -		cond_resched();
> -		if (signal_pending(current))
> -			return -EINTR;
> -	}
> +	if (remaining_timeout)
> +		*remaining_timeout = timeout;
>   
> -	return timeout;
> +	return active_count ? timeout : 0;
>   }
>   
>   static void retire_work_handler(struct work_struct *work)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> index fcc30a6e4fe9..83ff5280c06e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
You were saying that the include of stddef is needed here?

> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>   struct intel_gt;
>   struct intel_timeline;
>   
> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> +				      long *remaining_timeout);
>   static inline void intel_gt_retire_requests(struct intel_gt *gt)
>   {
> -	intel_gt_retire_requests_timeout(gt, 0);
> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>   }
>   
>   void intel_engine_init_retire(struct intel_engine_cs *engine);
> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>   			     struct intel_timeline *tl);
>   void intel_engine_fini_retire(struct intel_engine_cs *engine);
>   
> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> -
>   void intel_gt_init_requests(struct intel_gt *gt);
>   void intel_gt_park_requests(struct intel_gt *gt);
>   void intel_gt_unpark_requests(struct intel_gt *gt);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 80b88bae5f24..3cc566565224 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -39,6 +39,8 @@ struct intel_guc {
>   	spinlock_t irq_lock;
>   	unsigned int msg_enabled_mask;
>   
> +	atomic_t outstanding_submission_g2h;
> +
>   	struct {
>   		void (*reset)(struct intel_guc *guc);
>   		void (*enable)(struct intel_guc *guc);
> @@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   	spin_unlock_irq(&guc->irq_lock);
>   }
>   
> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> +
>   int intel_guc_reset_engine(struct intel_guc *guc,
>   			   struct intel_engine_cs *engine);
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index c33906ec478d..f1cbed6b9f0a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>   	INIT_LIST_HEAD(&ct->requests.incoming);
>   	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>   	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
> +	init_waitqueue_head(&ct->wq);
>   }
>   
>   static inline const char *guc_ct_buffer_type_to_str(u32 type)
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 785dfc5c6efb..4b30a562ae63 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -10,6 +10,7 @@
>   #include <linux/spinlock.h>
>   #include <linux/workqueue.h>
>   #include <linux/ktime.h>
> +#include <linux/wait.h>
>   
>   #include "intel_guc_fwif.h"
>   
> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>   
>   	struct tasklet_struct receive_tasklet;
>   
> +	/** @wq: wait queue for g2h chanenl */
> +	wait_queue_head_t wq;
> +
>   	struct {
>   		u16 last_fence; /* last fence used to send request */
>   
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f7e34baa9506..088d11e2e497 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>   }
>   
> +static int guc_submission_send_busy_loop(struct intel_guc* guc,
> +					 const u32 *action,
> +					 u32 len,
> +					 u32 g2h_len_dw,
> +					 bool loop)
> +{
> +	int err;
> +
> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> +
> +	if (!err && g2h_len_dw)
> +		atomic_inc(&guc->outstanding_submission_g2h);
> +
> +	return err;
> +}
> +
> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> +				    atomic_t *wait_var,
> +				    bool interruptible,
> +				    long timeout)
> +{
> +	const int state = interruptible ?
> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	DEFINE_WAIT(wait);
> +
> +	might_sleep();
> +	GEM_BUG_ON(timeout < 0);
> +
> +	if (!atomic_read(wait_var))
> +		return 0;
> +
> +	if (!timeout)
> +		return -ETIME;
> +
> +	for (;;) {
> +		prepare_to_wait(&guc->ct.wq, &wait, state);
> +
> +		if (!atomic_read(wait_var))
> +			break;
> +
> +		if (signal_pending_state(state, current)) {
> +			timeout = -EINTR;
> +			break;
> +		}
> +
> +		if (!timeout) {
> +			timeout = -ETIME;
> +			break;
> +		}
> +
> +		timeout = io_schedule_timeout(timeout);
> +	}
> +	finish_wait(&guc->ct.wq, &wait);
> +
> +	return (timeout < 0) ? timeout : 0;
> +}
> +
> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> +{
> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> +					true, timeout);
> +}
> +
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	int err;
> @@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   
>   	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>   	if (!enabled && !err) {
> +		atomic_inc(&guc->outstanding_submission_g2h);
>   		set_context_enabled(ce);
>   	} else if (!enabled) {
>   		clr_context_pending_enable(ce);
> @@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
>   		offset,
>   	};
>   
> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +					     0, true);
>   }
>   
>   static int register_context(struct intel_context *ce)
> @@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>   		guc_id,
>   	};
>   
> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> +					     true);
>   }
>   
>   static int deregister_context(struct intel_context *ce, u32 guc_id)
> @@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   
>   	intel_context_get(ce);
>   
> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> -				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>   }
>   
>   static u16 prep_context_pending_disable(struct intel_context *ce)
> @@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>   	return ce;
>   }
>   
> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> +{
> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
> +		wake_up_all(&guc->ct.wq);
> +}
> +
>   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg,
>   					  u32 len)
> @@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   		lrc_destroy(&ce->ref);
>   	}
>   
> +	decr_outstanding_submission_g2h(guc);
> +
>   	return 0;
>   }
>   
> @@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   	}
>   
> +	decr_outstanding_submission_g2h(guc);
>   	intel_context_put(ce);
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index 9c954c589edf..c4cef885e984 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
>   #undef uc_state_checkers
>   #undef __uc_state_checker
>   
> +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> +{
> +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> +}
> +
>   #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
>   static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
>   { \
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index 4d2d59a9942b..2b73ddb11c66 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -27,6 +27,7 @@
>    */
>   
>   #include "gem/i915_gem_context.h"
> +#include "gt/intel_gt.h"
Still not seeing a need for this.

>   #include "gt/intel_gt_requests.h"
>   
>   #include "i915_drv.h"
> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> index c130010a7033..1c721542e277 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> @@ -5,7 +5,7 @@
>    */
>   
>   #include "i915_drv.h"
> -#include "gt/intel_gt_requests.h"
> +#include "gt/intel_gt.h"
Nor this.

John.

>   
>   #include "../i915_selftest.h"
>   #include "igt_flush_test.h"
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index d189c4bd4bef..4f8180146888 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
>   	do {
>   		for_each_engine(engine, gt, id)
>   			mock_engine_flush(engine);
> -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> +						  NULL));
>   }
>   
>   static void mock_device_release(struct drm_device *dev)


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  1:13     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  1:13 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Update GuC debugfs to support the new GuC structures.
>
> v2:
>   (John Harrison)
>    - Remove intel_lrc_reg.h include from i915_debugfs.c
>   (Michal)
>    - Rename GuC debugfs functions
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     | 22 ++++++++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  3 +
>   .../gpu/drm/i915/gt/uc/intel_guc_debugfs.c    | 23 +++++++-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 55 +++++++++++++++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.h |  5 ++
>   5 files changed, 107 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index f1cbed6b9f0a..503a78517610 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -1171,3 +1171,25 @@ void intel_guc_ct_event_handler(struct intel_guc_ct *ct)
>   
>   	ct_try_receive_message(ct);
>   }
> +
> +void intel_guc_ct_print_info(struct intel_guc_ct *ct,
> +			     struct drm_printer *p)
> +{
> +	drm_printf(p, "CT %s\n", enableddisabled(ct->enabled));
> +
> +	if (!ct->enabled)
> +		return;
> +
> +	drm_printf(p, "H2G Space: %u\n",
> +		   atomic_read(&ct->ctbs.send.space) * 4);
> +	drm_printf(p, "Head: %u\n",
> +		   ct->ctbs.send.desc->head);
> +	drm_printf(p, "Tail: %u\n",
> +		   ct->ctbs.send.desc->tail);
> +	drm_printf(p, "G2H Space: %u\n",
> +		   atomic_read(&ct->ctbs.recv.space) * 4);
> +	drm_printf(p, "Head: %u\n",
> +		   ct->ctbs.recv.desc->head);
> +	drm_printf(p, "Tail: %u\n",
> +		   ct->ctbs.recv.desc->tail);
> +}
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> index 4b30a562ae63..7b34026d264a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> @@ -16,6 +16,7 @@
>   
>   struct i915_vma;
>   struct intel_guc;
> +struct drm_printer;
>   
>   /**
>    * DOC: Command Transport (CT).
> @@ -112,4 +113,6 @@ int intel_guc_ct_send(struct intel_guc_ct *ct, const u32 *action, u32 len,
>   		      u32 *response_buf, u32 response_buf_size, u32 flags);
>   void intel_guc_ct_event_handler(struct intel_guc_ct *ct);
>   
> +void intel_guc_ct_print_info(struct intel_guc_ct *ct, struct drm_printer *p);
> +
>   #endif /* _INTEL_GUC_CT_H_ */
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> index fe7cb7b29a1e..7a454c91a736 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_debugfs.c
> @@ -9,6 +9,8 @@
>   #include "intel_guc.h"
>   #include "intel_guc_debugfs.h"
>   #include "intel_guc_log_debugfs.h"
> +#include "gt/uc/intel_guc_ct.h"
> +#include "gt/uc/intel_guc_submission.h"
>   
>   static int guc_info_show(struct seq_file *m, void *data)
>   {
> @@ -22,16 +24,35 @@ static int guc_info_show(struct seq_file *m, void *data)
>   	drm_puts(&p, "\n");
>   	intel_guc_log_info(&guc->log, &p);
>   
> -	/* Add more as required ... */
> +	if (!intel_guc_submission_is_used(guc))
> +		return 0;
> +
> +	intel_guc_ct_print_info(&guc->ct, &p);
> +	intel_guc_submission_print_info(guc, &p);
>   
>   	return 0;
>   }
>   DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_info);
>   
> +static int guc_registered_contexts_show(struct seq_file *m, void *data)
> +{
> +	struct intel_guc *guc = m->private;
> +	struct drm_printer p = drm_seq_file_printer(m);
> +
> +	if (!intel_guc_submission_is_used(guc))
> +		return -ENODEV;
> +
> +	intel_guc_submission_print_context_info(guc, &p);
> +
> +	return 0;
> +}
> +DEFINE_GT_DEBUGFS_ATTRIBUTE(guc_registered_contexts);
> +
>   void intel_guc_debugfs_register(struct intel_guc *guc, struct dentry *root)
>   {
>   	static const struct debugfs_gt_file files[] = {
>   		{ "guc_info", &guc_info_fops, NULL },
> +		{ "guc_registered_contexts", &guc_registered_contexts_fops, NULL },
>   	};
>   
>   	if (!intel_guc_is_supported(guc))
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 088d11e2e497..a2af7e17dcc2 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1602,3 +1602,58 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   
>   	return 0;
>   }
> +
> +void intel_guc_submission_print_info(struct intel_guc *guc,
> +				     struct drm_printer *p)
> +{
> +	struct i915_sched_engine *sched_engine = guc->sched_engine;
> +	struct rb_node *rb;
> +	unsigned long flags;
> +
> +	if (!sched_engine)
> +		return;
> +
> +	drm_printf(p, "GuC Number Outstanding Submission G2H: %u\n",
> +		   atomic_read(&guc->outstanding_submission_g2h));
> +	drm_printf(p, "GuC tasklet count: %u\n\n",
> +		   atomic_read(&sched_engine->tasklet.count));
> +
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	drm_printf(p, "Requests in GuC submit tasklet:\n");
> +	for (rb = rb_first_cached(&sched_engine->queue); rb; rb = rb_next(rb)) {
> +		struct i915_priolist *pl = to_priolist(rb);
> +		struct i915_request *rq;
> +
> +		priolist_for_each_request(rq, pl)
> +			drm_printf(p, "guc_id=%u, seqno=%llu\n",
> +				   rq->context->guc_id,
> +				   rq->fence.seqno);
> +	}
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	drm_printf(p, "\n");
> +}
> +
> +void intel_guc_submission_print_context_info(struct intel_guc *guc,
> +					     struct drm_printer *p)
> +{
> +	struct intel_context *ce;
> +	unsigned long index;
> +
> +	xa_for_each(&guc->context_lookup, index, ce) {
> +		drm_printf(p, "GuC lrc descriptor %u:\n", ce->guc_id);
> +		drm_printf(p, "\tHW Context Desc: 0x%08x\n", ce->lrc.lrca);
> +		drm_printf(p, "\t\tLRC Head: Internal %u, Memory %u\n",
> +			   ce->ring->head,
> +			   ce->lrc_reg_state[CTX_RING_HEAD]);
> +		drm_printf(p, "\t\tLRC Tail: Internal %u, Memory %u\n",
> +			   ce->ring->tail,
> +			   ce->lrc_reg_state[CTX_RING_TAIL]);
> +		drm_printf(p, "\t\tContext Pin Count: %u\n",
> +			   atomic_read(&ce->pin_count));
> +		drm_printf(p, "\t\tGuC ID Ref Count: %u\n",
> +			   atomic_read(&ce->guc_id_ref));
> +		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> +			   ce->guc_state.sched_state,
> +			   atomic_read(&ce->guc_sched_state_no_lock));
> +	}
> +}
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> index 3f7005018939..2b9470c90558 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.h
> @@ -10,6 +10,7 @@
>   
>   #include "intel_guc.h"
>   
> +struct drm_printer;
>   struct intel_engine_cs;
>   
>   void intel_guc_submission_init_early(struct intel_guc *guc);
> @@ -20,6 +21,10 @@ void intel_guc_submission_fini(struct intel_guc *guc);
>   int intel_guc_preempt_work_create(struct intel_guc *guc);
>   void intel_guc_preempt_work_destroy(struct intel_guc *guc);
>   int intel_guc_submission_setup(struct intel_engine_cs *engine);
> +void intel_guc_submission_print_info(struct intel_guc *guc,
> +				     struct drm_printer *p);
> +void intel_guc_submission_print_context_info(struct intel_guc *guc,
> +					     struct drm_printer *p);
>   
>   static inline bool intel_guc_submission_is_supported(struct intel_guc *guc)
>   {



* Re: [PATCH 17/51] drm/i915/guc: Add several request trace points
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  1:27     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  1:27 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Add trace points for request dependencies and GuC submit. Extended
> existing request trace points to include submit fence value,, guc_id,
Still has misplaced commas.

Also, Tvrtko has a bunch of comments/questions on the previous version 
that need to be addressed.

John.

> and ring tail value.
>
> v2: Fix white space alignment in i915_request_add trace point
>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
>   drivers/gpu/drm/i915/i915_request.c           |  3 ++
>   drivers/gpu/drm/i915/i915_trace.h             | 43 +++++++++++++++++--
>   3 files changed, 45 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index a2af7e17dcc2..480fb2184ecf 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   			guc->stalled_request = last;
>   			return false;
>   		}
> +		trace_i915_request_guc_submit(last);
>   	}
>   
>   	guc->stalled_request = NULL;
> @@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	ret = guc_add_request(guc, rq);
>   	if (ret == -EBUSY)
>   		guc->stalled_request = rq;
> +	else
> +		trace_i915_request_guc_submit(rq);
>   
>   	return ret;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2b2b63cba06c..01aa3d1ee2b1 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
>   			return err;
>   	}
>   
> +	trace_i915_request_dep_to(to);
> +	trace_i915_request_dep_from(from);
> +
>   	/* Couple the dependency tree for PI on this exposed to->fence */
>   	if (to->engine->sched_engine->schedule) {
>   		err = i915_sched_node_add_dependency(&to->sched,
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 6778ad2a14a4..ea41d069bf7d 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
>   	    TP_STRUCT__entry(
>   			     __field(u32, dev)
>   			     __field(u64, ctx)
> +			     __field(u32, guc_id)
>   			     __field(u16, class)
>   			     __field(u16, instance)
>   			     __field(u32, seqno)
> +			     __field(u32, tail)
>   			     ),
>   
>   	    TP_fast_assign(
>   			   __entry->dev = rq->engine->i915->drm.primary->index;
>   			   __entry->class = rq->engine->uabi_class;
>   			   __entry->instance = rq->engine->uabi_instance;
> +			   __entry->guc_id = rq->context->guc_id;
>   			   __entry->ctx = rq->fence.context;
>   			   __entry->seqno = rq->fence.seqno;
> +			   __entry->tail = rq->tail;
>   			   ),
>   
> -	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
> +	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
>   		      __entry->dev, __entry->class, __entry->instance,
> -		      __entry->ctx, __entry->seqno)
> +		      __entry->guc_id, __entry->ctx, __entry->seqno,
> +		      __entry->tail)
>   );
>   
>   DEFINE_EVENT(i915_request, i915_request_add,
> -	    TP_PROTO(struct i915_request *rq),
> -	    TP_ARGS(rq)
> +	     TP_PROTO(struct i915_request *rq),
> +	     TP_ARGS(rq)
>   );
>   
>   #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
> +DEFINE_EVENT(i915_request, i915_request_dep_to,
> +	     TP_PROTO(struct i915_request *rq),
> +	     TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_request, i915_request_dep_from,
> +	     TP_PROTO(struct i915_request *rq),
> +	     TP_ARGS(rq)
> +);
> +
> +DEFINE_EVENT(i915_request, i915_request_guc_submit,
> +	     TP_PROTO(struct i915_request *rq),
> +	     TP_ARGS(rq)
> +);
> +
>   DEFINE_EVENT(i915_request, i915_request_submit,
>   	     TP_PROTO(struct i915_request *rq),
>   	     TP_ARGS(rq)
> @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
>   
>   #else
>   #if !defined(TRACE_HEADER_MULTI_READ)
> +static inline void
> +trace_i915_request_dep_to(struct i915_request *rq)
> +{
> +}
> +
> +static inline void
> +trace_i915_request_dep_from(struct i915_request *rq)
> +{
> +}
> +
> +static inline void
> +trace_i915_request_guc_submit(struct i915_request *rq)
> +{
> +}
> +
>   static inline void
>   trace_i915_request_submit(struct i915_request *rq)
>   {
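
Worth spelling out from the tail of the hunk above: tracepoint call sites are unconditional, so when CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS is off the trace_* symbols must still exist, which is what the static inline no-op stubs provide. The pattern in miniature (the example event name is illustrative only):

	#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
	/* real tracepoint, emitted into the trace buffer when enabled */
	DEFINE_EVENT(i915_request, i915_request_example,
		     TP_PROTO(struct i915_request *rq),
		     TP_ARGS(rq)
	);
	#else
	/* compiles away entirely; callers need no #ifdef guards */
	static inline void
	trace_i915_request_example(struct i915_request *rq)
	{
	}
	#endif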



* Re: [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  1:28     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20  1:28 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The serial number tracking of engines happens at the backend of
> request submission and was expecting to only be given physical
> engines. However, in GuC submission mode, the decomposition of virtual
> to physical engines does not happen in i915. Instead, requests are
> submitted to their virtual engine mask all the way through to the
> hardware (i.e. to GuC). This would mean that the heartbeat code
> thinks the physical engines are idle due to the serial number not
> incrementing.
>
> This patch updates the tracking to decompose virtual engines into
> their physical constituents and tracks the request against each. This
> is not entirely accurate as the GuC will only be issuing the request
> to one physical engine. However, it is the best that i915 can do given
> that it has no knowledge of the GuC's scheduling decisions.
>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Still needs to pull in Tvrtko's updated subject and description.

John.

> ---
>   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
>   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
>   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
>   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
>   6 files changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 1cb9c3b70b29..8ad304b2f2e4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -388,6 +388,8 @@ struct intel_engine_cs {
>   	void		(*park)(struct intel_engine_cs *engine);
>   	void		(*unpark)(struct intel_engine_cs *engine);
>   
> +	void		(*bump_serial)(struct intel_engine_cs *engine);
> +
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
>   	const struct intel_context_ops *cops;
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 28492cdce706..920707e22eb0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void execlist_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void
>   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   {
> @@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
> +	engine->bump_serial = execlist_bump_serial;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 5c4d204d07cc..61469c631057 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
>   	}
>   }
>   
> +static void ring_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
>   
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
> +	engine->bump_serial = ring_bump_serial;
>   
>   	/*
>   	 * Using a global execution timeline; the previous final breadcrumb is
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 68970398e4ef..9203c766db80 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	intel_engine_fini_retire(engine);
>   }
>   
> +static void mock_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
>   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   				    const char *name,
>   				    int id)
> @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   
>   	engine->base.cops = &mock_context_ops;
>   	engine->base.request_alloc = mock_request_alloc;
> +	engine->base.bump_serial = mock_bump_serial;
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 7b3e1c91e689..372e0dc7617a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1485,6 +1485,20 @@ static void guc_release(struct intel_engine_cs *engine)
>   	lrc_fini_wa_ctx(engine);
>   }
>   
> +static void guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	engine->serial++;
> +}
> +
> +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_cs *e;
> +	intel_engine_mask_t tmp, mask = engine->mask;
> +
> +	for_each_engine_masked(e, engine->gt, mask, tmp)
> +		e->serial++;
> +}
> +
>   static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   {
>   	/* Default vfuncs which can be overridden by each engine. */
> @@ -1493,6 +1507,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
> +	engine->bump_serial = guc_bump_serial;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> @@ -1828,6 +1843,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   
>   	ve->base.cops = &virtual_guc_context_ops;
>   	ve->base.request_alloc = guc_request_alloc;
> +	ve->base.bump_serial = virtual_guc_bump_serial;
>   
>   	ve->base.submit_request = guc_submit_request;
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 01aa3d1ee2b1..30ecdc46a12f 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -669,7 +669,9 @@ bool __i915_request_submit(struct i915_request *request)
>   				     request->ring->vaddr + request->postfix);
>   
>   	trace_i915_request_execute(request);
> -	engine->serial++;
> +	if (engine->bump_serial)
> +		engine->bump_serial(engine);
> +
>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
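
For anyone unfamiliar with the engine-mask helpers used above: a virtual engine's ->mask is the union of its physical siblings' masks, so for_each_engine_masked() visits exactly the physical constituents, which is what makes the decomposition in virtual_guc_bump_serial() work. A sketch of that relationship (illustrative only; bit assignments are whatever engine init picked):

	static void example_bump_virtual(struct intel_engine_cs **siblings,
					 unsigned int count)
	{
		struct intel_engine_cs *e;
		intel_engine_mask_t tmp, mask = 0;
		unsigned int n;

		/* a virtual engine's mask is the union of its siblings' masks */
		for (n = 0; n < count; n++)
			mask |= siblings[n]->mask;

		/* ... so iterating the mask recovers the physical engines */
		for_each_engine_masked(e, siblings[0]->gt, mask, tmp)
			e->serial++;
	}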



* Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-07-20  1:03     ` [Intel-gfx] " John Harrison
@ 2021-07-20  1:53       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  1:53 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > When running the GuC, the GPU can't be considered idle if the GuC still
> > has contexts pinned. As such, a call has been added in
> > intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
> > the number of unpinned contexts to go to zero.
> > 
> > v2: rtimeout -> remaining_timeout
> > v3: Drop unnecessary includes, guc_submission_busy_loop ->
> > guc_submission_send_busy_loop, drop negative timeout trick, move a
> > refactor of guc_context_unpin to earlier path (John H)
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
> >   drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
> >   drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
> >   drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
> >   drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
> >   .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
> >   .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
> >   13 files changed, 129 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > index a90f796e85c0..6fffd4d377c2 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
> > @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
> >   		goto insert;
> >   	/* Attempt to reap some mmap space from dead objects */
> > -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
> > +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
> > +					       NULL);
> >   	if (err)
> >   		goto err;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> > index e714e21c0a4d..acfdd53b2678 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> > @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
> >   	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
> >   }
> > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > +{
> > +	long remaining_timeout;
> > +
> > +	/* If the device is asleep, we have no requests outstanding */
> > +	if (!intel_gt_pm_is_awake(gt))
> > +		return 0;
> > +
> > +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
> > +							   &remaining_timeout)) > 0) {
> > +		cond_resched();
> > +		if (signal_pending(current))
> > +			return -EINTR;
> > +	}
> > +
> > +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
> > +							  remaining_timeout);
> > +}
> > +
> >   int intel_gt_init(struct intel_gt *gt)
> >   {
> >   	int err;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
> > index e7aabe0cc5bf..74e771871a9b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
> > @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
> >   void intel_gt_driver_late_release(struct intel_gt *gt);
> > +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > +
> >   void intel_gt_check_and_clear_faults(struct intel_gt *gt);
> >   void intel_gt_clear_error_registers(struct intel_gt *gt,
> >   				    intel_engine_mask_t engine_mask);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index 647eca9d867a..edb881d75630 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
> >   	GEM_BUG_ON(engine->retire);
> >   }
> > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
> > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > +				      long *remaining_timeout)
> >   {
> >   	struct intel_gt_timelines *timelines = &gt->timelines;
> >   	struct intel_timeline *tl, *tn;
> > @@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
> >   	if (flush_submission(gt, timeout)) /* Wait, there's more! */
> >   		active_count++;
> > -	return active_count ? timeout : 0;
> > -}
> > -
> > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
> > -{
> > -	/* If the device is asleep, we have no requests outstanding */
> > -	if (!intel_gt_pm_is_awake(gt))
> > -		return 0;
> > -
> > -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
> > -		cond_resched();
> > -		if (signal_pending(current))
> > -			return -EINTR;
> > -	}
> > +	if (remaining_timeout)
> > +		*remaining_timeout = timeout;
> > -	return timeout;
> > +	return active_count ? timeout : 0;
> >   }
> >   static void retire_work_handler(struct work_struct *work)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > index fcc30a6e4fe9..83ff5280c06e 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> You were saying the include of stddef is needed here?
> 

Yes, HDRTEST [1] complains otherwise.

[1] https://patchwork.freedesktop.org/series/91840/#rev3

> > @@ -10,10 +10,11 @@ struct intel_engine_cs;
> >   struct intel_gt;
> >   struct intel_timeline;
> > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > +				      long *remaining_timeout);
> >   static inline void intel_gt_retire_requests(struct intel_gt *gt)
> >   {
> > -	intel_gt_retire_requests_timeout(gt, 0);
> > +	intel_gt_retire_requests_timeout(gt, 0, NULL);
> >   }
> >   void intel_engine_init_retire(struct intel_engine_cs *engine);
> > @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
> >   			     struct intel_timeline *tl);
> >   void intel_engine_fini_retire(struct intel_engine_cs *engine);
> > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > -
> >   void intel_gt_init_requests(struct intel_gt *gt);
> >   void intel_gt_park_requests(struct intel_gt *gt);
> >   void intel_gt_unpark_requests(struct intel_gt *gt);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 80b88bae5f24..3cc566565224 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -39,6 +39,8 @@ struct intel_guc {
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > +	atomic_t outstanding_submission_g2h;
> > +
> >   	struct {
> >   		void (*reset)(struct intel_guc *guc);
> >   		void (*enable)(struct intel_guc *guc);
> > @@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   	spin_unlock_irq(&guc->irq_lock);
> >   }
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > +
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index c33906ec478d..f1cbed6b9f0a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> >   	INIT_LIST_HEAD(&ct->requests.incoming);
> >   	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> >   	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
> > +	init_waitqueue_head(&ct->wq);
> >   }
> >   static inline const char *guc_ct_buffer_type_to_str(u32 type)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 785dfc5c6efb..4b30a562ae63 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -10,6 +10,7 @@
> >   #include <linux/spinlock.h>
> >   #include <linux/workqueue.h>
> >   #include <linux/ktime.h>
> > +#include <linux/wait.h>
> >   #include "intel_guc_fwif.h"
> > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> >   	struct tasklet_struct receive_tasklet;
> > +	/** @wq: wait queue for g2h channel */
> > +	wait_queue_head_t wq;
> > +
> >   	struct {
> >   		u16 last_fence; /* last fence used to send request */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index f7e34baa9506..088d11e2e497 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > +static int guc_submission_send_busy_loop(struct intel_guc* guc,
> > +					 const u32 *action,
> > +					 u32 len,
> > +					 u32 g2h_len_dw,
> > +					 bool loop)
> > +{
> > +	int err;
> > +
> > +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > +
> > +	if (!err && g2h_len_dw)
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> > +
> > +	return err;
> > +}
> > +
> > +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> > +				    atomic_t *wait_var,
> > +				    bool interruptible,
> > +				    long timeout)
> > +{
> > +	const int state = interruptible ?
> > +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> > +	DEFINE_WAIT(wait);
> > +
> > +	might_sleep();
> > +	GEM_BUG_ON(timeout < 0);
> > +
> > +	if (!atomic_read(wait_var))
> > +		return 0;
> > +
> > +	if (!timeout)
> > +		return -ETIME;
> > +
> > +	for (;;) {
> > +		prepare_to_wait(&guc->ct.wq, &wait, state);
> > +
> > +		if (!atomic_read(wait_var))
> > +			break;
> > +
> > +		if (signal_pending_state(state, current)) {
> > +			timeout = -EINTR;
> > +			break;
> > +		}
> > +
> > +		if (!timeout) {
> > +			timeout = -ETIME;
> > +			break;
> > +		}
> > +
> > +		timeout = io_schedule_timeout(timeout);
> > +	}
> > +	finish_wait(&guc->ct.wq, &wait);
> > +
> > +	return (timeout < 0) ? timeout : 0;
> > +}
> > +
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > +{
> > +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> > +					true, timeout);
> > +}
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> >   	if (!enabled && !err) {
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> >   		set_context_enabled(ce);
> >   	} else if (!enabled) {
> >   		clr_context_pending_enable(ce);
> > @@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   		offset,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +					     0, true);
> >   }
> >   static int register_context(struct intel_context *ce)
> > @@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   		guc_id,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> > +					     true);
> >   }
> >   static int deregister_context(struct intel_context *ce, u32 guc_id)
> > @@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	intel_context_get(ce);
> > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> >   }
> >   static u16 prep_context_pending_disable(struct intel_context *ce)
> > @@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >   	return ce;
> >   }
> > +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> > +{
> > +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
> > +		wake_up_all(&guc->ct.wq);
> > +}
> > +
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg,
> >   					  u32 len)
> > @@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		lrc_destroy(&ce->ref);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> > +
> >   	return 0;
> >   }
> > @@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> >   	intel_context_put(ce);
> >   	return 0;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index 9c954c589edf..c4cef885e984 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
> >   #undef uc_state_checkers
> >   #undef __uc_state_checker
> > +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> > +{
> > +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> > +}
> > +
> >   #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
> >   static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
> >   { \
> > diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> > index 4d2d59a9942b..2b73ddb11c66 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> > @@ -27,6 +27,7 @@
> >    */
> >   #include "gem/i915_gem_context.h"
> > +#include "gt/intel_gt.h"
> Still not seeing a need for this.
> 
> >   #include "gt/intel_gt_requests.h"
> >   #include "i915_drv.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > index c130010a7033..1c721542e277 100644
> > --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > @@ -5,7 +5,7 @@
> >    */
> >   #include "i915_drv.h"
> > -#include "gt/intel_gt_requests.h"
> > +#include "gt/intel_gt.h"
> Nor this.
> 

We need these because intel_gt_wait_for_idle moved from
"gt/intel_gt_requests.h" to "gt/intel_gt.h".

Matt

> John.
> 
> >   #include "../i915_selftest.h"
> >   #include "igt_flush_test.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index d189c4bd4bef..4f8180146888 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
> >   	do {
> >   		for_each_engine(engine, gt, id)
> >   			mock_engine_flush(engine);
> > -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> > +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> > +						  NULL));
> >   }
> >   static void mock_device_release(struct drm_device *dev)
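
Zooming out from the hunk-by-hunk back and forth: the core mechanism this patch adds is a drain counter. Every H2G that expects a G2H reply bumps outstanding_submission_g2h, every reply handler calls decr_outstanding_submission_g2h() to decrement it and wake ct.wq, and intel_guc_wait_for_idle() sleeps until the counter reaches zero. The pairing in miniature (illustrative names; the real wait loop above additionally handles signals and the caller's timeout budget by hand):

	#include <linux/atomic.h>
	#include <linux/wait.h>

	static atomic_t outstanding = ATOMIC_INIT(0);
	static DECLARE_WAIT_QUEUE_HEAD(outstanding_wq);

	static void example_send_expecting_reply(void)
	{
		atomic_inc(&outstanding);	/* taken before the H2G goes out */
		/* ... issue the H2G ... */
	}

	static void example_reply_received(void)
	{
		if (atomic_dec_and_test(&outstanding))
			wake_up_all(&outstanding_wq);	/* last reply drains it */
	}

	static long example_wait_drained(long timeout)
	{
		/* >0: drained with jiffies to spare; 0: timed out; <0: interrupted */
		return wait_event_interruptible_timeout(outstanding_wq,
							!atomic_read(&outstanding),
							timeout);
	}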
> 

> > -	return timeout;
> > +	return active_count ? timeout : 0;
> >   }
> >   static void retire_work_handler(struct work_struct *work)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > index fcc30a6e4fe9..83ff5280c06e 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> You were saying the include of stddef is needed here?
> 

Yes, HDRTEST [1] complains otherwise.

[1] https://patchwork.freedesktop.org/series/91840/#rev3
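
To spell the failure out: HDRTEST builds each header as a standalone
translation unit, and the inline helper below now passes NULL, so the
header has to provide NULL itself. A sketch of the resulting header
shape (the exact include line is inferred from this discussion, not
quoted from the patch):

	#include <linux/stddef.h>	/* NULL, used by the inline below */

	static inline void intel_gt_retire_requests(struct intel_gt *gt)
	{
		intel_gt_retire_requests_timeout(gt, 0, NULL);
	}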

> > @@ -10,10 +10,11 @@ struct intel_engine_cs;
> >   struct intel_gt;
> >   struct intel_timeline;
> > -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
> > +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
> > +				      long *remaining_timeout);
> >   static inline void intel_gt_retire_requests(struct intel_gt *gt)
> >   {
> > -	intel_gt_retire_requests_timeout(gt, 0);
> > +	intel_gt_retire_requests_timeout(gt, 0, NULL);
> >   }
> >   void intel_engine_init_retire(struct intel_engine_cs *engine);
> > @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
> >   			     struct intel_timeline *tl);
> >   void intel_engine_fini_retire(struct intel_engine_cs *engine);
> > -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
> > -
> >   void intel_gt_init_requests(struct intel_gt *gt);
> >   void intel_gt_park_requests(struct intel_gt *gt);
> >   void intel_gt_unpark_requests(struct intel_gt *gt);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 80b88bae5f24..3cc566565224 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -39,6 +39,8 @@ struct intel_guc {
> >   	spinlock_t irq_lock;
> >   	unsigned int msg_enabled_mask;
> > +	atomic_t outstanding_submission_g2h;
> > +
> >   	struct {
> >   		void (*reset)(struct intel_guc *guc);
> >   		void (*enable)(struct intel_guc *guc);
> > @@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   	spin_unlock_irq(&guc->irq_lock);
> >   }
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > +
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index c33906ec478d..f1cbed6b9f0a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
> >   	INIT_LIST_HEAD(&ct->requests.incoming);
> >   	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
> >   	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
> > +	init_waitqueue_head(&ct->wq);
> >   }
> >   static inline const char *guc_ct_buffer_type_to_str(u32 type)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > index 785dfc5c6efb..4b30a562ae63 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
> > @@ -10,6 +10,7 @@
> >   #include <linux/spinlock.h>
> >   #include <linux/workqueue.h>
> >   #include <linux/ktime.h>
> > +#include <linux/wait.h>
> >   #include "intel_guc_fwif.h"
> > @@ -68,6 +69,9 @@ struct intel_guc_ct {
> >   	struct tasklet_struct receive_tasklet;
> > +	/** @wq: wait queue for g2h channel */
> > +	wait_queue_head_t wq;
> > +
> >   	struct {
> >   		u16 last_fence; /* last fence used to send request */
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index f7e34baa9506..088d11e2e497 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> >   }
> > +static int guc_submission_send_busy_loop(struct intel_guc* guc,
> > +					 const u32 *action,
> > +					 u32 len,
> > +					 u32 g2h_len_dw,
> > +					 bool loop)
> > +{
> > +	int err;
> > +
> > +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
> > +
> > +	if (!err && g2h_len_dw)
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> > +
> > +	return err;
> > +}
> > +
> > +static int guc_wait_for_pending_msg(struct intel_guc *guc,
> > +				    atomic_t *wait_var,
> > +				    bool interruptible,
> > +				    long timeout)
> > +{
> > +	const int state = interruptible ?
> > +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> > +	DEFINE_WAIT(wait);
> > +
> > +	might_sleep();
> > +	GEM_BUG_ON(timeout < 0);
> > +
> > +	if (!atomic_read(wait_var))
> > +		return 0;
> > +
> > +	if (!timeout)
> > +		return -ETIME;
> > +
> > +	for (;;) {
> > +		prepare_to_wait(&guc->ct.wq, &wait, state);
> > +
> > +		if (!atomic_read(wait_var))
> > +			break;
> > +
> > +		if (signal_pending_state(state, current)) {
> > +			timeout = -EINTR;
> > +			break;
> > +		}
> > +
> > +		if (!timeout) {
> > +			timeout = -ETIME;
> > +			break;
> > +		}
> > +
> > +		timeout = io_schedule_timeout(timeout);
> > +	}
> > +	finish_wait(&guc->ct.wq, &wait);
> > +
> > +	return (timeout < 0) ? timeout : 0;
> > +}
> > +
> > +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> > +{
> > +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
> > +					true, timeout);
> > +}
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
> >   	if (!enabled && !err) {
> > +		atomic_inc(&guc->outstanding_submission_g2h);
> >   		set_context_enabled(ce);
> >   	} else if (!enabled) {
> >   		clr_context_pending_enable(ce);
> > @@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   		offset,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +					     0, true);
> >   }
> >   static int register_context(struct intel_context *ce)
> > @@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   		guc_id,
> >   	};
> > -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
> > +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> > +					     true);
> >   }
> >   static int deregister_context(struct intel_context *ce, u32 guc_id)
> > @@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	intel_context_get(ce);
> > -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> >   }
> >   static u16 prep_context_pending_disable(struct intel_context *ce)
> > @@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >   	return ce;
> >   }
> > +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
> > +{
> > +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
> > +		wake_up_all(&guc->ct.wq);
> > +}
> > +
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg,
> >   					  u32 len)
> > @@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		lrc_destroy(&ce->ref);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> > +
> >   	return 0;
> >   }
> > @@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   	}
> > +	decr_outstanding_submission_g2h(guc);
> >   	intel_context_put(ce);
> >   	return 0;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index 9c954c589edf..c4cef885e984 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
> >   #undef uc_state_checkers
> >   #undef __uc_state_checker
> > +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
> > +{
> > +	return intel_guc_wait_for_idle(&uc->guc, timeout);
> > +}
> > +
> >   #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
> >   static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
> >   { \
> > diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> > index 4d2d59a9942b..2b73ddb11c66 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> > @@ -27,6 +27,7 @@
> >    */
> >   #include "gem/i915_gem_context.h"
> > +#include "gt/intel_gt.h"
> Still not seeing a need for this.
> 
> >   #include "gt/intel_gt_requests.h"
> >   #include "i915_drv.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > index c130010a7033..1c721542e277 100644
> > --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
> > @@ -5,7 +5,7 @@
> >    */
> >   #include "i915_drv.h"
> > -#include "gt/intel_gt_requests.h"
> > +#include "gt/intel_gt.h"
> Nor this.
> 

We need these because intel_gt_wait_for_idle() moved from
"gt/intel_gt_requests.h" to "gt/intel_gt.h".

Matt

> John.
> 
> >   #include "../i915_selftest.h"
> >   #include "igt_flush_test.h"
> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index d189c4bd4bef..4f8180146888 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
> >   	do {
> >   		for_each_engine(engine, gt, id)
> >   			mock_engine_flush(engine);
> > -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
> > +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
> > +						  NULL));
> >   }
> >   static void mock_device_release(struct drm_device *dev)
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread
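
In short, the patch above boils down to a counter-plus-waitqueue
pattern; the fragments below are condensed from the diff (not a
verbatim hunk):

	/* send side: each H2G that expects a G2H reply bumps a counter */
	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
	if (!err && g2h_len_dw)
		atomic_inc(&guc->outstanding_submission_g2h);

	/* receive side: each G2H handler drops the counter and wakes waiters */
	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
		wake_up_all(&guc->ct.wq);

	/*
	 * idle side: intel_gt_wait_for_idle() retires requests, then calls
	 * intel_uc_wait_for_idle() -> intel_guc_wait_for_idle(), which sleeps
	 * on guc->ct.wq until the counter hits zero or the remaining timeout
	 * expires.
	 */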

* Re: [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
  2021-07-20  1:28     ` [Intel-gfx] " John Harrison
@ 2021-07-20  1:54       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  1:54 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 06:28:47PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The serial number tracking of engines happens at the backend of
> > request submission and was expecting to only be given physical
> > engines. However, in GuC submission mode, the decomposition of virtual
> > to physical engines does not happen in i915. Instead, requests are
> > submitted to their virtual engine mask all the way through to the
> > hardware (i.e. to GuC). This would mean that the heartbeat code
> > thinks the physical engines are idle due to the serial number not
> > incrementing.
> > 
> > This patch updates the tracking to decompose virtual engines into
> > their physical constituents and tracks the request against each. This
> > is not entirely accurate as the GuC will only be issuing the request
> > to one physical engine. However, it is the best that i915 can do given
> > that it has no knowledge of the GuC's scheduling decisions.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Still needs to pull in Tvrtko's updated subject and description.
> 

Forgot this in the post, already have it fixed locally.

Matt

> John.
> 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
> >   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
> >   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
> >   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
> >   6 files changed, 39 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 1cb9c3b70b29..8ad304b2f2e4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -388,6 +388,8 @@ struct intel_engine_cs {
> >   	void		(*park)(struct intel_engine_cs *engine);
> >   	void		(*unpark)(struct intel_engine_cs *engine);
> > +	void		(*bump_serial)(struct intel_engine_cs *engine);
> > +
> >   	void		(*set_default_submission)(struct intel_engine_cs *engine);
> >   	const struct intel_context_ops *cops;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 28492cdce706..920707e22eb0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void execlist_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void
> >   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> > @@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &execlists_context_ops;
> >   	engine->request_alloc = execlists_request_alloc;
> > +	engine->bump_serial = execlist_bump_serial;
> >   	engine->reset.prepare = execlists_reset_prepare;
> >   	engine->reset.rewind = execlists_reset_rewind;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 5c4d204d07cc..61469c631057 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
> >   	}
> >   }
> > +static void ring_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void setup_common(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
> >   	engine->cops = &ring_context_ops;
> >   	engine->request_alloc = ring_request_alloc;
> > +	engine->bump_serial = ring_bump_serial;
> >   	/*
> >   	 * Using a global execution timeline; the previous final breadcrumb is
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index 68970398e4ef..9203c766db80 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> >   	intel_engine_fini_retire(engine);
> >   }
> > +static void mock_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   				    const char *name,
> >   				    int id)
> > @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   	engine->base.cops = &mock_context_ops;
> >   	engine->base.request_alloc = mock_request_alloc;
> > +	engine->base.bump_serial = mock_bump_serial;
> >   	engine->base.emit_flush = mock_emit_flush;
> >   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> >   	engine->base.submit_request = mock_submit_request;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 7b3e1c91e689..372e0dc7617a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1485,6 +1485,20 @@ static void guc_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> > +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	struct intel_engine_cs *e;
> > +	intel_engine_mask_t tmp, mask = engine->mask;
> > +
> > +	for_each_engine_masked(e, engine->gt, mask, tmp)
> > +		e->serial++;
> > +}
> > +
> >   static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> >   	/* Default vfuncs which can be overridden by each engine. */
> > @@ -1493,6 +1507,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &guc_context_ops;
> >   	engine->request_alloc = guc_request_alloc;
> > +	engine->bump_serial = guc_bump_serial;
> >   	engine->sched_engine->schedule = i915_schedule;
> > @@ -1828,6 +1843,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   	ve->base.cops = &virtual_guc_context_ops;
> >   	ve->base.request_alloc = guc_request_alloc;
> > +	ve->base.bump_serial = virtual_guc_bump_serial;
> >   	ve->base.submit_request = guc_submit_request;
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 01aa3d1ee2b1..30ecdc46a12f 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -669,7 +669,9 @@ bool __i915_request_submit(struct i915_request *request)
> >   				     request->ring->vaddr + request->postfix);
> >   	trace_i915_request_execute(request);
> > -	engine->serial++;
> > +	if (engine->bump_serial)
> > +		engine->bump_serial(engine);
> > +
> >   	result = true;
> >   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
@ 2021-07-20  1:54       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  1:54 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Mon, Jul 19, 2021 at 06:28:47PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The serial number tracking of engines happens at the backend of
> > request submission and was expecting to only be given physical
> > engines. However, in GuC submission mode, the decomposition of virtual
> > to physical engines does not happen in i915. Instead, requests are
> > submitted to their virtual engine mask all the way through to the
> > hardware (i.e. to GuC). This would mean that the heartbeat code
> > thinks the physical engines are idle due to the serial number not
> > incrementing.
> > 
> > This patch updates the tracking to decompose virtual engines into
> > their physical constituents and tracks the request against each. This
> > is not entirely accurate as the GuC will only be issuing the request
> > to one physical engine. However, it is the best that i915 can do given
> > that it has no knowledge of the GuC's scheduling decisions.
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Still needs to pull in Tvrtko's updated subject and description.
> 

Forgot this in the post, already have it fixed locally.

Matt

> John.
> 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
> >   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
> >   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
> >   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
> >   6 files changed, 39 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 1cb9c3b70b29..8ad304b2f2e4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -388,6 +388,8 @@ struct intel_engine_cs {
> >   	void		(*park)(struct intel_engine_cs *engine);
> >   	void		(*unpark)(struct intel_engine_cs *engine);
> > +	void		(*bump_serial)(struct intel_engine_cs *engine);
> > +
> >   	void		(*set_default_submission)(struct intel_engine_cs *engine);
> >   	const struct intel_context_ops *cops;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 28492cdce706..920707e22eb0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void execlist_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void
> >   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> > @@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &execlists_context_ops;
> >   	engine->request_alloc = execlists_request_alloc;
> > +	engine->bump_serial = execlist_bump_serial;
> >   	engine->reset.prepare = execlists_reset_prepare;
> >   	engine->reset.rewind = execlists_reset_rewind;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 5c4d204d07cc..61469c631057 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
> >   	}
> >   }
> > +static void ring_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   static void setup_common(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
> >   	engine->cops = &ring_context_ops;
> >   	engine->request_alloc = ring_request_alloc;
> > +	engine->bump_serial = ring_bump_serial;
> >   	/*
> >   	 * Using a global execution timeline; the previous final breadcrumb is
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index 68970398e4ef..9203c766db80 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> >   	intel_engine_fini_retire(engine);
> >   }
> > +static void mock_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> >   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   				    const char *name,
> >   				    int id)
> > @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> >   	engine->base.cops = &mock_context_ops;
> >   	engine->base.request_alloc = mock_request_alloc;
> > +	engine->base.bump_serial = mock_bump_serial;
> >   	engine->base.emit_flush = mock_emit_flush;
> >   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> >   	engine->base.submit_request = mock_submit_request;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 7b3e1c91e689..372e0dc7617a 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1485,6 +1485,20 @@ static void guc_release(struct intel_engine_cs *engine)
> >   	lrc_fini_wa_ctx(engine);
> >   }
> > +static void guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	engine->serial++;
> > +}
> > +
> > +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> > +{
> > +	struct intel_engine_cs *e;
> > +	intel_engine_mask_t tmp, mask = engine->mask;
> > +
> > +	for_each_engine_masked(e, engine->gt, mask, tmp)
> > +		e->serial++;
> > +}
> > +
> >   static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   {
> >   	/* Default vfuncs which can be overridden by each engine. */
> > @@ -1493,6 +1507,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->cops = &guc_context_ops;
> >   	engine->request_alloc = guc_request_alloc;
> > +	engine->bump_serial = guc_bump_serial;
> >   	engine->sched_engine->schedule = i915_schedule;
> > @@ -1828,6 +1843,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   	ve->base.cops = &virtual_guc_context_ops;
> >   	ve->base.request_alloc = guc_request_alloc;
> > +	ve->base.bump_serial = virtual_guc_bump_serial;
> >   	ve->base.submit_request = guc_submit_request;
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 01aa3d1ee2b1..30ecdc46a12f 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -669,7 +669,9 @@ bool __i915_request_submit(struct i915_request *request)
> >   				     request->ring->vaddr + request->postfix);
> >   	trace_i915_request_execute(request);
> > -	engine->serial++;
> > +	if (engine->bump_serial)
> > +		engine->bump_serial(engine);
> > +
> >   	result = true;
> >   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 17/51] drm/i915/guc: Add several request trace points
  2021-07-20  1:27     ` [Intel-gfx] " John Harrison
@ 2021-07-20  2:10       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:10 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 06:27:06PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Add trace points for request dependencies and GuC submit. Extended
> > existing request trace points to include submit fence value,, guc_id,
> Still has misplaced commas.
> 
> Also, Tvrtko has a bunch of comments/questions on the previous version that
> need to be addressed.
> 

Replied. Landed on just deleting the contentious, albeit very useful,
trace points. Will revisit in the future.

Matt
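
For reference, the stub arrangement in the diff below is what lets call
sites invoke the new tracepoints unconditionally: when
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS is not set, each tracepoint is
replaced by an empty inline the compiler discards. Pattern condensed
from the diff (the real header also wraps the stubs in
!defined(TRACE_HEADER_MULTI_READ)):

	#if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
	DEFINE_EVENT(i915_request, i915_request_guc_submit,
		     TP_PROTO(struct i915_request *rq),
		     TP_ARGS(rq)
	);
	#else
	static inline void
	trace_i915_request_guc_submit(struct i915_request *rq)
	{
	}
	#endif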

> John.
> 
> > and ring tail value.
> > 
> > v2: Fix white space alignment in i915_request_add trace point
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
> >   drivers/gpu/drm/i915/i915_request.c           |  3 ++
> >   drivers/gpu/drm/i915/i915_trace.h             | 43 +++++++++++++++++--
> >   3 files changed, 45 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a2af7e17dcc2..480fb2184ecf 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   			guc->stalled_request = last;
> >   			return false;
> >   		}
> > +		trace_i915_request_guc_submit(last);
> >   	}
> >   	guc->stalled_request = NULL;
> > @@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >   	ret = guc_add_request(guc, rq);
> >   	if (ret == -EBUSY)
> >   		guc->stalled_request = rq;
> > +	else
> > +		trace_i915_request_guc_submit(rq);
> >   	return ret;
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 2b2b63cba06c..01aa3d1ee2b1 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
> >   			return err;
> >   	}
> > +	trace_i915_request_dep_to(to);
> > +	trace_i915_request_dep_from(from);
> > +
> >   	/* Couple the dependency tree for PI on this exposed to->fence */
> >   	if (to->engine->sched_engine->schedule) {
> >   		err = i915_sched_node_add_dependency(&to->sched,
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 6778ad2a14a4..ea41d069bf7d 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
> >   	    TP_STRUCT__entry(
> >   			     __field(u32, dev)
> >   			     __field(u64, ctx)
> > +			     __field(u32, guc_id)
> >   			     __field(u16, class)
> >   			     __field(u16, instance)
> >   			     __field(u32, seqno)
> > +			     __field(u32, tail)
> >   			     ),
> >   	    TP_fast_assign(
> >   			   __entry->dev = rq->engine->i915->drm.primary->index;
> >   			   __entry->class = rq->engine->uabi_class;
> >   			   __entry->instance = rq->engine->uabi_instance;
> > +			   __entry->guc_id = rq->context->guc_id;
> >   			   __entry->ctx = rq->fence.context;
> >   			   __entry->seqno = rq->fence.seqno;
> > +			   __entry->tail = rq->tail;
> >   			   ),
> > -	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
> > +	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
> >   		      __entry->dev, __entry->class, __entry->instance,
> > -		      __entry->ctx, __entry->seqno)
> > +		      __entry->guc_id, __entry->ctx, __entry->seqno,
> > +		      __entry->tail)
> >   );
> >   DEFINE_EVENT(i915_request, i915_request_add,
> > -	    TP_PROTO(struct i915_request *rq),
> > -	    TP_ARGS(rq)
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> >   );
> >   #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
> > +DEFINE_EVENT(i915_request, i915_request_dep_to,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> > +DEFINE_EVENT(i915_request, i915_request_dep_from,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> > +DEFINE_EVENT(i915_request, i915_request_guc_submit,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> >   DEFINE_EVENT(i915_request, i915_request_submit,
> >   	     TP_PROTO(struct i915_request *rq),
> >   	     TP_ARGS(rq)
> > @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
> >   #else
> >   #if !defined(TRACE_HEADER_MULTI_READ)
> > +static inline void
> > +trace_i915_request_dep_to(struct i915_request *rq)
> > +{
> > +}
> > +
> > +static inline void
> > +trace_i915_request_dep_from(struct i915_request *rq)
> > +{
> > +}
> > +
> > +static inline void
> > +trace_i915_request_guc_submit(struct i915_request *rq)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_i915_request_submit(struct i915_request *rq)
> >   {
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 17/51] drm/i915/guc: Add several request trace points
@ 2021-07-20  2:10       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:10 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Mon, Jul 19, 2021 at 06:27:06PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Add trace points for request dependencies and GuC submit. Extended
> > existing request trace points to include submit fence value,, guc_id,
> Still has misplaced commas.
> 
> Also, Tvrtko has a bunch of comments/questions on the previous version that
> need to be addressed.
> 

Replied. Landed on just deleting the contentious, albeit very useful,
trace points. Will revisit in the future.

Matt

> John.
> 
> > and ring tail value.
> > 
> > v2: Fix white space alignment in i915_request_add trace point
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  3 ++
> >   drivers/gpu/drm/i915/i915_request.c           |  3 ++
> >   drivers/gpu/drm/i915/i915_trace.h             | 43 +++++++++++++++++--
> >   3 files changed, 45 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index a2af7e17dcc2..480fb2184ecf 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -417,6 +417,7 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   			guc->stalled_request = last;
> >   			return false;
> >   		}
> > +		trace_i915_request_guc_submit(last);
> >   	}
> >   	guc->stalled_request = NULL;
> > @@ -637,6 +638,8 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >   	ret = guc_add_request(guc, rq);
> >   	if (ret == -EBUSY)
> >   		guc->stalled_request = rq;
> > +	else
> > +		trace_i915_request_guc_submit(rq);
> >   	return ret;
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 2b2b63cba06c..01aa3d1ee2b1 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1319,6 +1319,9 @@ __i915_request_await_execution(struct i915_request *to,
> >   			return err;
> >   	}
> > +	trace_i915_request_dep_to(to);
> > +	trace_i915_request_dep_from(from);
> > +
> >   	/* Couple the dependency tree for PI on this exposed to->fence */
> >   	if (to->engine->sched_engine->schedule) {
> >   		err = i915_sched_node_add_dependency(&to->sched,
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 6778ad2a14a4..ea41d069bf7d 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -794,30 +794,50 @@ DECLARE_EVENT_CLASS(i915_request,
> >   	    TP_STRUCT__entry(
> >   			     __field(u32, dev)
> >   			     __field(u64, ctx)
> > +			     __field(u32, guc_id)
> >   			     __field(u16, class)
> >   			     __field(u16, instance)
> >   			     __field(u32, seqno)
> > +			     __field(u32, tail)
> >   			     ),
> >   	    TP_fast_assign(
> >   			   __entry->dev = rq->engine->i915->drm.primary->index;
> >   			   __entry->class = rq->engine->uabi_class;
> >   			   __entry->instance = rq->engine->uabi_instance;
> > +			   __entry->guc_id = rq->context->guc_id;
> >   			   __entry->ctx = rq->fence.context;
> >   			   __entry->seqno = rq->fence.seqno;
> > +			   __entry->tail = rq->tail;
> >   			   ),
> > -	    TP_printk("dev=%u, engine=%u:%u, ctx=%llu, seqno=%u",
> > +	    TP_printk("dev=%u, engine=%u:%u, guc_id=%u, ctx=%llu, seqno=%u, tail=%u",
> >   		      __entry->dev, __entry->class, __entry->instance,
> > -		      __entry->ctx, __entry->seqno)
> > +		      __entry->guc_id, __entry->ctx, __entry->seqno,
> > +		      __entry->tail)
> >   );
> >   DEFINE_EVENT(i915_request, i915_request_add,
> > -	    TP_PROTO(struct i915_request *rq),
> > -	    TP_ARGS(rq)
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> >   );
> >   #if defined(CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS)
> > +DEFINE_EVENT(i915_request, i915_request_dep_to,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> > +DEFINE_EVENT(i915_request, i915_request_dep_from,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> > +DEFINE_EVENT(i915_request, i915_request_guc_submit,
> > +	     TP_PROTO(struct i915_request *rq),
> > +	     TP_ARGS(rq)
> > +);
> > +
> >   DEFINE_EVENT(i915_request, i915_request_submit,
> >   	     TP_PROTO(struct i915_request *rq),
> >   	     TP_ARGS(rq)
> > @@ -887,6 +907,21 @@ TRACE_EVENT(i915_request_out,
> >   #else
> >   #if !defined(TRACE_HEADER_MULTI_READ)
> > +static inline void
> > +trace_i915_request_dep_to(struct i915_request *rq)
> > +{
> > +}
> > +
> > +static inline void
> > +trace_i915_request_dep_from(struct i915_request *rq)
> > +{
> > +}
> > +
> > +static inline void
> > +trace_i915_request_guc_submit(struct i915_request *rq)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_i915_request_submit(struct i915_request *rq)
> >   {
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-20  0:23     ` [Intel-gfx] " John Harrison
@ 2021-07-20  2:45       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:45 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 05:23:55PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Implement GuC context operations which include GuC-specific operations
> > alloc, pin, unpin, and destroy.
> > 
> > v2:
> >   (Daniel Vetter)
> >    - Use msleep_interruptible rather than cond_resched in busy loop
> >   (Michal)
> >    - Remove C++ style comment
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
> >   drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >   drivers/gpu/drm/i915/i915_request.c           |   1 +
> >   8 files changed, 685 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index bd63813c8a80..32fd6647154b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   	mutex_init(&ce->pin_mutex);
> > +	spin_lock_init(&ce->guc_state.lock);
> > +
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +	INIT_LIST_HEAD(&ce->guc_id_link);
> > +
> >   	i915_active_init(&ce->active,
> >   			 __intel_context_active, __intel_context_retire, 0);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 6d99631d19b9..606c480aec26 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -96,6 +96,7 @@ struct intel_context {
> >   #define CONTEXT_BANNED			6
> >   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
> >   #define CONTEXT_NOPREEMPT		8
> > +#define CONTEXT_LRCA_DIRTY		9
> >   	struct {
> >   		u64 timeout_us;
> > @@ -138,14 +139,29 @@ struct intel_context {
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +	struct {
> > +		/** lock: protects everything in guc_state */
> > +		spinlock_t lock;
> > +		/**
> > +		 * sched_state: scheduling state of this context using GuC
> > +		 * submission
> > +		 */
> > +		u8 sched_state;
> > +	} guc_state;
> > +
> >   	/* GuC scheduling state flags that do not require a lock. */
> >   	atomic_t guc_sched_state_no_lock;
> > +	/* GuC LRC descriptor ID */
> > +	u16 guc_id;
> > +
> > +	/* GuC LRC descriptor reference count */
> > +	atomic_t guc_id_ref;
> > +
> >   	/*
> > -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> > -	 * in the series will.
> > +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
> >   	 */
> > -	u16 guc_id;
> > +	struct list_head guc_id_link;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > index 41e5350a7a05..49d4857ad9b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > @@ -87,7 +87,6 @@
> >   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
> >   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> > -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
> >   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
> >   /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 8c7b92f699f1..30773cd699f5 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -7,6 +7,7 @@
> >   #define _INTEL_GUC_H_
> >   #include <linux/xarray.h>
> > +#include <linux/delay.h>
> >   #include "intel_uncore.h"
> >   #include "intel_guc_fw.h"
> > @@ -44,6 +45,14 @@ struct intel_guc {
> >   		void (*disable)(struct intel_guc *guc);
> >   	} interrupts;
> > +	/*
> > +	 * contexts_lock protects the pool of free guc ids and a linked list of
> > +	 * guc ids available to be stolen
> > +	 */
> > +	spinlock_t contexts_lock;
> > +	struct ida guc_ids;
> > +	struct list_head guc_id_list;
> > +
> >   	bool submission_selected;
> >   	struct i915_vma *ads_vma;
> > @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >   				 response_buf, response_buf_size, 0);
> >   }
> > +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> > +					   const u32 *action,
> > +					   u32 len,
> > +					   bool loop)
> > +{
> > +	int err;
> > +	unsigned int sleep_period_ms = 1;
> > +	bool not_atomic = !in_atomic() && !irqs_disabled();
> > +
> > +	/* No sleeping with spin locks, just busy loop */
> > +	might_sleep_if(loop && not_atomic);
> > +
> > +retry:
> > +	err = intel_guc_send_nb(guc, action, len);
> > +	if (unlikely(err == -EBUSY && loop)) {
> > +		if (likely(not_atomic)) {
> > +			if (msleep_interruptible(sleep_period_ms))
> > +				return -EINTR;
> > +			sleep_period_ms = sleep_period_ms << 1;
> > +		} else {
> > +			cpu_relax();
> > +		}
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> > +}
> > +
> >   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >   {
> >   	intel_guc_ct_event_handler(&guc->ct);
> > @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg, u32 len);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 83ec60ea3f89..28ff82c5be45 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >   	case INTEL_GUC_ACTION_DEFAULT:
> >   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >   		break;
> > +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> > +							    len);
> > +		break;
> >   	default:
> >   		ret = -EOPNOTSUPP;
> >   		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 53b4a5eb4a85..a47b3813b4d0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -13,7 +13,9 @@
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_irq.h"
> >   #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt_requests.h"
> >   #include "gt/intel_lrc.h"
> > +#include "gt/intel_lrc_reg.h"
> >   #include "gt/intel_mocs.h"
> >   #include "gt/intel_ring.h"
> > @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which
> > + * require a lock, aside from the special case where the functions are called
> > + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> > + * path to be executing on the context.
> > + */
> > +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> > +#define SCHED_STATE_DESTROYED				BIT(1)
> > +static inline void init_sched_state(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> > +	ce->guc_state.sched_state = 0;
> > +}
> > +
> > +static inline bool
> > +context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state &
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline void
> > +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	ce->guc_state.sched_state |=
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > +}
> > +
> > +static inline void
> > +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state =
> > +		(ce->guc_state.sched_state &
> > +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline bool
> > +context_destroyed(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> > +}
> > +
> > +static inline void
> > +set_context_destroyed(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> > +}
> > +
> > +static inline bool context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> > +}
> > +
> > +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +}
> > +
> > +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> > +{
> > +	return &ce->engine->gt->uc.guc;
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	int len = 0;
> >   	bool enabled = context_enabled(ce);
> > +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > +	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +	spin_lock_init(&guc->contexts_lock);
> > +	INIT_LIST_HEAD(&guc->guc_id_list);
> > +	ida_init(&guc->guc_ids);
> > +
> >   	return 0;
> >   }
> > @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >   	i915_sched_engine_put(guc->sched_engine);
> >   }
> > -static int guc_context_alloc(struct intel_context *ce)
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> > +				 struct i915_request *rq,
> > +				 int prio)
> >   {
> > -	return lrc_alloc(ce, ce->engine);
> > +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > +	list_add_tail(&rq->sched.link,
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> > +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +}
> > +
> > +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > +				     struct i915_request *rq)
> > +{
> > +	int ret;
> > +
> > +	__i915_request_submit(rq);
> > +
> > +	trace_i915_request_in(rq, 0);
> > +
> > +	guc_set_lrc_tail(rq);
> > +	ret = guc_add_request(guc, rq);
> > +	if (ret == -EBUSY)
> > +		guc->stalled_request = rq;
> > +
> > +	return ret;
> > +}
> > +
> > +static void guc_submit_request(struct i915_request *rq)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	/* Will be called from irq-context when using foreign fences. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +
> > +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +		queue_request(sched_engine, rq, rq_prio(rq));
> > +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> Reserved for what?
>

A memory corruption which I found today. This is no longer needed.
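
For context on the retry loop in pin_guc_id() further down: with
PIN_GUC_ID_TRIES == 4, the first -EAGAIN just retires requests and
retries immediately, and the next two retries sleep first, doubling the
period each time. Assuming a 5ms timeslice (illustrative value, not
from the patch), the sequence works out to:

	/*
	 * attempt 1: -EAGAIN -> retire requests, retry immediately
	 * attempt 2: -EAGAIN -> msleep(min(100, 5 << 0)) =  5ms, retire, retry
	 * attempt 3: -EAGAIN -> msleep(min(100, 5 << 1)) = 10ms, retire, retry
	 * attempt 4: -EAGAIN -> out of tries, return -EAGAIN to the caller
	 */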
 
> > +static int new_guc_id(struct intel_guc *guc)
> > +{
> > +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> > +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> > +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +}
> > +
> > +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	if (!context_guc_id_invalid(ce)) {
> > +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > +		reset_lrc_desc(guc, ce->guc_id);
> > +		set_context_guc_id_invalid(ce);
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +}
> > +
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	__release_guc_id(guc, ce);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int steal_guc_id(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	int guc_id;
> > +
> > +	if (!list_empty(&guc->guc_id_list)) {
> > +		ce = list_first_entry(&guc->guc_id_list,
> > +				      struct intel_context,
> > +				      guc_id_link);
> > +
> > +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +		GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> > +		list_del_init(&ce->guc_id_link);
> > +		guc_id = ce->guc_id;
> > +		set_context_guc_id_invalid(ce);
> > +		return guc_id;
> > +	} else {
> > +		return -EAGAIN;
> > +	}
> > +}
> > +
> > +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> > +{
> > +	int ret;
> > +
> > +	ret = new_guc_id(guc);
> > +	if (unlikely(ret < 0)) {
> > +		ret = steal_guc_id(guc);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	*out = ret;
> > +	return 0;
> > +}
> > +
> > +#define PIN_GUC_ID_TRIES	4
> > +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +
> > +try_again:
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +
> > +	if (context_guc_id_invalid(ce)) {
> > +		ret = assign_guc_id(guc, &ce->guc_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +		ret = 1;	/* Indicates newly assigned guc_id */
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	atomic_inc(&ce->guc_id_ref);
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> > +	 * outstanding requests to see if that frees up a guc_id. If the first
> > +	 * retire didn't help, insert a sleep with the timeslice duration before
> > +	 * attempting to retire more requests. Double the sleep period each
> > +	 * subsequent pass before finally giving up. The sleep period has a
> > +	 * max of 100ms and a minimum of 1ms.
> > +	 */
> > +	if (ret == -EAGAIN && --tries) {
> > +		if (PIN_GUC_ID_TRIES - tries > 1) {
> > +			unsigned int timeslice_shifted =
> > +				ce->engine->props.timeslice_duration_ms <<
> > +				(PIN_GUC_ID_TRIES - tries - 2);
> > +			unsigned int max = min_t(unsigned int, 100,
> > +						 timeslice_shifted);
> > +
> > +			msleep(max_t(unsigned int, max, 1));
> > +		}
> > +		intel_gt_retire_requests(guc_to_gt(guc));
> > +		goto try_again;
> > +	}
> > +
> > +	return ret;
> > +}
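
As an aside, the schedule that falls out of the shift math above is easy to
misread, so spelled out (backoff_ms is an invented helper, purely
illustrative, not part of the patch): the first retry retires requests
without sleeping, later retries sleep a doubling delay clamped to
[1, 100] ms before retiring again.

	/* Illustrative only: sleep taken before retry attempt N of pin_guc_id(). */
	static unsigned int backoff_ms(unsigned int timeslice_ms, unsigned int attempt)
	{
		unsigned int ms;

		if (attempt < 2)
			return 0;	/* first retry only retires requests */

		ms = timeslice_ms << (attempt - 2);
		if (ms > 100)
			ms = 100;
		return ms ? ms : 1;
	}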
> > +
> > +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> > +	    !atomic_read(&ce->guc_id_ref))
> > +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int __guc_action_register_context(struct intel_guc *guc,
> > +					 u32 guc_id,
> > +					 u32 offset)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > +		guc_id,
> > +		offset,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int register_context(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > +		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +
> > +	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +}
> > +
> > +static int __guc_action_deregister_context(struct intel_guc *guc,
> > +					   u32 guc_id)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > +		guc_id,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	return __guc_action_deregister_context(guc, guc_id);
> > +}
> > +
> > +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > +{
> > +	switch (class) {
> > +	case RENDER_CLASS:
> > +		return mask >> RCS0;
> > +	case VIDEO_ENHANCEMENT_CLASS:
> > +		return mask >> VECS0;
> > +	case VIDEO_DECODE_CLASS:
> > +		return mask >> VCS0;
> > +	case COPY_ENGINE_CLASS:
> > +		return mask >> BCS0;
> > +	default:
> > +		GEM_BUG_ON("Invalid Class");
> > +		return 0;
> > +	}
> > +}
> > +
> > +static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +				    struct guc_lrc_desc *desc)
> > +{
> > +	desc->policy_flags = 0;
> > +
> > +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> > +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > +}
> > +
> > +static int guc_lrc_desc_pin(struct intel_context *ce)
> > +{
> > +	struct intel_runtime_pm *runtime_pm =
> > +		&ce->engine->gt->i915->runtime_pm;
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	u32 desc_idx = ce->guc_id;
> > +	struct guc_lrc_desc *desc;
> > +	bool context_registered;
> > +	intel_wakeref_t wakeref;
> > +	int ret = 0;
> > +
> > +	GEM_BUG_ON(!engine->mask);
> > +
> > +	/*
> > +	 * Ensure LRC + CT vmas are in the same region as the write barrier is done
> > +	 * based on CT vma region.
> > +	 */
> > +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> > +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> > +
> > +	context_registered = lrc_desc_registered(guc, desc_idx);
> > +
> > +	reset_lrc_desc(guc, desc_idx);
> > +	set_lrc_desc_registered(guc, desc_idx, ce);
> > +
> > +	desc = __get_lrc_desc(guc, desc_idx);
> > +	desc->engine_class = engine_class_to_guc_class(engine->class);
> > +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> > +						      engine->mask);
> > +	desc->hw_context_desc = ce->lrc.lrca;
> > +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +	guc_context_policy_init(engine, desc);
> > +	init_sched_state(ce);
> > +
> > +	/*
> > +	 * The context_lookup xarray is used to determine if the hardware
> > +	 * context is currently registered. There are two cases in which it
> > +	 * could be regisgered either the guc_id has been stole from from
> 'regisgered' and 'stole' should be 'stolen'
> 

Yep.

> > +	 * another context or the lrc descriptor address of this context has
> > +	 * changed. In either case the context needs to be deregistered with the
> > +	 * GuC before registering this context.
> > +	 */
> > +	if (context_registered) {
> > +		set_context_wait_for_deregister_to_register(ce);
> > +		intel_context_get(ce);
> > +
> > +		/*
> > +		 * If stealing the guc_id, this ce has the same guc_id as the
> > +		 * context whos guc_id was stole.
> whose guc_id was stolen
> 

Yep.

> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = deregister_context(ce, ce->guc_id);
> > +	} else {
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = register_context(ce);
> > +	}
> > +
> > +	return ret;
> >   }
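
To restate the id-reuse lifecycle in one place: when the descriptor slot is
already occupied, registration becomes a two-step handshake with the
firmware. A rough outline (all toy_* names are invented for illustration;
the real G2H handling is intel_guc_deregister_done_process_msg() later in
this patch):

	/* Illustrative outline only -- not the driver's code. */
	struct toy_ctx {
		bool waiting_to_register;	/* set when a reused id is found */
		bool destroyed;			/* set on deferred teardown */
	};

	static void toy_register(struct toy_ctx *ce);
	static void toy_release_guc_id(struct toy_ctx *ce);

	static void on_g2h_deregister_done(struct toy_ctx *ce)
	{
		if (ce->waiting_to_register) {
			toy_register(ce);	/* previous owner of the id is gone */
			ce->waiting_to_register = false;
		} else if (ce->destroyed) {
			toy_release_guc_id(ce);	/* id returns to the pool */
		}
	}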
> >   static int guc_context_pre_pin(struct intel_context *ce,
> > @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
> >   static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >   {
> > +	if (i915_ggtt_offset(ce->state) !=
> > +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > +
> >   	return lrc_pin(ce, ce->engine, vaddr);
> >   }
> > +static void guc_context_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	unpin_guc_id(guc, ce);
> > +	lrc_unpin(ce);
> > +}
> > +
> > +static void guc_context_post_unpin(struct intel_context *ce)
> > +{
> > +	lrc_post_unpin(ce);
> > +}
> > +
> > +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> Use ce_to_guc?
>

Yes.
 
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	deregister_context(ce, ce->guc_id);
> > +}
> > +
> > +static void guc_context_destroy(struct kref *kref)
> > +{
> > +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> Also ce_to_guc?
> 

Yes.

> > +	intel_wakeref_t wakeref;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * If the guc_id is invalid this context has been stolen and we can free
> > +	 * it immediately. It can also be freed immediately if the context is not
> > +	 * registered with the GuC.
> > +	 */
> > +	if (context_guc_id_invalid(ce) ||
> > +	    !lrc_desc_registered(guc, ce->guc_id)) {
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We have to acquire the context spinlock and check guc_id again; if it
> > +	 * is valid it hasn't been stolen and needs to be deregistered. We
> > +	 * delete this context from the list of unpinned guc_ids available to
> > +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> stole -> steal
> 
> > +	 * returns indicating this context has been deregistered the guc_id is
> > +	 * returned to the pool of available guc_ids.
> > +	 */
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (context_guc_id_invalid(ce)) {
> > +		__release_guc_id(guc, ce);
> > +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * We defer GuC context deregistration until the context is destroyed
> > +	 * in order to save on CTBs. With this optimization ideally we only need
> > +	 * 1 CTB to register the context during the first pin and 1 CTB to
> > +	 * deregister the context when the context is destroyed. Without this
> > +	 * optimization, a CTB would be needed every pin & unpin.
> > +	 *
> > +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
> FIXME rather than XXX?
> 

I think XXX is correct as it isn't broken. After this series gets merged,
I have a patch in the follow-on series that fixes this.

> > +	 * from context_free_worker when not runtime wakeref is held.
> when runtime wakeref is not held
> 

Yep.

> > +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> > +	 * in H2G CTB to deregister the context. A future patch may defer this
> > +	 * H2G CTB if the runtime wakeref is zero.
> > +	 */
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		guc_lrc_desc_unpin(ce);
> > +}
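
(For readers unfamiliar with the acronym: CTB here is the GuC command
transport buffer, i.e. the H2G/G2H message ring, so the optimization
amounts to one H2G message at first pin and one at destroy rather than a
register/deregister pair on every pin/unpin cycle.)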
> > +
> > +static int guc_context_alloc(struct intel_context *ce)
> > +{
> > +	return lrc_alloc(ce, ce->engine);
> > +}
> > +
> >   static const struct intel_context_ops guc_context_ops = {
> >   	.alloc = guc_context_alloc,
> >   	.pre_pin = guc_context_pre_pin,
> >   	.pin = guc_context_pin,
> > -	.unpin = lrc_unpin,
> > -	.post_unpin = lrc_post_unpin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> >   	.enter = intel_context_enter_engine,
> >   	.exit = intel_context_exit_engine,
> >   	.reset = lrc_reset,
> > -	.destroy = lrc_destroy,
> > +	.destroy = guc_context_destroy,
> >   };
> > -static int guc_request_alloc(struct i915_request *request)
> > +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +}
> > +
> > +static int guc_request_alloc(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	int ret;
> > -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> > +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >   	/*
> >   	 * Flush enough space to reduce the likelihood of waiting after
> >   	 * we start building the request - in which case we will just
> >   	 * have to repeat work.
> >   	 */
> > -	request->reserved_space += GUC_REQUEST_SIZE;
> > +	rq->reserved_space += GUC_REQUEST_SIZE;
> >   	/*
> >   	 * Note that after this point, we have committed to using
> > @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
> >   	 */
> >   	/* Unconditionally invalidate GPU caches and TLBs. */
> > -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> > +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >   	if (ret)
> >   		return ret;
> > -	request->reserved_space -= GUC_REQUEST_SIZE;
> > -	return 0;
> > -}
> > +	rq->reserved_space -= GUC_REQUEST_SIZE;
> > -static inline void queue_request(struct i915_sched_engine *sched_engine,
> > -				 struct i915_request *rq,
> > -				 int prio)
> > -{
> > -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > -	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(sched_engine, prio));
> > -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -}
> > -
> > -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > -				     struct i915_request *rq)
> > -{
> > -	int ret;
> > -
> > -	__i915_request_submit(rq);
> > -
> > -	trace_i915_request_in(rq, 0);
> > -
> > -	guc_set_lrc_tail(rq);
> > -	ret = guc_add_request(guc, rq);
> > -	if (ret == -EBUSY)
> > -		guc->stalled_request = rq;
> > -
> > -	return ret;
> > -}
> > -
> > -static void guc_submit_request(struct i915_request *rq)
> > -{
> > -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	/*
> > +	 * Call pin_guc_id here rather than in the pinning step as with
> > +	 * dma_resv, contexts can be repeatedly pinned / unpinned thrashing the
> > +	 * guc_ids and creating horrible race conditions. This is especially bad
> > +	 * when guc_ids are being stolen due to over subscription. By the time
> > +	 * this function is reached, it is guaranteed that the guc_id will be
> > +	 * persistent until the generated request is retired, thus sealing these
> > +	 * race conditions. It is still safe to fail here if guc_ids are
> > +	 * exhausted and return -EAGAIN to the user indicating that they can try
> > +	 * again in the future.
> > +	 *
> > +	 * There is no need for a lock here as the timeline mutex ensures at
> > +	 * most one context can be executing this code path at once. The
> > +	 * guc_id_ref is incremented once for every request in flight and
> > +	 * decremented on each retire. When it is zero, a lock around the
> > +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> > +	 */
> > +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> > +		return 0;
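
(Side note: atomic_add_unless(&ce->guc_id_ref, 1, 0) only increments when
the count is already non-zero, i.e. this fast path can only piggyback on a
reference that some in-flight request still holds; taking the very first
reference always goes through pin_guc_id() and its lock.)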
> > -	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> > +	if (unlikely(ret < 0))
> > +		return ret;;
> ;;
> 
> > +	if (context_needs_register(ce, !!ret)) {
> > +		ret = guc_lrc_desc_pin(ce);
> > +		if (unlikely(ret)) {	/* unwind */
> > +			atomic_dec(&ce->guc_id_ref);
> > +			unpin_guc_id(guc, ce);
> > +			return ret;
> > +		}
> > +	}
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > -		queue_request(sched_engine, rq, rq_prio(rq));
> > -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > -		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	return 0;
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >   	engine->submit_request = guc_submit_request;
> >   }
> > +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > +					  struct intel_context *ce)
> > +{
> > +	if (context_guc_id_invalid(ce))
> > +		pin_guc_id(guc, ce);
> > +	guc_lrc_desc_pin(ce);
> > +}
> > +
> > +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +
> > +	/* make sure all descriptors are clean... */
> > +	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * Some contexts might have been pinned before we enabled GuC
> > +	 * submission, so we need to add them to the GuC bookkeeping.
> > +	 * Also, after a reset the GuC we want to make sure that the information
> reset of the GuC
> 

Yep.

> > +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> LRCs
> 

Yep.

> > +	 * to the gem_context, so they need to be added separately.
> > +	 *
> > +	 * Note: we purposely do not check the error return of
> check the return of gldp for errors

Yep.

> > +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> > +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> > +	 * enough here (we're either only pinning a handful of lrc on first boot
> LRCs and same on line below
> 
> > +	 * or we're re-pinning lrcs that were already pinned before the reset).
> > +	 * Two, if the GuC has died and CTBs can't make forward progress.
> > +	 * Presumably, the GuC should be alive as this function is called on
> > +	 * driver load or after a reset. Even if it is dead, another full GPU
> GPU -> GT
> 

This comment is stale. Need to just update the whole thing.

> > +	 * reset will be triggered and this function would be called again.
> > +	 */
> > +
> > +	for_each_engine(engine, gt, id)
> > +		if (engine->kernel_context)
> > +			guc_kernel_context_pin(guc, engine->kernel_context);
> > +}
> > +
> >   static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   void intel_guc_submission_enable(struct intel_guc *guc)
> >   {
> > +	guc_init_lrc_mapping(guc);
> >   }
> >   void intel_guc_submission_disable(struct intel_guc *guc)
> > @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >   {
> >   	guc->submission_selected = __guc_submission_selected(guc);
> >   }
> > +
> > +static inline struct intel_context *
> > +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > +{
> > +	struct intel_context *ce;
> > +
> > +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Invalid desc_idx %u", desc_idx);
> Agree these should not be BUG_ONs or similar, but a drm_err seems
> appropriate. If either this or the error below occurs then something has gone
> quite wrong and it should be reported.
>

Sure.
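
(For context: drm_dbg() output is gated behind the drm.debug module
parameter, whereas drm_err() is always emitted, so as drm_dbg these
protocol violations would be easy to miss in the field.)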
 
> > +		return NULL;
> > +	}
> > +
> > +	ce = __get_context(guc, desc_idx);
> > +	if (unlikely(!ce)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Context is NULL, desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	return ce;
> > +}
> > +
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg,
> > +					  u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	u32 desc_idx = msg[0];
> > +
> > +	if (unlikely(len < 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> As above, should be a drm_err I would say.
>

Sure.

Matt
 
> > +		return -EPROTO;
> > +	}
> > +
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	if (context_wait_for_deregister_to_register(ce)) {
> > +		struct intel_runtime_pm *runtime_pm =
> > +			&ce->engine->gt->i915->runtime_pm;
> > +		intel_wakeref_t wakeref;
> > +
> > +		/*
> > +		 * Previous owner of this guc_id has been deregistered, now safe
> > +		 * to register this context.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			register_context(ce);
> > +		clr_context_wait_for_deregister_to_register(ce);
> > +		intel_context_put(ce);
> > +	} else if (context_destroyed(ce)) {
> > +		/* Context has been destroyed */
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(&ce->ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 943fe485c662..204c95c39353 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -4142,6 +4142,7 @@ enum {
> >   	FAULT_AND_CONTINUE /* Unsupported */
> >   };
> > +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >   #define GEN8_CTX_VALID (1 << 0)
> >   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 86b4c9f2613d..b48c4905d3fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >   	 */
> >   	if (!list_empty(&rq->sched.link))
> >   		remove_from_engine(rq);
> > +	atomic_dec(&rq->context->guc_id_ref);
> >   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
@ 2021-07-20  2:45       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:45 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Mon, Jul 19, 2021 at 05:23:55PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Implement GuC context operations which include GuC-specific operations
> > alloc, pin, unpin, and destroy.
> > 
> > v2:
> >   (Daniel Vetter)
> >    - Use msleep_interruptible rather than cond_resched in busy loop
> >   (Michal)
> >    - Remove C++ style comment
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
> >   drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >   drivers/gpu/drm/i915/i915_request.c           |   1 +
> >   8 files changed, 685 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index bd63813c8a80..32fd6647154b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   	mutex_init(&ce->pin_mutex);
> > +	spin_lock_init(&ce->guc_state.lock);
> > +
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +	INIT_LIST_HEAD(&ce->guc_id_link);
> > +
> >   	i915_active_init(&ce->active,
> >   			 __intel_context_active, __intel_context_retire, 0);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 6d99631d19b9..606c480aec26 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -96,6 +96,7 @@ struct intel_context {
> >   #define CONTEXT_BANNED			6
> >   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
> >   #define CONTEXT_NOPREEMPT		8
> > +#define CONTEXT_LRCA_DIRTY		9
> >   	struct {
> >   		u64 timeout_us;
> > @@ -138,14 +139,29 @@ struct intel_context {
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +	struct {
> > +		/** lock: protects everything in guc_state */
> > +		spinlock_t lock;
> > +		/**
> > +		 * sched_state: scheduling state of this context using GuC
> > +		 * submission
> > +		 */
> > +		u8 sched_state;
> > +	} guc_state;
> > +
> >   	/* GuC scheduling state flags that do not require a lock. */
> >   	atomic_t guc_sched_state_no_lock;
> > +	/* GuC LRC descriptor ID */
> > +	u16 guc_id;
> > +
> > +	/* GuC LRC descriptor reference count */
> > +	atomic_t guc_id_ref;
> > +
> >   	/*
> > -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> > -	 * in the series will.
> > +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
> >   	 */
> > -	u16 guc_id;
> > +	struct list_head guc_id_link;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
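
(This list is the steal pool used later in the patch: unpin_guc_id() parks
a context here once its guc_id_ref drops to zero while the id is still
registered with the GuC, and steal_guc_id() reclaims the oldest entry when
the IDA runs dry.)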
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > index 41e5350a7a05..49d4857ad9b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > @@ -87,7 +87,6 @@
> >   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
> >   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> > -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
> >   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
> >   /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 8c7b92f699f1..30773cd699f5 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -7,6 +7,7 @@
> >   #define _INTEL_GUC_H_
> >   #include <linux/xarray.h>
> > +#include <linux/delay.h>
> >   #include "intel_uncore.h"
> >   #include "intel_guc_fw.h"
> > @@ -44,6 +45,14 @@ struct intel_guc {
> >   		void (*disable)(struct intel_guc *guc);
> >   	} interrupts;
> > +	/*
> > +	 * contexts_lock protects the pool of free guc ids and a linked list of
> > +	 * guc ids available to be stolen
> > +	 */
> > +	spinlock_t contexts_lock;
> > +	struct ida guc_ids;
> > +	struct list_head guc_id_list;
> > +
> >   	bool submission_selected;
> >   	struct i915_vma *ads_vma;
> > @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >   				 response_buf, response_buf_size, 0);
> >   }
> > +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> > +					   const u32 *action,
> > +					   u32 len,
> > +					   bool loop)
> > +{
> > +	int err;
> > +	unsigned int sleep_period_ms = 1;
> > +	bool not_atomic = !in_atomic() && !irqs_disabled();
> > +
> > +	/* No sleeping with spin locks, just busy loop */
> > +	might_sleep_if(loop && not_atomic);
> > +
> > +retry:
> > +	err = intel_guc_send_nb(guc, action, len);
> > +	if (unlikely(err == -EBUSY && loop)) {
> > +		if (likely(not_atomic)) {
> > +			if (msleep_interruptible(sleep_period_ms))
> > +				return -EINTR;
> > +			sleep_period_ms = sleep_period_ms << 1;
> > +		} else {
> > +			cpu_relax();
> > +		}
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> > +}
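
(Note: msleep_interruptible() returns the number of milliseconds remaining
when a signal cuts the sleep short, which is why any non-zero return is
mapped to -EINTR here.)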
> > +
> >   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >   {
> >   	intel_guc_ct_event_handler(&guc->ct);
> > @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg, u32 len);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 83ec60ea3f89..28ff82c5be45 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >   	case INTEL_GUC_ACTION_DEFAULT:
> >   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >   		break;
> > +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> > +							    len);
> > +		break;
> >   	default:
> >   		ret = -EOPNOTSUPP;
> >   		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 53b4a5eb4a85..a47b3813b4d0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -13,7 +13,9 @@
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_irq.h"
> >   #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt_requests.h"
> >   #include "gt/intel_lrc.h"
> > +#include "gt/intel_lrc_reg.h"
> >   #include "gt/intel_mocs.h"
> >   #include "gt/intel_ring.h"
> > @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which
> > + * require a lock, aside from the special case where the functions are called
> > + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> > + * path to be executing on the context.
> > + */
> > +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> > +#define SCHED_STATE_DESTROYED				BIT(1)
> > +static inline void init_sched_state(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> > +	ce->guc_state.sched_state = 0;
> > +}
> > +
> > +static inline bool
> > +context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state &
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline void
> > +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	ce->guc_state.sched_state |=
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > +}
> > +
> > +static inline void
> > +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state =
> > +		(ce->guc_state.sched_state &
> > +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> > +}
> > +
> > +static inline bool
> > +context_destroyed(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> > +}
> > +
> > +static inline void
> > +set_context_destroyed(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> > +}
> > +
> > +static inline bool context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> > +}
> > +
> > +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +}
> > +
> > +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> > +{
> > +	return &ce->engine->gt->uc.guc;
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	int len = 0;
> >   	bool enabled = context_enabled(ce);
> > +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > +	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +	spin_lock_init(&guc->contexts_lock);
> > +	INIT_LIST_HEAD(&guc->guc_id_list);
> > +	ida_init(&guc->guc_ids);
> > +
> >   	return 0;
> >   }
> > @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >   	i915_sched_engine_put(guc->sched_engine);
> >   }
> > -static int guc_context_alloc(struct intel_context *ce)
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> > +				 struct i915_request *rq,
> > +				 int prio)
> >   {
> > -	return lrc_alloc(ce, ce->engine);
> > +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > +	list_add_tail(&rq->sched.link,
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> > +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +}
> > +
> > +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > +				     struct i915_request *rq)
> > +{
> > +	int ret;
> > +
> > +	__i915_request_submit(rq);
> > +
> > +	trace_i915_request_in(rq, 0);
> > +
> > +	guc_set_lrc_tail(rq);
> > +	ret = guc_add_request(guc, rq);
> > +	if (ret == -EBUSY)
> > +		guc->stalled_request = rq;
> > +
> > +	return ret;
> > +}
> > +
> > +static void guc_submit_request(struct i915_request *rq)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	/* Will be called from irq-context when using foreign fences. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +
> > +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +		queue_request(sched_engine, rq, rq_prio(rq));
> > +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> Reserved for what?
>

A memory corruption which I found today. This is no longer needed.
 
> > +static int new_guc_id(struct intel_guc *guc)
> > +{
> > +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> > +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> > +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +}
> > +
> > +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	if (!context_guc_id_invalid(ce)) {
> > +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > +		reset_lrc_desc(guc, ce->guc_id);
> > +		set_context_guc_id_invalid(ce);
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +}
> > +
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	__release_guc_id(guc, ce);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int steal_guc_id(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	int guc_id;
> > +
> > +	if (!list_empty(&guc->guc_id_list)) {
> > +		ce = list_first_entry(&guc->guc_id_list,
> > +				      struct intel_context,
> > +				      guc_id_link);
> > +
> > +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +		GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> > +		list_del_init(&ce->guc_id_link);
> > +		guc_id = ce->guc_id;
> > +		set_context_guc_id_invalid(ce);
> > +		return guc_id;
> > +	} else {
> > +		return -EAGAIN;
> > +	}
> > +}
> > +
> > +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> > +{
> > +	int ret;
> > +
> > +	ret = new_guc_id(guc);
> > +	if (unlikely(ret < 0)) {
> > +		ret = steal_guc_id(guc);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	*out = ret;
> > +	return 0;
> > +}
> > +
> > +#define PIN_GUC_ID_TRIES	4
> > +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +
> > +try_again:
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +
> > +	if (context_guc_id_invalid(ce)) {
> > +		ret = assign_guc_id(guc, &ce->guc_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +		ret = 1;	/* Indicates newly assigned guc_id */
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	atomic_inc(&ce->guc_id_ref);
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> > +	 * outstanding requests to see if that frees up a guc_id. If the first
> > +	 * retire didn't help, insert a sleep with the timeslice duration before
> > +	 * attempting to retire more requests. Double the sleep period each
> > +	 * subsequent pass before finally giving up. The sleep period has a max
> > +	 * of 100ms and a minimum of 1ms.
> > +	 */
> > +	if (ret == -EAGAIN && --tries) {
> > +		if (PIN_GUC_ID_TRIES - tries > 1) {
> > +			unsigned int timeslice_shifted =
> > +				ce->engine->props.timeslice_duration_ms <<
> > +				(PIN_GUC_ID_TRIES - tries - 2);
> > +			unsigned int max = min_t(unsigned int, 100,
> > +						 timeslice_shifted);
> > +
> > +			msleep(max_t(unsigned int, max, 1));
> > +		}
> > +		intel_gt_retire_requests(guc_to_gt(guc));
> > +		goto try_again;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> > +	    !atomic_read(&ce->guc_id_ref))
> > +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int __guc_action_register_context(struct intel_guc *guc,
> > +					 u32 guc_id,
> > +					 u32 offset)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > +		guc_id,
> > +		offset,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int register_context(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > +		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +
> > +	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +}
> > +
> > +static int __guc_action_deregister_context(struct intel_guc *guc,
> > +					   u32 guc_id)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > +		guc_id,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	return __guc_action_deregister_context(guc, guc_id);
> > +}
> > +
> > +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > +{
> > +	switch (class) {
> > +	case RENDER_CLASS:
> > +		return mask >> RCS0;
> > +	case VIDEO_ENHANCEMENT_CLASS:
> > +		return mask >> VECS0;
> > +	case VIDEO_DECODE_CLASS:
> > +		return mask >> VCS0;
> > +	case COPY_ENGINE_CLASS:
> > +		return mask >> BCS0;
> > +	default:
> > +		GEM_BUG_ON("Invalid Class");
> > +		return 0;
> > +	}
> > +}
> > +
> > +static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +				    struct guc_lrc_desc *desc)
> > +{
> > +	desc->policy_flags = 0;
> > +
> > +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> > +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > +}
> > +
> > +static int guc_lrc_desc_pin(struct intel_context *ce)
> > +{
> > +	struct intel_runtime_pm *runtime_pm =
> > +		&ce->engine->gt->i915->runtime_pm;
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	u32 desc_idx = ce->guc_id;
> > +	struct guc_lrc_desc *desc;
> > +	bool context_registered;
> > +	intel_wakeref_t wakeref;
> > +	int ret = 0;
> > +
> > +	GEM_BUG_ON(!engine->mask);
> > +
> > +	/*
> > +	 * Ensure LRC + CT vmas are in the same region as the write barrier is done
> > +	 * based on CT vma region.
> > +	 */
> > +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> > +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> > +
> > +	context_registered = lrc_desc_registered(guc, desc_idx);
> > +
> > +	reset_lrc_desc(guc, desc_idx);
> > +	set_lrc_desc_registered(guc, desc_idx, ce);
> > +
> > +	desc = __get_lrc_desc(guc, desc_idx);
> > +	desc->engine_class = engine_class_to_guc_class(engine->class);
> > +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> > +						      engine->mask);
> > +	desc->hw_context_desc = ce->lrc.lrca;
> > +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +	guc_context_policy_init(engine, desc);
> > +	init_sched_state(ce);
> > +
> > +	/*
> > +	 * The context_lookup xarray is used to determine if the hardware
> > +	 * context is currently registered. There are two cases in which it
> > +	 * could be regisgered either the guc_id has been stole from from
> 'regisgered' and 'stole' should be 'stolen'
> 

Yep.

> > +	 * another context or the lrc descriptor address of this context has
> > +	 * changed. In either case the context needs to be deregistered with the
> > +	 * GuC before registering this context.
> > +	 */
> > +	if (context_registered) {
> > +		set_context_wait_for_deregister_to_register(ce);
> > +		intel_context_get(ce);
> > +
> > +		/*
> > +		 * If stealing the guc_id, this ce has the same guc_id as the
> > +		 * context whos guc_id was stole.
> whose guc_id was stolen
> 

Yep.

> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = deregister_context(ce, ce->guc_id);
> > +	} else {
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = register_context(ce);
> > +	}
> > +
> > +	return ret;
> >   }
> >   static int guc_context_pre_pin(struct intel_context *ce,
> > @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
> >   static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >   {
> > +	if (i915_ggtt_offset(ce->state) !=
> > +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > +
> >   	return lrc_pin(ce, ce->engine, vaddr);
> >   }
> > +static void guc_context_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	unpin_guc_id(guc, ce);
> > +	lrc_unpin(ce);
> > +}
> > +
> > +static void guc_context_post_unpin(struct intel_context *ce)
> > +{
> > +	lrc_post_unpin(ce);
> > +}
> > +
> > +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> Use ce_to_guc?
>

Yes.
 
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	deregister_context(ce, ce->guc_id);
> > +}
> > +
> > +static void guc_context_destroy(struct kref *kref)
> > +{
> > +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> Also ce_to_guc?
> 

Yes.

> > +	intel_wakeref_t wakeref;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * If the guc_id is invalid this context has been stolen and we can free
> > +	 * it immediately. It can also be freed immediately if the context is not
> > +	 * registered with the GuC.
> > +	 */
> > +	if (context_guc_id_invalid(ce) ||
> > +	    !lrc_desc_registered(guc, ce->guc_id)) {
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We have to acquire the context spinlock and check guc_id again; if it
> > +	 * is valid it hasn't been stolen and needs to be deregistered. We
> > +	 * delete this context from the list of unpinned guc_ids available to
> > +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> stole -> steal
> 
> > +	 * returns indicating this context has been deregistered the guc_id is
> > +	 * returned to the pool of available guc_ids.
> > +	 */
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (context_guc_id_invalid(ce)) {
> > +		__release_guc_id(guc, ce);
> > +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * We defer GuC context deregistration until the context is destroyed
> > +	 * in order to save on CTBs. With this optimization ideally we only need
> > +	 * 1 CTB to register the context during the first pin and 1 CTB to
> > +	 * deregister the context when the context is destroyed. Without this
> > +	 * optimization, a CTB would be needed every pin & unpin.
> > +	 *
> > +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
> FIXME rather than XXX?
> 

I think XXX is correct as it isn't broken. After this series gets merged,
I have a patch in the follow-on series that fixes this.

> > +	 * from context_free_worker when not runtime wakeref is held.
> when runtime wakeref is not held
> 

Yep.

> > +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> > +	 * in H2G CTB to deregister the context. A future patch may defer this
> > +	 * H2G CTB if the runtime wakeref is zero.
> > +	 */
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		guc_lrc_desc_unpin(ce);
> > +}
> > +
> > +static int guc_context_alloc(struct intel_context *ce)
> > +{
> > +	return lrc_alloc(ce, ce->engine);
> > +}
> > +
> >   static const struct intel_context_ops guc_context_ops = {
> >   	.alloc = guc_context_alloc,
> >   	.pre_pin = guc_context_pre_pin,
> >   	.pin = guc_context_pin,
> > -	.unpin = lrc_unpin,
> > -	.post_unpin = lrc_post_unpin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> >   	.enter = intel_context_enter_engine,
> >   	.exit = intel_context_exit_engine,
> >   	.reset = lrc_reset,
> > -	.destroy = lrc_destroy,
> > +	.destroy = guc_context_destroy,
> >   };
> > -static int guc_request_alloc(struct i915_request *request)
> > +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +}
> > +
> > +static int guc_request_alloc(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	int ret;
> > -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> > +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >   	/*
> >   	 * Flush enough space to reduce the likelihood of waiting after
> >   	 * we start building the request - in which case we will just
> >   	 * have to repeat work.
> >   	 */
> > -	request->reserved_space += GUC_REQUEST_SIZE;
> > +	rq->reserved_space += GUC_REQUEST_SIZE;
> >   	/*
> >   	 * Note that after this point, we have committed to using
> > @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
> >   	 */
> >   	/* Unconditionally invalidate GPU caches and TLBs. */
> > -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> > +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >   	if (ret)
> >   		return ret;
> > -	request->reserved_space -= GUC_REQUEST_SIZE;
> > -	return 0;
> > -}
> > +	rq->reserved_space -= GUC_REQUEST_SIZE;
> > -static inline void queue_request(struct i915_sched_engine *sched_engine,
> > -				 struct i915_request *rq,
> > -				 int prio)
> > -{
> > -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > -	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(sched_engine, prio));
> > -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -}
> > -
> > -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > -				     struct i915_request *rq)
> > -{
> > -	int ret;
> > -
> > -	__i915_request_submit(rq);
> > -
> > -	trace_i915_request_in(rq, 0);
> > -
> > -	guc_set_lrc_tail(rq);
> > -	ret = guc_add_request(guc, rq);
> > -	if (ret == -EBUSY)
> > -		guc->stalled_request = rq;
> > -
> > -	return ret;
> > -}
> > -
> > -static void guc_submit_request(struct i915_request *rq)
> > -{
> > -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	/*
> > +	 * Call pin_guc_id here rather than in the pinning step as with
> > +	 * dma_resv, contexts can be repeatedly pinned / unpinned thrashing the
> > +	 * guc_ids and creating horrible race conditions. This is especially bad
> > +	 * when guc_ids are being stolen due to over subscription. By the time
> > +	 * this function is reached, it is guaranteed that the guc_id will be
> > +	 * persistent until the generated request is retired, thus sealing these
> > +	 * race conditions. It is still safe to fail here if guc_ids are
> > +	 * exhausted and return -EAGAIN to the user indicating that they can try
> > +	 * again in the future.
> > +	 *
> > +	 * There is no need for a lock here as the timeline mutex ensures at
> > +	 * most one context can be executing this code path at once. The
> > +	 * guc_id_ref is incremented once for every request in flight and
> > +	 * decremented on each retire. When it is zero, a lock around the
> > +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> > +	 */
> > +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> > +		return 0;
> > -	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> > +	if (unlikely(ret < 0))
> > +		return ret;;
> ;;
> 
> > +	if (context_needs_register(ce, !!ret)) {
> > +		ret = guc_lrc_desc_pin(ce);
> > +		if (unlikely(ret)) {	/* unwind */
> > +			atomic_dec(&ce->guc_id_ref);
> > +			unpin_guc_id(guc, ce);
> > +			return ret;
> > +		}
> > +	}
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > -		queue_request(sched_engine, rq, rq_prio(rq));
> > -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > -		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> > -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	return 0;
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >   	engine->submit_request = guc_submit_request;
> >   }
> > +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > +					  struct intel_context *ce)
> > +{
> > +	if (context_guc_id_invalid(ce))
> > +		pin_guc_id(guc, ce);
> > +	guc_lrc_desc_pin(ce);
> > +}
> > +
> > +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +
> > +	/* make sure all descriptors are clean... */
> > +	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * Some contexts might have been pinned before we enabled GuC
> > +	 * submission, so we need to add them to the GuC bookkeeping.
> > +	 * Also, after a reset the GuC we want to make sure that the information
> reset of the GuC
> 

Yep.

> > +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> LRCs
> 

Yep.

> > +	 * to the gem_context, so they need to be added separately.
> > +	 *
> > +	 * Note: we purposely do not check the error return of
> check the return of gldp for errors

Yep.

> > +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> > +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> > +	 * enough here (we're either only pinning a handful of lrc on first boot
> LRCs and same on line below
> 
> > +	 * or we're re-pinning lrcs that were already pinned before the reset).
> > +	 * Two, if the GuC has died and CTBs can't make forward progress.
> > +	 * Presumably, the GuC should be alive as this function is called on
> > +	 * driver load or after a reset. Even if it is dead, another full GPU
> GPU -> GT
> 

This comment is stale. Need to just update the whole thing.

> > +	 * reset will be triggered and this function would be called again.
> > +	 */
> > +
> > +	for_each_engine(engine, gt, id)
> > +		if (engine->kernel_context)
> > +			guc_kernel_context_pin(guc, engine->kernel_context);
> > +}
> > +
> >   static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   void intel_guc_submission_enable(struct intel_guc *guc)
> >   {
> > +	guc_init_lrc_mapping(guc);
> >   }
> >   void intel_guc_submission_disable(struct intel_guc *guc)
> > @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >   {
> >   	guc->submission_selected = __guc_submission_selected(guc);
> >   }
> > +
> > +static inline struct intel_context *
> > +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > +{
> > +	struct intel_context *ce;
> > +
> > +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Invalid desc_idx %u", desc_idx);
> Agree these should not be BUG_ONs or similar, but a drm_err seems
> appropriate. If either this or the error below occurs then something has gone
> quite wrong and it should be reported.
>

Sure.
 
> > +		return NULL;
> > +	}
> > +
> > +	ce = __get_context(guc, desc_idx);
> > +	if (unlikely(!ce)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Context is NULL, desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	return ce;
> > +}
> > +
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg,
> > +					  u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	u32 desc_idx = msg[0];
> > +
> > +	if (unlikely(len < 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> As above, should be a drm_err I would say.
>

Sure.

Matt
 
> > +		return -EPROTO;
> > +	}
> > +
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	if (context_wait_for_deregister_to_register(ce)) {
> > +		struct intel_runtime_pm *runtime_pm =
> > +			&ce->engine->gt->i915->runtime_pm;
> > +		intel_wakeref_t wakeref;
> > +
> > +		/*
> > +		 * Previous owner of this guc_id has been deregistered, now safe
> > +		 * to register this context.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			register_context(ce);
> > +		clr_context_wait_for_deregister_to_register(ce);
> > +		intel_context_put(ce);
> > +	} else if (context_destroyed(ce)) {
> > +		/* Context has been destroyed */
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(&ce->ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 943fe485c662..204c95c39353 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -4142,6 +4142,7 @@ enum {
> >   	FAULT_AND_CONTINUE /* Unsupported */
> >   };
> > +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >   #define GEN8_CTX_VALID (1 << 0)
> >   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 86b4c9f2613d..b48c4905d3fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >   	 */
> >   	if (!list_empty(&rq->sched.link))
> >   		remove_from_engine(rq);
> > +	atomic_dec(&rq->context->guc_id_ref);
> >   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 12/51] drm/i915/guc: Ensure request ordering via completion fences
  2021-07-19 23:46     ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-20  2:48       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:48 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Mon, Jul 19, 2021 at 04:46:57PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:16 PM, Matthew Brost wrote:
> > If two requests are on the same ring, they are explicitly ordered by the
> > HW. So, a submission fence is sufficient to ensure ordering when using
> > the new GuC submission interface. Conversely, if two requests share a
> > timeline and are on the same physical engine but in different contexts, this
> > doesn't ensure ordering on the new GuC submission interface. So, a
> > completion fence needs to be used to ensure ordering.
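
In code terms the choice reduces to roughly the sketch below (simplified,
with invented parameter names; the real tests are in the i915_request.c
hunk further down):

	#include <stdbool.h>

	/* Illustrative only: can rq be ordered after prev with just a submit fence? */
	static bool submit_fence_suffices(bool uses_guc, bool same_context,
					  bool single_common_engine)
	{
		if (uses_guc)
			return same_context;	/* same context => same ring => HW-ordered */
		return single_common_engine;	/* execlists: one physical engine */
	}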
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  1 -
> >   drivers/gpu/drm/i915/i915_request.c               | 12 ++++++++++--
> >   2 files changed, 10 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 9dc1a256e185..4443cc6f5320 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -933,7 +933,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	 * a request before we set the 'context_pending_disable' flag here.
> >   	 */
> >   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> > -		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> 
> incorrect spinlock drop is still here. Everything else looks ok (my

No, it isn't. See the return directly below the drop of the spin lock.
Matt

> suggestion to use an engine flag stands, but can be addressed as a follow
> up).
> 

Not sure I follow this one, but we can sync and address it in a follow-up
if needed.

Matt

> Daniele
> 
> >   		return;
> >   	}
> >   	guc_id = prep_context_pending_disable(ce);
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index b48c4905d3fc..2b2b63cba06c 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -432,6 +432,7 @@ void i915_request_retire_upto(struct i915_request *rq)
> >   	do {
> >   		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
> > +		GEM_BUG_ON(!i915_request_completed(tmp));
> >   	} while (i915_request_retire(tmp) && tmp != rq);
> >   }
> > @@ -1380,6 +1381,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
> >   	return err;
> >   }
> > +static int
> > +i915_request_await_request(struct i915_request *to, struct i915_request *from);
> > +
> >   int
> >   i915_request_await_execution(struct i915_request *rq,
> >   			     struct dma_fence *fence)
> > @@ -1465,7 +1469,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
> >   			return ret;
> >   	}
> > -	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
> > +	if (!intel_engine_uses_guc(to->engine) &&
> > +	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
> >   		ret = await_request_submit(to, from);
> >   	else
> >   		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
> > @@ -1626,6 +1631,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
> >   	prev = to_request(__i915_active_fence_set(&timeline->last_request,
> >   						  &rq->fence));
> >   	if (prev && !__i915_request_is_complete(prev)) {
> > +		bool uses_guc = intel_engine_uses_guc(rq->engine);
> > +
> >   		/*
> >   		 * The requests are supposed to be kept in order. However,
> >   		 * we need to be wary in case the timeline->last_request
> > @@ -1636,7 +1643,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
> >   			   i915_seqno_passed(prev->fence.seqno,
> >   					     rq->fence.seqno));
> > -		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
> > +		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
> > +		    (uses_guc && prev->context == rq->context))
> >   			i915_sw_fence_await_sw_fence(&rq->submit,
> >   						     &prev->submit,
> >   						     &rq->submitq);
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 12/51] drm/i915/guc: Ensure request ordering via completion fences
  2021-07-20  2:48       ` [Intel-gfx] " Matthew Brost
@ 2021-07-20  2:50         ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  2:50 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, dri-devel, john.c.harrison

On Mon, Jul 19, 2021 at 07:48:17PM -0700, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 04:46:57PM -0700, Daniele Ceraolo Spurio wrote:
> > 
> > 
> > On 7/16/2021 1:16 PM, Matthew Brost wrote:
> > > If two requests are on the same ring, they are explicitly ordered by the
> > > HW. So, a submission fence is sufficient to ensure ordering when using
> > > the new GuC submission interface. Conversely, if two requests share a
> > > timeline and are on the same physical engine but in different contexts,
> > > this doesn't ensure ordering on the new GuC submission interface. So, a
> > > completion fence needs to be used to ensure ordering.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c |  1 -
> > >   drivers/gpu/drm/i915/i915_request.c               | 12 ++++++++++--
> > >   2 files changed, 10 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 9dc1a256e185..4443cc6f5320 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -933,7 +933,6 @@ static void guc_context_sched_disable(struct intel_context *ce)
> > >   	 * a request before we set the 'context_pending_disable' flag here.
> > >   	 */
> > >   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> > > -		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > 
> > incorrect spinlock drop is still here. Everything else looks ok (my
> 
> No, it isn't. See the return directly below the drop of the spin lock.
> Matt
> 

Ugh, misread this. Yes, this deletion shouldn't be here. Will fix.
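
The fix is just restoring the unlock before the early return, i.e.
(untested):

	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
		return;
	}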

Matt

> > suggestion to use an engine flag stands, but can be addressed as a follow
> > up).
> > 
> 
> Not sure I follow this one, but we can sync and address it in a follow-up
> if needed.
> 
> Matt
> 
> > Daniele
> > 
> > >   		return;
> > >   	}
> > >   	guc_id = prep_context_pending_disable(ce);
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > index b48c4905d3fc..2b2b63cba06c 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -432,6 +432,7 @@ void i915_request_retire_upto(struct i915_request *rq)
> > >   	do {
> > >   		tmp = list_first_entry(&tl->requests, typeof(*tmp), link);
> > > +		GEM_BUG_ON(!i915_request_completed(tmp));
> > >   	} while (i915_request_retire(tmp) && tmp != rq);
> > >   }
> > > @@ -1380,6 +1381,9 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
> > >   	return err;
> > >   }
> > > +static int
> > > +i915_request_await_request(struct i915_request *to, struct i915_request *from);
> > > +
> > >   int
> > >   i915_request_await_execution(struct i915_request *rq,
> > >   			     struct dma_fence *fence)
> > > @@ -1465,7 +1469,8 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
> > >   			return ret;
> > >   	}
> > > -	if (is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
> > > +	if (!intel_engine_uses_guc(to->engine) &&
> > > +	    is_power_of_2(to->execution_mask | READ_ONCE(from->execution_mask)))
> > >   		ret = await_request_submit(to, from);
> > >   	else
> > >   		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
> > > @@ -1626,6 +1631,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
> > >   	prev = to_request(__i915_active_fence_set(&timeline->last_request,
> > >   						  &rq->fence));
> > >   	if (prev && !__i915_request_is_complete(prev)) {
> > > +		bool uses_guc = intel_engine_uses_guc(rq->engine);
> > > +
> > >   		/*
> > >   		 * The requests are supposed to be kept in order. However,
> > >   		 * we need to be wary in case the timeline->last_request
> > > @@ -1636,7 +1643,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
> > >   			   i915_seqno_passed(prev->fence.seqno,
> > >   					     rq->fence.seqno));
> > > -		if (is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask))
> > > +		if ((!uses_guc && is_power_of_2(READ_ONCE(prev->engine)->mask | rq->engine->mask)) ||
> > > +		    (uses_guc && prev->context == rq->context))
> > >   			i915_sw_fence_await_sw_fence(&rq->submit,
> > >   						     &prev->submit,
> > >   						     &rq->submitq);
> > 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-20  0:51     ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-20  4:04       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20  4:04 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:16 PM, Matthew Brost wrote:
> > Implement GuC context operations, which include the GuC-specific
> > alloc, pin, unpin, and destroy operations.
> > 
> > v2:
> >   (Daniel Vetter)
> >    - Use msleep_interruptible rather than cond_resched in busy loop
> >   (Michal)
> >    - Remove C++ style comment
> > 
> > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >   drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
> >   drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >   drivers/gpu/drm/i915/i915_request.c           |   1 +
> >   8 files changed, 685 insertions(+), 55 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index bd63813c8a80..32fd6647154b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   	mutex_init(&ce->pin_mutex);
> > +	spin_lock_init(&ce->guc_state.lock);
> > +
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +	INIT_LIST_HEAD(&ce->guc_id_link);
> > +
> >   	i915_active_init(&ce->active,
> >   			 __intel_context_active, __intel_context_retire, 0);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 6d99631d19b9..606c480aec26 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -96,6 +96,7 @@ struct intel_context {
> >   #define CONTEXT_BANNED			6
> >   #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
> >   #define CONTEXT_NOPREEMPT		8
> > +#define CONTEXT_LRCA_DIRTY		9
> >   	struct {
> >   		u64 timeout_us;
> > @@ -138,14 +139,29 @@ struct intel_context {
> >   	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
> > +	struct {
> > +		/** lock: protects everything in guc_state */
> > +		spinlock_t lock;
> > +		/**
> > +		 * sched_state: scheduling state of this context using GuC
> > +		 * submission
> > +		 */
> > +		u8 sched_state;
> > +	} guc_state;
> > +
> >   	/* GuC scheduling state flags that do not require a lock. */
> >   	atomic_t guc_sched_state_no_lock;
> > +	/* GuC LRC descriptor ID */
> > +	u16 guc_id;
> > +
> > +	/* GuC LRC descriptor reference count */
> > +	atomic_t guc_id_ref;
> > +
> >   	/*
> > -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
> > -	 * in the series will.
> > +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
> >   	 */
> > -	u16 guc_id;
> > +	struct list_head guc_id_link;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > index 41e5350a7a05..49d4857ad9b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> > @@ -87,7 +87,6 @@
> >   #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
> >   #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
> > -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
> >   #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
> >   /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >   #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 8c7b92f699f1..30773cd699f5 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -7,6 +7,7 @@
> >   #define _INTEL_GUC_H_
> >   #include <linux/xarray.h>
> > +#include <linux/delay.h>
> >   #include "intel_uncore.h"
> >   #include "intel_guc_fw.h"
> > @@ -44,6 +45,14 @@ struct intel_guc {
> >   		void (*disable)(struct intel_guc *guc);
> >   	} interrupts;
> > +	/*
> > +	 * contexts_lock protects the pool of free guc ids and a linked list of
> > +	 * guc ids available to be stolen
> > +	 */
> > +	spinlock_t contexts_lock;
> > +	struct ida guc_ids;
> > +	struct list_head guc_id_list;
> > +
> >   	bool submission_selected;
> >   	struct i915_vma *ads_vma;
> > @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
> >   				 response_buf, response_buf_size, 0);
> >   }
> > +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> > +					   const u32 *action,
> > +					   u32 len,
> > +					   bool loop)
> > +{
> > +	int err;
> > +	unsigned int sleep_period_ms = 1;
> > +	bool not_atomic = !in_atomic() && !irqs_disabled();
> > +
> > +	/* No sleeping with spin locks, just busy loop */
> > +	might_sleep_if(loop && not_atomic);
> > +
> > +retry:
> > +	err = intel_guc_send_nb(guc, action, len);
> > +	if (unlikely(err == -EBUSY && loop)) {
> > +		if (likely(not_atomic)) {
> > +			if (msleep_interruptible(sleep_period_ms))
> > +				return -EINTR;
> > +			sleep_period_ms = sleep_period_ms << 1;
> > +		} else {
> > +			cpu_relax();
> 
> Potentially something we can change later, but if we're in atomic context we
> can't keep looping without a timeout while we get -EBUSY, we need to bail
> out early.
>

Eventually intel_guc_send_nb will return a different error code if the
GuC is hung - after about 1 second, I think - so even the atomic busy
loop does bail out.
 
> > +		}
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> > +}
> > +
> >   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >   {
> >   	intel_guc_ct_event_handler(&guc->ct);
> > @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg, u32 len);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 83ec60ea3f89..28ff82c5be45 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >   	case INTEL_GUC_ACTION_DEFAULT:
> >   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >   		break;
> > +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> > +							    len);
> > +		break;
> >   	default:
> >   		ret = -EOPNOTSUPP;
> >   		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 53b4a5eb4a85..a47b3813b4d0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -13,7 +13,9 @@
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_irq.h"
> >   #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt_requests.h"
> >   #include "gt/intel_lrc.h"
> > +#include "gt/intel_lrc_reg.h"
> >   #include "gt/intel_mocs.h"
> >   #include "gt/intel_ring.h"
> > @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which
> > + * require a lock, aside from the special case where the functions are called
> > + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> > + * path to be executing on the context.
> > + */
> 
> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if it
> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking

If this code is called from releasing a fence, a circular lock
dependency is created. Once we move to the DRM scheduler the locking
becomes much easier and we will clean this up then. We already have a
task for the locking cleanup.

> scheme if possible, as that might case trouble if thing change in the
> future.

Again this is temporary code, so I think we can live with this until the
DRM scheduler lands.

> 
> > +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> > +#define SCHED_STATE_DESTROYED				BIT(1)
> > +static inline void init_sched_state(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> > +	ce->guc_state.sched_state = 0;
> > +}
> > +
> > +static inline bool
> > +context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state &
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> 
> No need for (). Below as well.
> 

Yep.

> > +}
> > +
> > +static inline void
> > +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	ce->guc_state.sched_state |=
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > +}
> > +
> > +static inline void
> > +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state =
> > +		(ce->guc_state.sched_state &
> > +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> 
> nit: can also use
> 
> ce->guc_state.sched_state &=
> 	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>

Yep.
 
> 
> > +}
> > +
> > +static inline bool
> > +context_destroyed(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> > +}
> > +
> > +static inline void
> > +set_context_destroyed(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> > +}
> > +
> > +static inline bool context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> > +}
> > +
> > +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +}
> > +
> > +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> > +{
> > +	return &ce->engine->gt->uc.guc;
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	int len = 0;
> >   	bool enabled = context_enabled(ce);
> > +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > +	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +	spin_lock_init(&guc->contexts_lock);
> > +	INIT_LIST_HEAD(&guc->guc_id_list);
> > +	ida_init(&guc->guc_ids);
> > +
> >   	return 0;
> >   }
> > @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >   	i915_sched_engine_put(guc->sched_engine);
> >   }
> > -static int guc_context_alloc(struct intel_context *ce)
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> > +				 struct i915_request *rq,
> > +				 int prio)
> >   {
> > -	return lrc_alloc(ce, ce->engine);
> > +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > +	list_add_tail(&rq->sched.link,
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> > +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +}
> > +
> > +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > +				     struct i915_request *rq)
> > +{
> > +	int ret;
> > +
> > +	__i915_request_submit(rq);
> > +
> > +	trace_i915_request_in(rq, 0);
> > +
> > +	guc_set_lrc_tail(rq);
> > +	ret = guc_add_request(guc, rq);
> > +	if (ret == -EBUSY)
> > +		guc->stalled_request = rq;
> > +
> > +	return ret;
> > +}
> > +
> > +static void guc_submit_request(struct i915_request *rq)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	/* Will be called from irq-context when using foreign fences. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +
> > +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +		queue_request(sched_engine, rq, rq_prio(rq));
> > +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> > +static int new_guc_id(struct intel_guc *guc)
> > +{
> > +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> > +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> > +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +}
> > +
> > +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	if (!context_guc_id_invalid(ce)) {
> > +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > +		reset_lrc_desc(guc, ce->guc_id);
> > +		set_context_guc_id_invalid(ce);
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +}
> > +
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	__release_guc_id(guc, ce);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int steal_guc_id(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	int guc_id;
> > +
> > +	if (!list_empty(&guc->guc_id_list)) {
> > +		ce = list_first_entry(&guc->guc_id_list,
> > +				      struct intel_context,
> > +				      guc_id_link);
> > +
> > +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +		GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> > +		list_del_init(&ce->guc_id_link);
> > +		guc_id = ce->guc_id;
> > +		set_context_guc_id_invalid(ce);
> > +		return guc_id;
> > +	} else {
> > +		return -EAGAIN;
> > +	}
> > +}
> > +
> > +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> > +{
> > +	int ret;
> > +
> > +	ret = new_guc_id(guc);
> > +	if (unlikely(ret < 0)) {
> > +		ret = steal_guc_id(guc);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	*out = ret;
> > +	return 0;
> 
> Is it worth adding spinlock_held asserts for guc->contexts_lock in these ID
> functions? Doubles up as a documentation of what locking we expect.
> 

Never a bad idea to have asserts in the code. Will add.
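
For example (untested sketch), at the top of new_guc_id():

	static int new_guc_id(struct intel_guc *guc)
	{
		/* Caller must hold guc->contexts_lock */
		lockdep_assert_held(&guc->contexts_lock);

		return ida_simple_get(&guc->guc_ids, GUC_ID_START,
				      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
				      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	}

with the same assert in steal_guc_id() and assign_guc_id().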

> > +}
> > +
> > +#define PIN_GUC_ID_TRIES	4
> > +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +
> > +try_again:
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +
> > +	if (context_guc_id_invalid(ce)) {
> > +		ret = assign_guc_id(guc, &ce->guc_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +		ret = 1;	/* Indicates newly assigned guc_id */
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	atomic_inc(&ce->guc_id_ref);
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> > +	 * outstanding requests to see if that frees up a guc_id. If the first
> > +	 * retire didn't help, insert a sleep with the timeslice duration before
> > +	 * attempting to retire more requests. Double the sleep period each
> > +	 * subsequent pass before finally giving up. The sleep period has max of
> > +	 * 100ms and minimum of 1ms.
> > +	 */
> > +	if (ret == -EAGAIN && --tries) {
> > +		if (PIN_GUC_ID_TRIES - tries > 1) {
> > +			unsigned int timeslice_shifted =
> > +				ce->engine->props.timeslice_duration_ms <<
> > +				(PIN_GUC_ID_TRIES - tries - 2);
> > +			unsigned int max = min_t(unsigned int, 100,
> > +						 timeslice_shifted);
> > +
> > +			msleep(max_t(unsigned int, max, 1));
> > +		}
> > +		intel_gt_retire_requests(guc_to_gt(guc));
> > +		goto try_again;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> > +	    !atomic_read(&ce->guc_id_ref))
> > +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int __guc_action_register_context(struct intel_guc *guc,
> > +					 u32 guc_id,
> > +					 u32 offset)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > +		guc_id,
> > +		offset,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int register_context(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > +		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +
> > +	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +}
> > +
> > +static int __guc_action_deregister_context(struct intel_guc *guc,
> > +					   u32 guc_id)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > +		guc_id,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	return __guc_action_deregister_context(guc, guc_id);
> > +}
> > +
> > +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > +{
> > +	switch (class) {
> > +	case RENDER_CLASS:
> > +		return mask >> RCS0;
> > +	case VIDEO_ENHANCEMENT_CLASS:
> > +		return mask >> VECS0;
> > +	case VIDEO_DECODE_CLASS:
> > +		return mask >> VCS0;
> > +	case COPY_ENGINE_CLASS:
> > +		return mask >> BCS0;
> > +	default:
> > +		GEM_BUG_ON("Invalid Class");
> 
> we usually use MISSING_CASE for this type of errors.

As soon as we start talking in logical space (in the series merged
immediately after this one) this function gets deleted, but I will fix
it here.
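
i.e. (untested):

	default:
		MISSING_CASE(class);
		return 0;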

> 
> > +		return 0;
> > +	}
> > +}
> > +
> > +static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +				    struct guc_lrc_desc *desc)
> > +{
> > +	desc->policy_flags = 0;
> > +
> > +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> > +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > +}
> > +
> > +static int guc_lrc_desc_pin(struct intel_context *ce)
> > +{
> > +	struct intel_runtime_pm *runtime_pm =
> > +		&ce->engine->gt->i915->runtime_pm;
> 
> If you move this after the engine var below you can skip the ce->engine
> jump. Also, you can shorten the pointer chasing by using
> engine->uncore->rpm.
> 

Sure.
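
i.e. something like:

	struct intel_engine_cs *engine = ce->engine;
	struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;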

> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	u32 desc_idx = ce->guc_id;
> > +	struct guc_lrc_desc *desc;
> > +	bool context_registered;
> > +	intel_wakeref_t wakeref;
> > +	int ret = 0;
> > +
> > +	GEM_BUG_ON(!engine->mask);
> > +
> > +	/*
> > +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> > +	 * based on CT vma region.
> > +	 */
> > +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> > +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> > +
> > +	context_registered = lrc_desc_registered(guc, desc_idx);
> > +
> > +	reset_lrc_desc(guc, desc_idx);
> > +	set_lrc_desc_registered(guc, desc_idx, ce);
> > +
> > +	desc = __get_lrc_desc(guc, desc_idx);
> > +	desc->engine_class = engine_class_to_guc_class(engine->class);
> > +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> > +						      engine->mask);
> > +	desc->hw_context_desc = ce->lrc.lrca;
> > +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +	guc_context_policy_init(engine, desc);
> > +	init_sched_state(ce);
> > +
> > +	/*
> > +	 * The context_lookup xarray is used to determine if the hardware
> > +	 * context is currently registered. There are two cases in which it
> > +	 * could be regisgered either the guc_id has been stole from from
> 
> typo regisgered
>

John got this one. Fixed.
 
> > +	 * another context or the lrc descriptor address of this context has
> > +	 * changed. In either case the context needs to be deregistered with the
> > +	 * GuC before registering this context.
> > +	 */
> > +	if (context_registered) {
> > +		set_context_wait_for_deregister_to_register(ce);
> > +		intel_context_get(ce);
> > +
> > +		/*
> > +		 * If stealing the guc_id, this ce has the same guc_id as the
> > +		 * context whose guc_id was stolen.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = deregister_context(ce, ce->guc_id);
> > +	} else {
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = register_context(ce);
> > +	}
> > +
> > +	return ret;
> >   }
> >   static int guc_context_pre_pin(struct intel_context *ce,
> > @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
> >   static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >   {
> > +	if (i915_ggtt_offset(ce->state) !=
> > +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> 
> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
> re-assigned in there; it looks like it can only change if the context is
> new, but for future-proofing IMO better to re-order here.
> 

No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca
is assigned in lrc_pin. To catch the change you have to check between
those two functions.

> > +
> >   	return lrc_pin(ce, ce->engine, vaddr);
> 
> Could use a comment to say that the GuC context pinning call is delayed
> until request alloc and to look at the comment in the latter function for
> details (or move the explanation here and refer here from request alloc).
>

Yes.
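
Roughly (exact wording to be refined):

	static int guc_context_pin(struct intel_context *ce, void *vaddr)
	{
		/*
		 * GuC context registration (guc_lrc_desc_pin) is deferred
		 * until request creation (guc_request_alloc) to avoid
		 * trashing guc_ids when contexts are repeatedly pinned and
		 * unpinned - see the comment in guc_request_alloc() for
		 * details.
		 */
		if (i915_ggtt_offset(ce->state) !=
		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return lrc_pin(ce, ce->engine, vaddr);
	}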
 
> >   }
> > +static void guc_context_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	unpin_guc_id(guc, ce);
> > +	lrc_unpin(ce);
> > +}
> > +
> > +static void guc_context_post_unpin(struct intel_context *ce)
> > +{
> > +	lrc_post_unpin(ce);
> 
> why do we need this function? we can just pass lrc_post_unpin to the ops
> (like we already do).
> 

We don't strictly need it, but for style reasons I think having a
GuC-specific wrapper is correct.

> > +}
> > +
> > +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	deregister_context(ce, ce->guc_id);
> > +}
> > +
> > +static void guc_context_destroy(struct kref *kref)
> > +{
> > +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> 
> same as above, going through engine->uncore->rpm is shorter
>

Sure.
 
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > +	intel_wakeref_t wakeref;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * If the guc_id is invalid this context has been stolen and we can free
> > +	 * it immediately. Also can be freed immediately if the context is not
> > +	 * registered with the GuC.
> > +	 */
> > +	if (context_guc_id_invalid(ce) ||
> > +	    !lrc_desc_registered(guc, ce->guc_id)) {
> > +		release_guc_id(guc, ce);
> 
> it feels a bit weird that we call release_guc_id in the case where the id is
> invalid. The code handles it fine, but still the flow doesn't feel clean.
> Not a blocker.
> 

I understand. Let's see if this can get cleaned up at some point.

> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We have to acquire the context spinlock and check guc_id again, if it
> > +	 * is valid it hasn't been stolen and needs to be deregistered. We
> > +	 * delete this context from the list of unpinned guc_ids available to
> > +	 * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> > +	 * returns indicating this context has been deregistered the guc_id is
> > +	 * returned to the pool of available guc_ids.
> > +	 */
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (context_guc_id_invalid(ce)) {
> > +		__release_guc_id(guc, ce);
> 
> But here the call to __release_guc_id is unneded, right? the ce doesn't own
> the ID anymore and both the steal and the release function already clean
> ce->guc_id_link, so there should be nothing left to clean.
>

This is dead code. Will delete.

> > +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * We defer GuC context deregistration until the context is destroyed
> > +	 * in order to save on CTBs. With this optimization ideally we only need
> > +	 * 1 CTB to register the context during the first pin and 1 CTB to
> > +	 * deregister the context when the context is destroyed. Without this
> > +	 * optimization, a CTB would be needed every pin & unpin.
> > +	 *
> > +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
> > +	 * from context_free_worker when no runtime wakeref is held.
> > +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> > +	 * in H2G CTB to deregister the context. A future patch may defer this
> > +	 * H2G CTB if the runtime wakeref is zero.
> > +	 */
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		guc_lrc_desc_unpin(ce);
> > +}
> > +
> > +static int guc_context_alloc(struct intel_context *ce)
> > +{
> > +	return lrc_alloc(ce, ce->engine);
> > +}
> > +
> >   static const struct intel_context_ops guc_context_ops = {
> >   	.alloc = guc_context_alloc,
> >   	.pre_pin = guc_context_pre_pin,
> >   	.pin = guc_context_pin,
> > -	.unpin = lrc_unpin,
> > -	.post_unpin = lrc_post_unpin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> >   	.enter = intel_context_enter_engine,
> >   	.exit = intel_context_exit_engine,
> >   	.reset = lrc_reset,
> > -	.destroy = lrc_destroy,
> > +	.destroy = guc_context_destroy,
> >   };
> > -static int guc_request_alloc(struct i915_request *request)
> > +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +}
> > +
> > +static int guc_request_alloc(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	int ret;
> > -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> > +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >   	/*
> >   	 * Flush enough space to reduce the likelihood of waiting after
> >   	 * we start building the request - in which case we will just
> >   	 * have to repeat work.
> >   	 */
> > -	request->reserved_space += GUC_REQUEST_SIZE;
> > +	rq->reserved_space += GUC_REQUEST_SIZE;
> >   	/*
> >   	 * Note that after this point, we have committed to using
> > @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
> >   	 */
> >   	/* Unconditionally invalidate GPU caches and TLBs. */
> > -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> > +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >   	if (ret)
> >   		return ret;
> > -	request->reserved_space -= GUC_REQUEST_SIZE;
> > -	return 0;
> > -}
> > +	rq->reserved_space -= GUC_REQUEST_SIZE;
> > -static inline void queue_request(struct i915_sched_engine *sched_engine,
> > -				 struct i915_request *rq,
> > -				 int prio)
> > -{
> > -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > -	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(sched_engine, prio));
> > -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -}
> > -
> > -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > -				     struct i915_request *rq)
> > -{
> > -	int ret;
> > -
> > -	__i915_request_submit(rq);
> > -
> > -	trace_i915_request_in(rq, 0);
> > -
> > -	guc_set_lrc_tail(rq);
> > -	ret = guc_add_request(guc, rq);
> > -	if (ret == -EBUSY)
> > -		guc->stalled_request = rq;
> > -
> > -	return ret;
> > -}
> > -
> > -static void guc_submit_request(struct i915_request *rq)
> > -{
> > -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	/*
> > +	 * Call pin_guc_id here rather than in the pinning step as with
> > +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> > +	 * guc_ids and creating horrible race conditions. This is especially bad
> > +	 * when guc_ids are being stolen due to over subscription. By the time
> > +	 * this function is reached, it is guaranteed that the guc_id will be
> > +	 * persistent until the generated request is retired. Thus, sealing these
> > +	 * race conditions. It is still safe to fail here if guc_ids are
> > +	 * exhausted and return -EAGAIN to the user indicating that they can try
> > +	 * again in the future.
> > +	 *
> > +	 * There is no need for a lock here as the timeline mutex ensures at
> > +	 * most one context can be executing this code path at once. The
> > +	 * guc_id_ref is incremented once for every request in flight and
> > +	 * decremented on each retire. When it is zero, a lock around the
> > +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> > +	 */
> > +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> > +		return 0;
> > -	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> > +	if (unlikely(ret < 0))
> > +		return ret;;
> 
> typo ";;"
> 

Yep.

> > +	if (context_needs_register(ce, !!ret)) {
> > +		ret = guc_lrc_desc_pin(ce);
> > +		if (unlikely(ret)) {	/* unwind */
> > +			atomic_dec(&ce->guc_id_ref);
> > +			unpin_guc_id(guc, ce);
> > +			return ret;
> > +		}
> > +	}
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > -		queue_request(sched_engine, rq, rq_prio(rq));
> > -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > -		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> 
> 
> Might be worth moving the pinning to a separate function to keep the
> request_alloc focused on the request. Can be done as a follow up.
>

Sure. Let me see how that looks in a follow-up.
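
Maybe something like this (untested sketch, helper name invented):

	static int guc_request_pin_context(struct i915_request *rq)
	{
		struct intel_context *ce = rq->context;
		struct intel_guc *guc = ce_to_guc(ce);
		int ret;

		/* Fast path: guc_id already pinned by an in-flight request */
		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
		if (unlikely(ret < 0))
			return ret;

		if (context_needs_register(ce, !!ret)) {
			ret = guc_lrc_desc_pin(ce);
			if (unlikely(ret)) {	/* unwind */
				atomic_dec(&ce->guc_id_ref);
				unpin_guc_id(guc, ce);
				return ret;
			}
		}

		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return 0;
	}

called from the tail of guc_request_alloc().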
 
> > -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	return 0;
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >   	engine->submit_request = guc_submit_request;
> >   }
> > +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > +					  struct intel_context *ce)
> > +{
> > +	if (context_guc_id_invalid(ce))
> > +		pin_guc_id(guc, ce);
> > +	guc_lrc_desc_pin(ce);
> > +}
> > +
> > +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +
> > +	/* make sure all descriptors are clean... */
> > +	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * Some contexts might have been pinned before we enabled GuC
> > +	 * submission, so we need to add them to the GuC bookkeeping.
> > +	 * Also, after a GuC reset we want to make sure that the information
> > +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> > +	 * to the gem_context, so they need to be added separately.
> > +	 *
> > +	 * Note: we purposely do not check the error return of
> > +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> > +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> > +	 * enough here (we're either only pinning a handful of lrc on first boot
> > +	 * or we're re-pinning lrcs that were already pinned before the reset).
> > +	 * Two, if the GuC has died and CTBs can't make forward progress.
> > +	 * Presumably, the GuC should be alive as this function is called on
> > +	 * driver load or after a reset. Even if it is dead, another full GPU
> > +	 * reset will be triggered and this function would be called again.
> > +	 */
> > +
> > +	for_each_engine(engine, gt, id)
> > +		if (engine->kernel_context)
> > +			guc_kernel_context_pin(guc, engine->kernel_context);
> > +}
> > +
> >   static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   void intel_guc_submission_enable(struct intel_guc *guc)
> >   {
> > +	guc_init_lrc_mapping(guc);
> >   }
> >   void intel_guc_submission_disable(struct intel_guc *guc)
> > @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >   {
> >   	guc->submission_selected = __guc_submission_selected(guc);
> >   }
> > +
> > +static inline struct intel_context *
> > +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > +{
> > +	struct intel_context *ce;
> > +
> > +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Invalid desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	ce = __get_context(guc, desc_idx);
> > +	if (unlikely(!ce)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Context is NULL, desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	return ce;
> > +}
> > +
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg,
> > +					  u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	u32 desc_idx = msg[0];
> > +
> > +	if (unlikely(len < 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> > +		return -EPROTO;
> > +	}
> > +
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	if (context_wait_for_deregister_to_register(ce)) {
> > +		struct intel_runtime_pm *runtime_pm =
> > +			&ce->engine->gt->i915->runtime_pm;
> > +		intel_wakeref_t wakeref;
> > +
> > +		/*
> > +		 * Previous owner of this guc_id has been deregistered, now safe
> > +		 * register this context.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			register_context(ce);
> > +		clr_context_wait_for_deregister_to_register(ce);
> > +		intel_context_put(ce);
> > +	} else if (context_destroyed(ce)) {
> > +		/* Context has been destroyed */
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(&ce->ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 943fe485c662..204c95c39353 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -4142,6 +4142,7 @@ enum {
> >   	FAULT_AND_CONTINUE /* Unsupported */
> >   };
> > +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >   #define GEN8_CTX_VALID (1 << 0)
> >   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 86b4c9f2613d..b48c4905d3fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >   	 */
> >   	if (!list_empty(&rq->sched.link))
> >   		remove_from_engine(rq);
> > +	atomic_dec(&rq->context->guc_id_ref);
> 
> Does this work/make sense  if GuC is disabled?
> 

It doesn't matter if the GuC is disabled, as this variable isn't used
in that case. A following patch adds a vfunc here and pushes this into
the GuC backend.
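
Roughly (hypothetical shape, names invented - the real patch may
differ):

	/* in i915_request_retire() */
	if (rq->engine->retire_request)
		rq->engine->retire_request(rq);

with the GuC backend implementing retire_request() to do the
atomic_dec(&rq->context->guc_id_ref) and the execlists backend leaving
it NULL.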

Matt

> Daniele
> 
> >   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

> > +			cpu_relax();
> 
> Potentially something we can change later, but if we're in atomic context we
> can't keep looping on -EBUSY without a timeout; we need to bail out
> early.
>

Eventually intel_guc_send_nb will return a different error code (after
about 1 second, I think) if the GuC is hung.
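
Something like this could be the later improvement for the atomic case
(untested sketch of the bail-out; GUC_SEND_MAX_SPINS is a made-up
limit, not in this patch):

	unsigned int spins = 0;
	...
	} else {
		/* Bound the atomic-context spin instead of looping forever */
		if (++spins > GUC_SEND_MAX_SPINS)
			return -ETIMEDOUT;
		cpu_relax();
	}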
 
> > +		}
> > +		goto retry;
> > +	}
> > +
> > +	return err;
> > +}
> > +
> >   static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
> >   {
> >   	intel_guc_ct_event_handler(&guc->ct);
> > @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_reset_engine(struct intel_guc *guc,
> >   			   struct intel_engine_cs *engine);
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg, u32 len);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 83ec60ea3f89..28ff82c5be45 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >   	case INTEL_GUC_ACTION_DEFAULT:
> >   		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >   		break;
> > +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> > +		ret = intel_guc_deregister_done_process_msg(guc, payload,
> > +							    len);
> > +		break;
> >   	default:
> >   		ret = -EOPNOTSUPP;
> >   		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 53b4a5eb4a85..a47b3813b4d0 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -13,7 +13,9 @@
> >   #include "gt/intel_gt.h"
> >   #include "gt/intel_gt_irq.h"
> >   #include "gt/intel_gt_pm.h"
> > +#include "gt/intel_gt_requests.h"
> >   #include "gt/intel_lrc.h"
> > +#include "gt/intel_lrc_reg.h"
> >   #include "gt/intel_mocs.h"
> >   #include "gt/intel_ring.h"
> > @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +/*
> > + * Below is a set of functions which control the GuC scheduling state which
> > + * require a lock, aside from the special case where the functions are called
> > + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
> > + * path to be executing on the context.
> > + */
> 
> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if it
> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking

If this code is called from releasing a fence, a circular lock
dependency is created. Once we move to the DRM scheduler the locking
becomes much easier and we will clean this up then. We already have a
task tracked for the locking cleanup.
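
For reference, the cycle we would create looks roughly like this
(simplified, from memory):

	/*
	 * fence release path [rq->lock held, irqs off]
	 *   -> context unpin -> ce->guc_state.lock
	 * vs.
	 * ce->guc_state.lock held -> H2G CTB send -> fence signaling
	 *   -> rq->lock
	 */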

> scheme if possible, as that might cause trouble if things change in the
> future.

Again this is temporary code, so I think we can live with this until the
DRM scheduler lands.

> 
> > +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
> > +#define SCHED_STATE_DESTROYED				BIT(1)
> > +static inline void init_sched_state(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	atomic_set(&ce->guc_sched_state_no_lock, 0);
> > +	ce->guc_state.sched_state = 0;
> > +}
> > +
> > +static inline bool
> > +context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state &
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> 
> No need for (). Below as well.
> 

Yep.

> > +}
> > +
> > +static inline void
> > +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	/* Only should be called from guc_lrc_desc_pin() */
> > +	ce->guc_state.sched_state |=
> > +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> > +}
> > +
> > +static inline void
> > +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state =
> > +		(ce->guc_state.sched_state &
> > +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> 
> nit: can also use
> 
> ce->guc_state.sched_state &=
> 	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>

Yep.
 
> 
> > +}
> > +
> > +static inline bool
> > +context_destroyed(struct intel_context *ce)
> > +{
> > +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> > +}
> > +
> > +static inline void
> > +set_context_destroyed(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> > +}
> > +
> > +static inline bool context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	return (ce->guc_id == GUC_INVALID_LRC_ID);
> > +}
> > +
> > +static inline void set_context_guc_id_invalid(struct intel_context *ce)
> > +{
> > +	ce->guc_id = GUC_INVALID_LRC_ID;
> > +}
> > +
> > +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> > +{
> > +	return &ce->engine->gt->uc.guc;
> > +}
> > +
> >   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >   {
> >   	return rb_entry(rb, struct i915_priolist, node);
> > @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	int len = 0;
> >   	bool enabled = context_enabled(ce);
> > +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> > +	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
> >   	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > +	spin_lock_init(&guc->contexts_lock);
> > +	INIT_LIST_HEAD(&guc->guc_id_list);
> > +	ida_init(&guc->guc_ids);
> > +
> >   	return 0;
> >   }
> > @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> >   	i915_sched_engine_put(guc->sched_engine);
> >   }
> > -static int guc_context_alloc(struct intel_context *ce)
> > +static inline void queue_request(struct i915_sched_engine *sched_engine,
> > +				 struct i915_request *rq,
> > +				 int prio)
> >   {
> > -	return lrc_alloc(ce, ce->engine);
> > +	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > +	list_add_tail(&rq->sched.link,
> > +		      i915_sched_lookup_priolist(sched_engine, prio));
> > +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +}
> > +
> > +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > +				     struct i915_request *rq)
> > +{
> > +	int ret;
> > +
> > +	__i915_request_submit(rq);
> > +
> > +	trace_i915_request_in(rq, 0);
> > +
> > +	guc_set_lrc_tail(rq);
> > +	ret = guc_add_request(guc, rq);
> > +	if (ret == -EBUSY)
> > +		guc->stalled_request = rq;
> > +
> > +	return ret;
> > +}
> > +
> > +static void guc_submit_request(struct i915_request *rq)
> > +{
> > +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	/* Will be called from irq-context when using foreign fences. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +
> > +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +		queue_request(sched_engine, rq, rq_prio(rq));
> > +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
> > +static int new_guc_id(struct intel_guc *guc)
> > +{
> > +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> > +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> > +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > +}
> > +
> > +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	if (!context_guc_id_invalid(ce)) {
> > +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > +		reset_lrc_desc(guc, ce->guc_id);
> > +		set_context_guc_id_invalid(ce);
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +}
> > +
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	__release_guc_id(guc, ce);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int steal_guc_id(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	int guc_id;
> > +
> > +	if (!list_empty(&guc->guc_id_list)) {
> > +		ce = list_first_entry(&guc->guc_id_list,
> > +				      struct intel_context,
> > +				      guc_id_link);
> > +
> > +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +		GEM_BUG_ON(context_guc_id_invalid(ce));
> > +
> > +		list_del_init(&ce->guc_id_link);
> > +		guc_id = ce->guc_id;
> > +		set_context_guc_id_invalid(ce);
> > +		return guc_id;
> > +	} else {
> > +		return -EAGAIN;
> > +	}
> > +}
> > +
> > +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> > +{
> > +	int ret;
> > +
> > +	ret = new_guc_id(guc);
> > +	if (unlikely(ret < 0)) {
> > +		ret = steal_guc_id(guc);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	*out = ret;
> > +	return 0;
> 
> Is it worth adding spinlock_held asserts for guc->contexts_lock in these ID
> functions? Doubles up as documentation of what locking we expect.
> 

Never a bad idea to have asserts in the code. Will add.
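
E.g. what I plan to add (untested; the same assert would go in
steal_guc_id() and assign_guc_id() too):

	static int new_guc_id(struct intel_guc *guc)
	{
		/* Document that the guc_id pool is protected by this lock */
		lockdep_assert_held(&guc->contexts_lock);

		return ida_simple_get(&guc->guc_ids, GUC_ID_START,
				      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
				      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	}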

> > +}
> > +
> > +#define PIN_GUC_ID_TRIES	4
> > +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	int ret = 0;
> > +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > +
> > +try_again:
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +
> > +	if (context_guc_id_invalid(ce)) {
> > +		ret = assign_guc_id(guc, &ce->guc_id);
> > +		if (ret)
> > +			goto out_unlock;
> > +		ret = 1;	/* Indidcates newly assigned guc_id */
> > +	}
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	atomic_inc(&ce->guc_id_ref);
> > +
> > +out_unlock:
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * -EAGAIN indicates no guc_ids are available, let's retire any
> > +	 * outstanding requests to see if that frees up a guc_id. If the first
> > +	 * retire didn't help, insert a sleep with the timeslice duration before
> > +	 * attempting to retire more requests. Double the sleep period each
> > +	 * subsequent pass before finally giving up. The sleep period has max of
> > +	 * 100ms and minimum of 1ms.
> > +	 */
> > +	if (ret == -EAGAIN && --tries) {
> > +		if (PIN_GUC_ID_TRIES - tries > 1) {
> > +			unsigned int timeslice_shifted =
> > +				ce->engine->props.timeslice_duration_ms <<
> > +				(PIN_GUC_ID_TRIES - tries - 2);
> > +			unsigned int max = min_t(unsigned int, 100,
> > +						 timeslice_shifted);
> > +
> > +			msleep(max_t(unsigned int, max, 1));
> > +		}
> > +		intel_gt_retire_requests(guc_to_gt(guc));
> > +		goto try_again;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > +{
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > +
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> > +	    !atomic_read(&ce->guc_id_ref))
> > +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +}
> > +
> > +static int __guc_action_register_context(struct intel_guc *guc,
> > +					 u32 guc_id,
> > +					 u32 offset)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > +		guc_id,
> > +		offset,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int register_context(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > +		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +
> > +	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +}
> > +
> > +static int __guc_action_deregister_context(struct intel_guc *guc,
> > +					   u32 guc_id)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > +		guc_id,
> > +	};
> > +
> > +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
> > +}
> > +
> > +static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	return __guc_action_deregister_context(guc, guc_id);
> > +}
> > +
> > +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > +{
> > +	switch (class) {
> > +	case RENDER_CLASS:
> > +		return mask >> RCS0;
> > +	case VIDEO_ENHANCEMENT_CLASS:
> > +		return mask >> VECS0;
> > +	case VIDEO_DECODE_CLASS:
> > +		return mask >> VCS0;
> > +	case COPY_ENGINE_CLASS:
> > +		return mask >> BCS0;
> > +	default:
> > +		GEM_BUG_ON("Invalid Class");
> 
> we usually use MISSING_CASE for this type of error.

As soon as we start talking in logical space (the series immediately
after this one gets merged) this function gets deleted, but I will fix
it here.
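
I.e. for the respin:

	default:
		MISSING_CASE(class);
		return 0;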

> 
> > +		return 0;
> > +	}
> > +}
> > +
> > +static void guc_context_policy_init(struct intel_engine_cs *engine,
> > +				    struct guc_lrc_desc *desc)
> > +{
> > +	desc->policy_flags = 0;
> > +
> > +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> > +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> > +}
> > +
> > +static int guc_lrc_desc_pin(struct intel_context *ce)
> > +{
> > +	struct intel_runtime_pm *runtime_pm =
> > +		&ce->engine->gt->i915->runtime_pm;
> 
> If you move this after the engine var below you can skip the ce->engine
> jump. Also, you can shorten the pointer chasing by using
> engine->uncore->rpm.
> 

Sure.
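
I.e. something like:

	struct intel_engine_cs *engine = ce->engine;
	struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;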

> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	u32 desc_idx = ce->guc_id;
> > +	struct guc_lrc_desc *desc;
> > +	bool context_registered;
> > +	intel_wakeref_t wakeref;
> > +	int ret = 0;
> > +
> > +	GEM_BUG_ON(!engine->mask);
> > +
> > +	/*
> > +	 * Ensure LRC + CT vmas are is same region as write barrier is done
> > +	 * based on CT vma region.
> > +	 */
> > +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> > +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
> > +
> > +	context_registered = lrc_desc_registered(guc, desc_idx);
> > +
> > +	reset_lrc_desc(guc, desc_idx);
> > +	set_lrc_desc_registered(guc, desc_idx, ce);
> > +
> > +	desc = __get_lrc_desc(guc, desc_idx);
> > +	desc->engine_class = engine_class_to_guc_class(engine->class);
> > +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> > +						      engine->mask);
> > +	desc->hw_context_desc = ce->lrc.lrca;
> > +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> > +	guc_context_policy_init(engine, desc);
> > +	init_sched_state(ce);
> > +
> > +	/*
> > +	 * The context_lookup xarray is used to determine if the hardware
> > +	 * context is currently registered. There are two cases in which it
> > +	 * could be regisgered either the guc_id has been stole from from
> 
> typo regisgered
>

John got this one. Fixed.
 
> > +	 * another context or the lrc descriptor address of this context has
> > +	 * changed. In either case the context needs to be deregistered with the
> > +	 * GuC before registering this context.
> > +	 */
> > +	if (context_registered) {
> > +		set_context_wait_for_deregister_to_register(ce);
> > +		intel_context_get(ce);
> > +
> > +		/*
> > +		 * If stealing the guc_id, this ce has the same guc_id as the
> > +		 * context whos guc_id was stole.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = deregister_context(ce, ce->guc_id);
> > +	} else {
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			ret = register_context(ce);
> > +	}
> > +
> > +	return ret;
> >   }
> >   static int guc_context_pre_pin(struct intel_context *ce,
> > @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
> >   static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >   {
> > +	if (i915_ggtt_offset(ce->state) !=
> > +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> > +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> 
> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
> re-assigned in there; it looks like it can only change if the context is
> new, but for future-proofing IMO better to re-order here.
> 

No, this is right. ce->state is assigned in lrc_pre_pin and
ce->lrc.lrca is assigned in lrc_pin. To catch the change you have to
check between those two functions.
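
To spell out the ordering:

	guc_context_pre_pin()	/* lrc_pre_pin(): assigns ce->state */
	guc_context_pin()
	  -> compare i915_ggtt_offset(ce->state) with ce->lrc.lrca
	  -> lrc_pin()		/* (re)assigns ce->lrc.lrca */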

> > +
> >   	return lrc_pin(ce, ce->engine, vaddr);
> 
> Could use a comment to say that the GuC context pinning call is delayed
> until request alloc and to look at the comment in the latter function for
> details (or move the explanation here and refer here from request alloc).
>

Yes.
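
Will add something along these lines above the lrc_pin() call:

	/*
	 * The GuC side of pinning (guc_lrc_desc_pin) is intentionally
	 * deferred to the first request alloc on this context, see the
	 * comment in guc_request_alloc() for the reasoning.
	 */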
 
> >   }
> > +static void guc_context_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +
> > +	unpin_guc_id(guc, ce);
> > +	lrc_unpin(ce);
> > +}
> > +
> > +static void guc_context_post_unpin(struct intel_context *ce)
> > +{
> > +	lrc_post_unpin(ce);
> 
> why do we need this function? we can just pass lrc_post_unpin to the ops
> (like we already do).
> 

We don't strictly need it, but for style reasons I think having a
GuC-specific wrapper is correct.

> > +}
> > +
> > +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +	struct intel_guc *guc = &engine->gt->uc.guc;
> > +	unsigned long flags;
> > +
> > +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> > +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	deregister_context(ce, ce->guc_id);
> > +}
> > +
> > +static void guc_context_destroy(struct kref *kref)
> > +{
> > +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> 
> same as above, going through engine->uncore->rpm is shorter
>

Sure.
 
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > +	intel_wakeref_t wakeref;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * If the guc_id is invalid this context has been stolen and we can free
> > +	 * it immediately. Also can be freed immediately if the context is not
> > +	 * registered with the GuC.
> > +	 */
> > +	if (context_guc_id_invalid(ce) ||
> > +	    !lrc_desc_registered(guc, ce->guc_id)) {
> > +		release_guc_id(guc, ce);
> 
> it feels a bit weird that we call release_guc_id in the case where the id is
> invalid. The code handles it fine, but still the flow doesn't feel clean.
> Not a blocker.
> 

I understand. Let's see if this can get cleaned up at some point.

> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * We have to acquire the context spinlock and check guc_id again, if it
> > +	 * is valid it hasn't been stolen and needs to be deregistered. We
> > +	 * delete this context from the list of unpinned guc_ids available to
> > +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> > +	 * returns indicating this context has been deregistered the guc_id is
> > +	 * returned to the pool of available guc_ids.
> > +	 */
> > +	spin_lock_irqsave(&guc->contexts_lock, flags);
> > +	if (context_guc_id_invalid(ce)) {
> > +		__release_guc_id(guc, ce);
> 
> But here the call to __release_guc_id is unneeded, right? The ce doesn't own
> the ID anymore and both the steal and the release function already clean
> ce->guc_id_link, so there should be nothing left to clean.
>

This is dead code. Will delete.

> > +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +		lrc_destroy(kref);
> > +		return;
> > +	}
> > +
> > +	if (!list_empty(&ce->guc_id_link))
> > +		list_del_init(&ce->guc_id_link);
> > +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +
> > +	/*
> > +	 * We defer GuC context deregistration until the context is destroyed
> > +	 * in order to save on CTBs. With this optimization ideally we only need
> > +	 * 1 CTB to register the context during the first pin and 1 CTB to
> > +	 * deregister the context when the context is destroyed. Without this
> > +	 * optimization, a CTB would be needed every pin & unpin.
> > +	 *
> > +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
> > +	 * from context_free_worker when not runtime wakeref is held.
> > +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
> > +	 * in H2G CTB to deregister the context. A future patch may defer this
> > +	 * H2G CTB if the runtime wakeref is zero.
> > +	 */
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		guc_lrc_desc_unpin(ce);
> > +}
> > +
> > +static int guc_context_alloc(struct intel_context *ce)
> > +{
> > +	return lrc_alloc(ce, ce->engine);
> > +}
> > +
> >   static const struct intel_context_ops guc_context_ops = {
> >   	.alloc = guc_context_alloc,
> >   	.pre_pin = guc_context_pre_pin,
> >   	.pin = guc_context_pin,
> > -	.unpin = lrc_unpin,
> > -	.post_unpin = lrc_post_unpin,
> > +	.unpin = guc_context_unpin,
> > +	.post_unpin = guc_context_post_unpin,
> >   	.enter = intel_context_enter_engine,
> >   	.exit = intel_context_exit_engine,
> >   	.reset = lrc_reset,
> > -	.destroy = lrc_destroy,
> > +	.destroy = guc_context_destroy,
> >   };
> > -static int guc_request_alloc(struct i915_request *request)
> > +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +}
> > +
> > +static int guc_request_alloc(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	int ret;
> > -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
> > +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >   	/*
> >   	 * Flush enough space to reduce the likelihood of waiting after
> >   	 * we start building the request - in which case we will just
> >   	 * have to repeat work.
> >   	 */
> > -	request->reserved_space += GUC_REQUEST_SIZE;
> > +	rq->reserved_space += GUC_REQUEST_SIZE;
> >   	/*
> >   	 * Note that after this point, we have committed to using
> > @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
> >   	 */
> >   	/* Unconditionally invalidate GPU caches and TLBs. */
> > -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> > +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >   	if (ret)
> >   		return ret;
> > -	request->reserved_space -= GUC_REQUEST_SIZE;
> > -	return 0;
> > -}
> > +	rq->reserved_space -= GUC_REQUEST_SIZE;
> > -static inline void queue_request(struct i915_sched_engine *sched_engine,
> > -				 struct i915_request *rq,
> > -				 int prio)
> > -{
> > -	GEM_BUG_ON(!list_empty(&rq->sched.link));
> > -	list_add_tail(&rq->sched.link,
> > -		      i915_sched_lookup_priolist(sched_engine, prio));
> > -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > -}
> > -
> > -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> > -				     struct i915_request *rq)
> > -{
> > -	int ret;
> > -
> > -	__i915_request_submit(rq);
> > -
> > -	trace_i915_request_in(rq, 0);
> > -
> > -	guc_set_lrc_tail(rq);
> > -	ret = guc_add_request(guc, rq);
> > -	if (ret == -EBUSY)
> > -		guc->stalled_request = rq;
> > -
> > -	return ret;
> > -}
> > -
> > -static void guc_submit_request(struct i915_request *rq)
> > -{
> > -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> > -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	/*
> > +	 * Call pin_guc_id here rather than in the pinning step as with
> > +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
> > +	 * guc_ids and creating horrible race conditions. This is especially bad
> > +	 * when guc_ids are being stolen due to over subscription. By the time
> > +	 * this function is reached, it is guaranteed that the guc_id will be
> > +	 * persistent until the generated request is retired. Thus, sealing these
> > +	 * race conditions. It is still safe to fail here if guc_ids are
> > +	 * exhausted and return -EAGAIN to the user indicating that they can try
> > +	 * again in the future.
> > +	 *
> > +	 * There is no need for a lock here as the timeline mutex ensures at
> > +	 * most one context can be executing this code path at once. The
> > +	 * guc_id_ref is incremented once for every request in flight and
> > +	 * decremented on each retire. When it is zero, a lock around the
> > +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
> > +	 */
> > +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> > +		return 0;
> > -	/* Will be called from irq-context when using foreign fences. */
> > -	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
> > +	if (unlikely(ret < 0))
> > +		return ret;;
> 
> typo ";;"
> 

Yep.

> > +	if (context_needs_register(ce, !!ret)) {
> > +		ret = guc_lrc_desc_pin(ce);
> > +		if (unlikely(ret)) {	/* unwind */
> > +			atomic_dec(&ce->guc_id_ref);
> > +			unpin_guc_id(guc, ce);
> > +			return ret;
> > +		}
> > +	}
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > -		queue_request(sched_engine, rq, rq_prio(rq));
> > -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> > -		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> 
> 
> Might be worth moving the pinning to a separate function to keep the
> request_alloc focused on the request. Can be done as a follow up.
>

Sure. Let me see how that looks in a follow up.
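
Untested sketch of what that could look like (helper name made up, the
body is just the hunk above repackaged):

	static int guc_request_pin_context(struct i915_request *rq)
	{
		struct intel_context *ce = rq->context;
		struct intel_guc *guc = ce_to_guc(ce);
		int ret;

		/* Fast path: guc_id already has requests in flight */
		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		ret = pin_guc_id(guc, ce);	/* 1 == new guc_id */
		if (unlikely(ret < 0))
			return ret;

		if (context_needs_register(ce, !!ret)) {
			ret = guc_lrc_desc_pin(ce);
			if (unlikely(ret)) {	/* unwind */
				atomic_dec(&ce->guc_id_ref);
				unpin_guc_id(guc, ce);
				return ret;
			}
		}

		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
		return 0;
	}

guc_request_alloc() would then just call this after the flush /
reserved_space dance.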
 
> > -	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	return 0;
> >   }
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> > @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >   	engine->submit_request = guc_submit_request;
> >   }
> > +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> > +					  struct intel_context *ce)
> > +{
> > +	if (context_guc_id_invalid(ce))
> > +		pin_guc_id(guc, ce);
> > +	guc_lrc_desc_pin(ce);
> > +}
> > +
> > +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > +{
> > +	struct intel_gt *gt = guc_to_gt(guc);
> > +	struct intel_engine_cs *engine;
> > +	enum intel_engine_id id;
> > +
> > +	/* make sure all descriptors are clean... */
> > +	xa_destroy(&guc->context_lookup);
> > +
> > +	/*
> > +	 * Some contexts might have been pinned before we enabled GuC
> > +	 * submission, so we need to add them to the GuC bookeeping.
> > +	 * Also, after a reset the GuC we want to make sure that the information
> > +	 * shared with GuC is properly reset. The kernel lrcs are not attached
> > +	 * to the gem_context, so they need to be added separately.
> > +	 *
> > +	 * Note: we purposely do not check the error return of
> > +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
> > +	 * One, if there aren't enough free IDs, but we're guaranteed to have
> > +	 * enough here (we're either only pinning a handful of lrc on first boot
> > +	 * or we're re-pinning lrcs that were already pinned before the reset).
> > +	 * Two, if the GuC has died and CTBs can't make forward progress.
> > +	 * Presumably, the GuC should be alive as this function is called on
> > +	 * driver load or after a reset. Even if it is dead, another full GPU
> > +	 * reset will be triggered and this function would be called again.
> > +	 */
> > +
> > +	for_each_engine(engine, gt, id)
> > +		if (engine->kernel_context)
> > +			guc_kernel_context_pin(guc, engine->kernel_context);
> > +}
> > +
> >   static void guc_release(struct intel_engine_cs *engine)
> >   {
> >   	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
> > @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   void intel_guc_submission_enable(struct intel_guc *guc)
> >   {
> > +	guc_init_lrc_mapping(guc);
> >   }
> >   void intel_guc_submission_disable(struct intel_guc *guc)
> > @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >   {
> >   	guc->submission_selected = __guc_submission_selected(guc);
> >   }
> > +
> > +static inline struct intel_context *
> > +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> > +{
> > +	struct intel_context *ce;
> > +
> > +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Invalid desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	ce = __get_context(guc, desc_idx);
> > +	if (unlikely(!ce)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm,
> > +			"Context is NULL, desc_idx %u", desc_idx);
> > +		return NULL;
> > +	}
> > +
> > +	return ce;
> > +}
> > +
> > +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> > +					  const u32 *msg,
> > +					  u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	u32 desc_idx = msg[0];
> > +
> > +	if (unlikely(len < 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> > +		return -EPROTO;
> > +	}
> > +
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	if (context_wait_for_deregister_to_register(ce)) {
> > +		struct intel_runtime_pm *runtime_pm =
> > +			&ce->engine->gt->i915->runtime_pm;
> > +		intel_wakeref_t wakeref;
> > +
> > +		/*
> > +		 * Previous owner of this guc_id has been deregistered, now safe
> > +		 * register this context.
> > +		 */
> > +		with_intel_runtime_pm(runtime_pm, wakeref)
> > +			register_context(ce);
> > +		clr_context_wait_for_deregister_to_register(ce);
> > +		intel_context_put(ce);
> > +	} else if (context_destroyed(ce)) {
> > +		/* Context has been destroyed */
> > +		release_guc_id(guc, ce);
> > +		lrc_destroy(&ce->ref);
> > +	}
> > +
> > +	return 0;
> > +}
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index 943fe485c662..204c95c39353 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -4142,6 +4142,7 @@ enum {
> >   	FAULT_AND_CONTINUE /* Unsupported */
> >   };
> > +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >   #define GEN8_CTX_VALID (1 << 0)
> >   #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >   #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 86b4c9f2613d..b48c4905d3fc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >   	 */
> >   	if (!list_empty(&rq->sched.link))
> >   		remove_from_engine(rq);
> > +	atomic_dec(&rq->context->guc_id_ref);
> 
> Does this work/make sense if GuC is disabled?
> 

It doesn't matter if the GuC is disabled, as this variable isn't used
in that case. A following patch adds a vfunc here and pushes this into
the GuC backend.
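
Roughly what the follow-up does (hypothetical sketch, the vfunc name
may differ in the actual patch):

	/* i915_request_retire() */
	if (rq->engine->remove_active_request)
		rq->engine->remove_active_request(rq);

	/* GuC backend implementation */
	static void guc_remove_request(struct i915_request *rq)
	{
		atomic_dec(&rq->context->guc_id_ref);
	}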

Matt

> Daniele
> 
> >   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 20/51] drm/i915: Track 'serial' counts for virtual engines
  2021-07-20  1:54       ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 16:47         ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 16:47 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Mon, Jul 19, 2021 at 06:54:00PM -0700, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 06:28:47PM -0700, John Harrison wrote:
> > On 7/16/2021 13:16, Matthew Brost wrote:
> > > From: John Harrison <John.C.Harrison@Intel.com>
> > > 
> > > The serial number tracking of engines happens at the backend of
> > > request submission and was expecting to only be given physical
> > > engines. However, in GuC submission mode, the decomposition of virtual
> > > to physical engines does not happen in i915. Instead, requests are
> > > submitted to their virtual engine mask all the way through to the
> > > hardware (i.e. to GuC). This would mean that the heart beat code
> > > thinks the physical engines are idle due to the serial number not
> > > incrementing.
> > > 
> > > This patch updates the tracking to decompose virtual engines into
> > > their physical constituents and tracks the request against each. This
> > > is not entirely accurate as the GuC will only be issuing the request
> > > to one physical engine. However, it is the best that i915 can do given
> > > that it has no knowledge of the GuC's scheduling decisions.
> > > 
> > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Still needs to pull in Tvrtko's updated subject and description.
> > 
> 
> Forgot this in the post, already have it fixed locally.
> 

In addition to the updated description, I took his advice to have a
default behavior without the vfunc: just increment engine->serial.
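
I.e. the generic path in __i915_request_submit() becomes (roughly):

	if (engine->bump_serial)
		engine->bump_serial(engine);
	else
		engine->serial++;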

With those changes:
Reviewed-by: Matthew Brost <matthew.brost@intel.com> 

> Matt
> 
> > John.
> > 
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_engine_types.h     |  2 ++
> > >   .../gpu/drm/i915/gt/intel_execlists_submission.c |  6 ++++++
> > >   drivers/gpu/drm/i915/gt/intel_ring_submission.c  |  6 ++++++
> > >   drivers/gpu/drm/i915/gt/mock_engine.c            |  6 ++++++
> > >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c    | 16 ++++++++++++++++
> > >   drivers/gpu/drm/i915/i915_request.c              |  4 +++-
> > >   6 files changed, 39 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > index 1cb9c3b70b29..8ad304b2f2e4 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > > @@ -388,6 +388,8 @@ struct intel_engine_cs {
> > >   	void		(*park)(struct intel_engine_cs *engine);
> > >   	void		(*unpark)(struct intel_engine_cs *engine);
> > > +	void		(*bump_serial)(struct intel_engine_cs *engine);
> > > +
> > >   	void		(*set_default_submission)(struct intel_engine_cs *engine);
> > >   	const struct intel_context_ops *cops;
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > index 28492cdce706..920707e22eb0 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > > @@ -3191,6 +3191,11 @@ static void execlists_release(struct intel_engine_cs *engine)
> > >   	lrc_fini_wa_ctx(engine);
> > >   }
> > > +static void execlist_bump_serial(struct intel_engine_cs *engine)
> > > +{
> > > +	engine->serial++;
> > > +}
> > > +
> > >   static void
> > >   logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> > >   {
> > > @@ -3200,6 +3205,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
> > >   	engine->cops = &execlists_context_ops;
> > >   	engine->request_alloc = execlists_request_alloc;
> > > +	engine->bump_serial = execlist_bump_serial;
> > >   	engine->reset.prepare = execlists_reset_prepare;
> > >   	engine->reset.rewind = execlists_reset_rewind;
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > index 5c4d204d07cc..61469c631057 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > > @@ -1047,6 +1047,11 @@ static void setup_irq(struct intel_engine_cs *engine)
> > >   	}
> > >   }
> > > +static void ring_bump_serial(struct intel_engine_cs *engine)
> > > +{
> > > +	engine->serial++;
> > > +}
> > > +
> > >   static void setup_common(struct intel_engine_cs *engine)
> > >   {
> > >   	struct drm_i915_private *i915 = engine->i915;
> > > @@ -1066,6 +1071,7 @@ static void setup_common(struct intel_engine_cs *engine)
> > >   	engine->cops = &ring_context_ops;
> > >   	engine->request_alloc = ring_request_alloc;
> > > +	engine->bump_serial = ring_bump_serial;
> > >   	/*
> > >   	 * Using a global execution timeline; the previous final breadcrumb is
> > > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > index 68970398e4ef..9203c766db80 100644
> > > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > > @@ -292,6 +292,11 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> > >   	intel_engine_fini_retire(engine);
> > >   }
> > > +static void mock_bump_serial(struct intel_engine_cs *engine)
> > > +{
> > > +	engine->serial++;
> > > +}
> > > +
> > >   struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> > >   				    const char *name,
> > >   				    int id)
> > > @@ -318,6 +323,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
> > >   	engine->base.cops = &mock_context_ops;
> > >   	engine->base.request_alloc = mock_request_alloc;
> > > +	engine->base.bump_serial = mock_bump_serial;
> > >   	engine->base.emit_flush = mock_emit_flush;
> > >   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> > >   	engine->base.submit_request = mock_submit_request;
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index 7b3e1c91e689..372e0dc7617a 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -1485,6 +1485,20 @@ static void guc_release(struct intel_engine_cs *engine)
> > >   	lrc_fini_wa_ctx(engine);
> > >   }
> > > +static void guc_bump_serial(struct intel_engine_cs *engine)
> > > +{
> > > +	engine->serial++;
> > > +}
> > > +
> > > +static void virtual_guc_bump_serial(struct intel_engine_cs *engine)
> > > +{
> > > +	struct intel_engine_cs *e;
> > > +	intel_engine_mask_t tmp, mask = engine->mask;
> > > +
> > > +	for_each_engine_masked(e, engine->gt, mask, tmp)
> > > +		e->serial++;
> > > +}
> > > +
> > >   static void guc_default_vfuncs(struct intel_engine_cs *engine)
> > >   {
> > >   	/* Default vfuncs which can be overridden by each engine. */
> > > @@ -1493,6 +1507,7 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> > >   	engine->cops = &guc_context_ops;
> > >   	engine->request_alloc = guc_request_alloc;
> > > +	engine->bump_serial = guc_bump_serial;
> > >   	engine->sched_engine->schedule = i915_schedule;
> > > @@ -1828,6 +1843,7 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> > >   	ve->base.cops = &virtual_guc_context_ops;
> > >   	ve->base.request_alloc = guc_request_alloc;
> > > +	ve->base.bump_serial = virtual_guc_bump_serial;
> > >   	ve->base.submit_request = guc_submit_request;
> > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > > index 01aa3d1ee2b1..30ecdc46a12f 100644
> > > --- a/drivers/gpu/drm/i915/i915_request.c
> > > +++ b/drivers/gpu/drm/i915/i915_request.c
> > > @@ -669,7 +669,9 @@ bool __i915_request_submit(struct i915_request *request)
> > >   				     request->ring->vaddr + request->postfix);
> > >   	trace_i915_request_execute(request);
> > > -	engine->serial++;
> > > +	if (engine->bump_serial)
> > > +		engine->bump_serial(engine);
> > > +
> > >   	result = true;
> > >   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> > 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 45/51] drm/i915/selftest: Fix workarounds selftest for GuC submission
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 17:14     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 17:14 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Fri, Jul 16, 2021 at 01:17:18PM -0700, Matthew Brost wrote:
> From: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
> 
> When GuC submission is enabled, the GuC controls engine resets. Rather
> than explicitly triggering a reset, the driver must submit a hanging
> context to GuC and wait for the reset to occur.
> 
> Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/i915/Makefile                 |   1 +
>  .../gpu/drm/i915/gt/selftest_workarounds.c    | 130 +++++++++++++-----
>  .../i915/selftests/intel_scheduler_helpers.c  |  76 ++++++++++
>  .../i915/selftests/intel_scheduler_helpers.h  |  28 ++++
>  4 files changed, 201 insertions(+), 34 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
>  create mode 100644 drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 10b3bb6207ba..ab7679957623 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -280,6 +280,7 @@ i915-$(CONFIG_DRM_I915_CAPTURE_ERROR) += i915_gpu_error.o
>  i915-$(CONFIG_DRM_I915_SELFTEST) += \
>  	gem/selftests/i915_gem_client_blt.o \
>  	gem/selftests/igt_gem_utils.o \
> +	selftests/intel_scheduler_helpers.o \
>  	selftests/i915_random.o \
>  	selftests/i915_selftest.o \
>  	selftests/igt_atomic.o \
> diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> index 7ebc4edb8ecf..7727bc531ea9 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c
> @@ -12,6 +12,7 @@
>  #include "selftests/igt_flush_test.h"
>  #include "selftests/igt_reset.h"
>  #include "selftests/igt_spinner.h"
> +#include "selftests/intel_scheduler_helpers.h"
>  #include "selftests/mock_drm.h"
>  
>  #include "gem/selftests/igt_gem_utils.h"
> @@ -261,28 +262,34 @@ static int do_engine_reset(struct intel_engine_cs *engine)
>  	return intel_engine_reset(engine, "live_workarounds");
>  }
>  
> +static int do_guc_reset(struct intel_engine_cs *engine)
> +{
> +	/* Currently a no-op as the reset is handled by GuC */
> +	return 0;
> +}
> +
>  static int
>  switch_to_scratch_context(struct intel_engine_cs *engine,
> -			  struct igt_spinner *spin)
> +			  struct igt_spinner *spin,
> +			  struct i915_request **rq)
>  {
>  	struct intel_context *ce;
> -	struct i915_request *rq;
>  	int err = 0;
>  
>  	ce = intel_context_create(engine);
>  	if (IS_ERR(ce))
>  		return PTR_ERR(ce);
>  
> -	rq = igt_spinner_create_request(spin, ce, MI_NOOP);
> +	*rq = igt_spinner_create_request(spin, ce, MI_NOOP);
>  	intel_context_put(ce);
>  
> -	if (IS_ERR(rq)) {
> +	if (IS_ERR(*rq)) {
>  		spin = NULL;
> -		err = PTR_ERR(rq);
> +		err = PTR_ERR(*rq);
>  		goto err;
>  	}
>  
> -	err = request_add_spin(rq, spin);
> +	err = request_add_spin(*rq, spin);
>  err:
>  	if (err && spin)
>  		igt_spinner_end(spin);
> @@ -296,6 +303,7 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
>  {
>  	struct intel_context *ce, *tmp;
>  	struct igt_spinner spin;
> +	struct i915_request *rq;
>  	intel_wakeref_t wakeref;
>  	int err;
>  
> @@ -316,13 +324,24 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
>  		goto out_spin;
>  	}
>  
> -	err = switch_to_scratch_context(engine, &spin);
> +	err = switch_to_scratch_context(engine, &spin, &rq);
>  	if (err)
>  		goto out_spin;
>  
> +	/* Ensure the spinner hasn't aborted */
> +	if (i915_request_completed(rq)) {
> +		pr_err("%s spinner failed to start\n", name);
> +		err = -ETIMEDOUT;
> +		goto out_spin;
> +	}
> +
>  	with_intel_runtime_pm(engine->uncore->rpm, wakeref)
>  		err = reset(engine);
>  
> +	/* Ensure the reset happens and kills the engine */
> +	if (err == 0)
> +		err = intel_selftest_wait_for_rq(rq);
> +
>  	igt_spinner_end(&spin);
>  
>  	if (err) {
> @@ -787,9 +806,26 @@ static int live_reset_whitelist(void *arg)
>  			continue;
>  
>  		if (intel_has_reset_engine(gt)) {
> -			err = check_whitelist_across_reset(engine,
> -							   do_engine_reset,
> -							   "engine");
> +			if (intel_engine_uses_guc(engine)) {
> +				struct intel_selftest_saved_policy saved;
> +				int err2;
> +
> +				err = intel_selftest_modify_policy(engine, &saved);
> +				if (err)
> +					goto out;
> +
> +				err = check_whitelist_across_reset(engine,
> +								   do_guc_reset,
> +								   "guc");
> +
> +				err2 = intel_selftest_restore_policy(engine, &saved);
> +				if (err == 0)
> +					err = err2;
> +			} else
> +				err = check_whitelist_across_reset(engine,
> +								   do_engine_reset,
> +								   "engine");
> +
>  			if (err)
>  				goto out;
>  		}
> @@ -1226,31 +1262,41 @@ live_engine_reset_workarounds(void *arg)
>  	reference_lists_init(gt, &lists);
>  
>  	for_each_engine(engine, gt, id) {
> +		struct intel_selftest_saved_policy saved;
> +		bool using_guc = intel_engine_uses_guc(engine);
>  		bool ok;
> +		int ret2;
>  
>  		pr_info("Verifying after %s reset...\n", engine->name);
> +		ret = intel_selftest_modify_policy(engine, &saved);
> +		if (ret)
> +			break;
> +
> +
>  		ce = intel_context_create(engine);
>  		if (IS_ERR(ce)) {
>  			ret = PTR_ERR(ce);
> -			break;
> +			goto restore;
>  		}
>  
> -		ok = verify_wa_lists(gt, &lists, "before reset");
> -		if (!ok) {
> -			ret = -ESRCH;
> -			goto err;
> -		}
> +		if (!using_guc) {
> +			ok = verify_wa_lists(gt, &lists, "before reset");
> +			if (!ok) {
> +				ret = -ESRCH;
> +				goto err;
> +			}
>  
> -		ret = intel_engine_reset(engine, "live_workarounds:idle");
> -		if (ret) {
> -			pr_err("%s: Reset failed while idle\n", engine->name);
> -			goto err;
> -		}
> +			ret = intel_engine_reset(engine, "live_workarounds:idle");
> +			if (ret) {
> +				pr_err("%s: Reset failed while idle\n", engine->name);
> +				goto err;
> +			}
>  
> -		ok = verify_wa_lists(gt, &lists, "after idle reset");
> -		if (!ok) {
> -			ret = -ESRCH;
> -			goto err;
> +			ok = verify_wa_lists(gt, &lists, "after idle reset");
> +			if (!ok) {
> +				ret = -ESRCH;
> +				goto err;
> +			}
>  		}
>  
>  		ret = igt_spinner_init(&spin, engine->gt);
> @@ -1271,25 +1317,41 @@ live_engine_reset_workarounds(void *arg)
>  			goto err;
>  		}
>  
> -		ret = intel_engine_reset(engine, "live_workarounds:active");
> -		if (ret) {
> -			pr_err("%s: Reset failed on an active spinner\n",
> -			       engine->name);
> -			igt_spinner_fini(&spin);
> -			goto err;
> +		/* Ensure the spinner hasn't aborted */
> +		if (i915_request_completed(rq)) {
> +			ret = -ETIMEDOUT;
> +			goto skip;
> +		}
> +
> +		if (!using_guc) {
> +			ret = intel_engine_reset(engine, "live_workarounds:active");
> +			if (ret) {
> +				pr_err("%s: Reset failed on an active spinner\n",
> +				       engine->name);
> +				igt_spinner_fini(&spin);
> +				goto err;
> +			}
>  		}
>  
> +		/* Ensure the reset happens and kills the engine */
> +		if (ret == 0)
> +			ret = intel_selftest_wait_for_rq(rq);
> +
> +skip:
>  		igt_spinner_end(&spin);
>  		igt_spinner_fini(&spin);
>  
>  		ok = verify_wa_lists(gt, &lists, "after busy reset");
> -		if (!ok) {
> +		if (!ok)
>  			ret = -ESRCH;
> -			goto err;
> -		}
>  
>  err:
>  		intel_context_put(ce);
> +
> +restore:
> +		ret2 = intel_selftest_restore_policy(engine, &saved);
> +		if (ret == 0)
> +			ret = ret2;
>  		if (ret)
>  			break;
>  	}
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> new file mode 100644
> index 000000000000..91ecd8a1bd21
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> @@ -0,0 +1,76 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +//#include "gt/intel_engine_user.h"
> +#include "gt/intel_gt.h"
> +#include "i915_drv.h"
> +#include "i915_selftest.h"
> +
> +#include "selftests/intel_scheduler_helpers.h"
> +
> +#define REDUCED_TIMESLICE	5
> +#define REDUCED_PREEMPT		10
> +#define WAIT_FOR_RESET_TIME	1000
> +
> +int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> +				 struct intel_selftest_saved_policy *saved)
> +
> +{
> +	int err;
> +
> +	saved->reset = engine->i915->params.reset;
> +	saved->flags = engine->flags;
> +	saved->timeslice = engine->props.timeslice_duration_ms;
> +	saved->preempt_timeout = engine->props.preempt_timeout_ms;
> +
> +	/*
> +	 * Enable force pre-emption on time slice expiration
> +	 * together with engine reset on pre-emption timeout.
> +	 * This is required to make the GuC notice and reset
> +	 * the single hanging context.
> +	 * Also, reduce the preemption timeout to something
> +	 * small to speed the test up.
> +	 */
> +	engine->i915->params.reset = 2;
> +	engine->flags |= I915_ENGINE_WANT_FORCED_PREEMPTION;
> +	engine->props.timeslice_duration_ms = REDUCED_TIMESLICE;
> +	engine->props.preempt_timeout_ms = REDUCED_PREEMPT;
> +
> +	if (!intel_engine_uses_guc(engine))
> +		return 0;
> +
> +	err = intel_guc_global_policies_update(&engine->gt->uc.guc);
> +	if (err)
> +		intel_selftest_restore_policy(engine, saved);
> +
> +	return err;
> +}
> +
> +int intel_selftest_restore_policy(struct intel_engine_cs *engine,
> +				  struct intel_selftest_saved_policy *saved)
> +{
> +	/* Restore the original policies */
> +	engine->i915->params.reset = saved->reset;
> +	engine->flags = saved->flags;
> +	engine->props.timeslice_duration_ms = saved->timeslice;
> +	engine->props.preempt_timeout_ms = saved->preempt_timeout;
> +
> +	if (!intel_engine_uses_guc(engine))
> +		return 0;
> +
> +	return intel_guc_global_policies_update(&engine->gt->uc.guc);
> +}
> +
> +int intel_selftest_wait_for_rq(struct i915_request *rq)
> +{
> +	long ret;
> +
> +	ret = i915_request_wait(rq, 0, WAIT_FOR_RESET_TIME);
> +	if (ret < 0)
> +		return ret;
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> new file mode 100644
> index 000000000000..f30e96f0ba95
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2014-2019 Intel Corporation
> + */
> +
> +#ifndef _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
> +#define _INTEL_SELFTEST_SCHEDULER_HELPERS_H_
> +
> +#include <linux/types.h>
> +
> +struct i915_request;
> +struct intel_engine_cs;
> +
> +struct intel_selftest_saved_policy
> +{
> +	u32 flags;
> +	u32 reset;
> +	u64 timeslice;
> +	u64 preempt_timeout;
> +};
> +
> +int intel_selftest_modify_policy(struct intel_engine_cs *engine,
> +				 struct intel_selftest_saved_policy *saved);
> +int intel_selftest_restore_policy(struct intel_engine_cs *engine,
> +				  struct intel_selftest_saved_policy *saved);
> +int intel_selftest_wait_for_rq(struct i915_request *rq);
> +
> +#endif
> -- 
> 2.28.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
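
The pattern the new helpers enable, paraphrased from the
live_reset_whitelist() changes above (run_test_body() and rq are
stand-ins for the actual test body and its spinner request):

	struct intel_selftest_saved_policy saved;
	int err, err2;

	/* Shorter timeouts + forced preemption so the GuC reset fires fast */
	err = intel_selftest_modify_policy(engine, &saved);
	if (err)
		return err;

	err = run_test_body(engine, &rq);

	/* Wait for the GuC-driven reset to kill the hanging request */
	if (err == 0)
		err = intel_selftest_wait_for_rq(rq);

	/* Always restore the saved policy; don't lose restore errors */
	err2 = intel_selftest_restore_policy(engine, &saved);
	if (err == 0)
		err = err2;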

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 19:45     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 19:45 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> With GuC virtual engines, the physical engine on which a request
> executes and completes isn't known to the i915. Therefore we can't
> attach a request to a physical engine's breadcrumbs. To work around
> this we create a single breadcrumbs object per engine class when using
> GuC submission and direct all physical engine interrupts to it.
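
Spelling the mechanism out for anyone skimming — a condensed sketch of
the patch below, not the literal code; first_engine_of_class() is a
hypothetical stand-in for the lookup loop over gt->engine_class[]:

	static void share_class_breadcrumbs(struct intel_engine_cs *engine)
	{
		struct intel_engine_cs *first = first_engine_of_class(engine);

		/* Every engine of the class holds a ref on one shared object */
		if (engine->breadcrumbs != first->breadcrumbs) {
			intel_breadcrumbs_put(engine->breadcrumbs);
			engine->breadcrumbs =
				intel_breadcrumbs_get(first->breadcrumbs);
		}
		engine->breadcrumbs->engine_mask |= engine->mask;
	}

	static bool guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
	{
		struct intel_engine_cs *sibling;
		intel_engine_mask_t tmp, mask = b->engine_mask;
		bool result = false;

		/* Arming the shared breadcrumbs arms every sibling's IRQ */
		for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
			result |= intel_engine_irq_enable(sibling);

		return result;
	}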
>
> v2:
>   (John H)
>    - Rework header file structure so intel_engine_mask_t can be in
>      intel_engine_types.h
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> CC: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 ++++-
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
>   drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
>   .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
>   drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
>   9 files changed, 133 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 38cc42783dfb..2007dc6f6b99 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -15,28 +15,14 @@
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   
> -static bool irq_enable(struct intel_engine_cs *engine)
> +static bool irq_enable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_enable)
> -		return false;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_enable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> -
> -	return true;
> +	return intel_engine_irq_enable(b->irq_engine);
>   }
>   
> -static void irq_disable(struct intel_engine_cs *engine)
> +static void irq_disable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_disable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_disable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> +	intel_engine_irq_disable(b->irq_engine);
>   }
>   
>   static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   	WRITE_ONCE(b->irq_armed, true);
>   
>   	/* Requests may have completed before we could enable the interrupt. */
> -	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
> +	if (!b->irq_enabled++ && b->irq_enable(b))
>   		irq_work_queue(&b->irq_work);
>   }
>   
> @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
>   {
>   	GEM_BUG_ON(!b->irq_enabled);
>   	if (!--b->irq_enabled)
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	WRITE_ONCE(b->irq_armed, false);
>   	intel_gt_pm_put_async(b->irq_engine->gt);
> @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	if (!b)
>   		return NULL;
>   
> -	b->irq_engine = irq_engine;
> +	kref_init(&b->ref);
>   
>   	spin_lock_init(&b->signalers_lock);
>   	INIT_LIST_HEAD(&b->signalers);
> @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	spin_lock_init(&b->irq_lock);
>   	init_irq_work(&b->irq_work, signal_irq_work);
>   
> +	b->irq_engine = irq_engine;
> +	b->irq_enable = irq_enable;
> +	b->irq_disable = irq_disable;
> +
>   	return b;
>   }
>   
> @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
>   	spin_lock_irqsave(&b->irq_lock, flags);
>   
>   	if (b->irq_enabled)
> -		irq_enable(b->irq_engine);
> +		b->irq_enable(b);
>   	else
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
> @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
>   	}
>   }
>   
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
> +void intel_breadcrumbs_free(struct kref *kref)
>   {
> +	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
> +
>   	irq_work_sync(&b->irq_work);
>   	GEM_BUG_ON(!list_empty(&b->signalers));
>   	GEM_BUG_ON(b->irq_armed);
> +
>   	kfree(b);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> index 3ce5ce270b04..be0d4f379a85 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> @@ -9,7 +9,7 @@
>   #include <linux/atomic.h>
>   #include <linux/irq_work.h>
>   
> -#include "intel_engine_types.h"
> +#include "intel_breadcrumbs_types.h"
>   
>   struct drm_printer;
>   struct i915_request;
> @@ -17,7 +17,7 @@ struct intel_breadcrumbs;
>   
>   struct intel_breadcrumbs *
>   intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
> +void intel_breadcrumbs_free(struct kref *kref);
>   
>   void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
>   void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
> @@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
>   void intel_context_remove_breadcrumbs(struct intel_context *ce,
>   				      struct intel_breadcrumbs *b);
>   
> +static inline struct intel_breadcrumbs *
> +intel_breadcrumbs_get(struct intel_breadcrumbs *b)
> +{
> +	kref_get(&b->ref);
> +	return b;
> +}
> +
> +static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
> +{
> +	kref_put(&b->ref, intel_breadcrumbs_free);
> +}
> +
>   #endif /* __INTEL_BREADCRUMBS__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index 3a084ce8ff5e..72dfd3748c4c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -7,10 +7,13 @@
>   #define __INTEL_BREADCRUMBS_TYPES__
>   
>   #include <linux/irq_work.h>
> +#include <linux/kref.h>
>   #include <linux/list.h>
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   
> +#include "intel_engine_types.h"
> +
>   /*
>    * Rather than have every client wait upon all user interrupts,
>    * with the herd waking after every interrupt and each doing the
> @@ -29,6 +32,7 @@
>    * the overhead of waking that client is much preferred.
>    */
>   struct intel_breadcrumbs {
> +	struct kref ref;
>   	atomic_t active;
>   
>   	spinlock_t signalers_lock; /* protects the list of signalers */
> @@ -42,7 +46,10 @@ struct intel_breadcrumbs {
>   	bool irq_armed;
>   
>   	/* Not all breadcrumbs are attached to physical HW */
> +	intel_engine_mask_t	engine_mask;
>   	struct intel_engine_cs *irq_engine;
> +	bool	(*irq_enable)(struct intel_breadcrumbs *b);
> +	void	(*irq_disable)(struct intel_breadcrumbs *b);
>   };
>   
>   #endif /* __INTEL_BREADCRUMBS_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 9fec0aca5f4b..edbde6171bca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
>   
>   void intel_engine_init_execlists(struct intel_engine_cs *engine);
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine);
> +void intel_engine_irq_disable(struct intel_engine_cs *engine);
> +
>   static inline void __intel_engine_reset(struct intel_engine_cs *engine,
>   					bool stalled)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index b7292d5cb7da..d95d666407f5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>   err_cmd_parser:
>   	i915_sched_engine_put(engine->sched_engine);
>   err_sched_engine:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_status:
>   	cleanup_status_page(engine);
>   	return err;
> @@ -948,7 +948,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
>   
>   	i915_sched_engine_put(engine->sched_engine);
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   
>   	intel_engine_fini_retire(engine);
>   	intel_engine_cleanup_cmd_parser(engine);
> @@ -1265,6 +1265,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
>   	return true;
>   }
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_enable)
> +		return false;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +
> +	return true;
> +}
> +
> +void intel_engine_irq_disable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_disable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +}
> +
>   void intel_engines_reset_default_submission(struct intel_gt *gt)
>   {
>   	struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 8ad304b2f2e4..03a81e8d87f4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -21,7 +21,6 @@
>   #include "i915_pmu.h"
>   #include "i915_priolist_types.h"
>   #include "i915_selftest.h"
> -#include "intel_breadcrumbs_types.h"
>   #include "intel_sseu.h"
>   #include "intel_timeline_types.h"
>   #include "intel_uncore.h"
> @@ -63,6 +62,7 @@ struct i915_sched_engine;
>   struct intel_gt;
>   struct intel_ring;
>   struct intel_uncore;
> +struct intel_breadcrumbs;
>   
>   typedef u8 intel_engine_mask_t;
>   #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 920707e22eb0..abe48421fd7a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3407,7 +3407,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>   	intel_context_fini(&ve->context);
>   
>   	if (ve->base.breadcrumbs)
> -		intel_breadcrumbs_free(ve->base.breadcrumbs);
> +		intel_breadcrumbs_put(ve->base.breadcrumbs);
>   	if (ve->base.sched_engine)
>   		i915_sched_engine_put(ve->base.sched_engine);
>   	intel_engine_free_request_pool(&ve->base);
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 9203c766db80..fc5a65ab1937 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>   
>   	i915_sched_engine_put(engine->sched_engine);
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   
>   	intel_context_unpin(engine->kernel_context);
>   	intel_context_put(engine->kernel_context);
> @@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
>   	return 0;
>   
>   err_breadcrumbs:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_schedule:
>   	i915_sched_engine_put(engine->sched_engine);
>   	return -ENOMEM;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 372e0dc7617a..9f28899ff17f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1076,6 +1076,9 @@ static void __guc_context_destroy(struct intel_context *ce)
>   		struct guc_virtual_engine *ve =
>   			container_of(ce, typeof(*ve), context);
>   
> +		if (ve->base.breadcrumbs)
> +			intel_breadcrumbs_put(ve->base.breadcrumbs);
> +
>   		kfree(ve);
>   	} else {
>   		intel_context_free(ce);
> @@ -1366,6 +1369,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   	.get_sibling = guc_virtual_get_sibling,
>   };
>   
> +static bool
> +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +	bool result = false;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		result |= intel_engine_irq_enable(sibling);
> +
> +	return result;
> +}
> +
> +static void
> +guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		intel_engine_irq_disable(sibling);
> +}
> +
> +static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	int i;
> +
> +	/*
> +	 * In GuC submission mode we do not know which physical engine a request
> +	 * will be scheduled on, which creates a problem because the breadcrumb
> +	 * interrupt is per physical engine. To work around this we attach
> +	 * requests and direct all breadcrumb interrupts to the first instance
> +	 * of an engine per class. In addition, all breadcrumb interrupts are
> +	 * enabled / disabled across an engine class in unison.
> +	 */
> +	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
> +		struct intel_engine_cs *sibling =
> +			engine->gt->engine_class[engine->class][i];
> +
> +		if (sibling) {
> +			if (engine->breadcrumbs != sibling->breadcrumbs) {
> +				intel_breadcrumbs_put(engine->breadcrumbs);
> +				engine->breadcrumbs =
> +					intel_breadcrumbs_get(sibling->breadcrumbs);
> +			}
> +			break;
> +		}
> +	}
> +
> +	if (engine->breadcrumbs) {
> +		engine->breadcrumbs->engine_mask |= engine->mask;
> +		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
> +		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
> +	}
> +}
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -1589,6 +1648,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
> +	guc_init_breadcrumbs(engine);
>   
>   	if (engine->class == RENDER_CLASS)
>   		rcs_submission_override(engine);
> @@ -1831,11 +1891,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.saturated = ALL_ENGINES;
> -	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> -	if (!ve->base.breadcrumbs) {
> -		kfree(ve);
> -		return ERR_PTR(-ENOMEM);
> -	}
>   
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
> @@ -1884,6 +1939,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				sibling->emit_fini_breadcrumb;
>   			ve->base.emit_fini_breadcrumb_dw =
>   				sibling->emit_fini_breadcrumb_dw;
> +			ve->base.breadcrumbs =
> +				intel_breadcrumbs_get(sibling->breadcrumbs);
>   
>   			ve->base.flags |= sibling->flags;
>   


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
  2021-07-20  1:53       ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 19:49         ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 19:49 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On 7/19/2021 18:53, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:
>> On 7/16/2021 13:16, Matthew Brost wrote:
>>> When running the GuC the GPU can't be considered idle if the GuC still
>>> has contexts pinned. As such, a call has been added in
>>> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
>>> the number of unpinned contexts to go to zero.
>>>
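So the wait becomes two-phase: retire host-side requests first, then
spend whatever budget remains waiting on the GuC. Condensed straight
from the intel_gt_wait_for_idle() hunk below:

	long remaining_timeout;

	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
						&remaining_timeout)) > 0) {
		cond_resched();
		if (signal_pending(current))
			return -EINTR;
	}

	/* request path is idle; drain the GuC with the leftover budget */
	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
							  remaining_timeout);
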
>>> v2: rtimeout -> remaining_timeout
>>> v3: Drop unnecessary includes, guc_submission_busy_loop ->
>>> guc_submission_send_busy_loop, drop negative timeout trick, move a
>>> refactor of guc_context_unpin to earlier path (John H)
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>    drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
>>>    drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
>>>    drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
>>>    drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>>>    .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>>>    .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>>>    13 files changed, 129 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index a90f796e85c0..6fffd4d377c2 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>    		goto insert;
>>>    	/* Attempt to reap some mmap space from dead objects */
>>> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
>>> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
>>> +					       NULL);
>>>    	if (err)
>>>    		goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index e714e21c0a4d..acfdd53b2678 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
>>>    	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>>>    }
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> +{
>>> +	long remaining_timeout;
>>> +
>>> +	/* If the device is asleep, we have no requests outstanding */
>>> +	if (!intel_gt_pm_is_awake(gt))
>>> +		return 0;
>>> +
>>> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
>>> +							   &remaining_timeout)) > 0) {
>>> +		cond_resched();
>>> +		if (signal_pending(current))
>>> +			return -EINTR;
>>> +	}
>>> +
>>> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
>>> +							  remaining_timeout);
>>> +}
>>> +
>>>    int intel_gt_init(struct intel_gt *gt)
>>>    {
>>>    	int err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> index e7aabe0cc5bf..74e771871a9b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>>>    void intel_gt_driver_late_release(struct intel_gt *gt);
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> +
>>>    void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>>>    void intel_gt_clear_error_registers(struct intel_gt *gt,
>>>    				    intel_engine_mask_t engine_mask);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index 647eca9d867a..edb881d75630 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>>>    	GEM_BUG_ON(engine->retire);
>>>    }
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *remaining_timeout)
>>>    {
>>>    	struct intel_gt_timelines *timelines = &gt->timelines;
>>>    	struct intel_timeline *tl, *tn;
>>> @@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
>>>    	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>>>    		active_count++;
>>> -	return active_count ? timeout : 0;
>>> -}
>>> -
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> -{
>>> -	/* If the device is asleep, we have no requests outstanding */
>>> -	if (!intel_gt_pm_is_awake(gt))
>>> -		return 0;
>>> -
>>> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
>>> -		cond_resched();
>>> -		if (signal_pending(current))
>>> -			return -EINTR;
>>> -	}
>>> +	if (remaining_timeout)
>>> +		*remaining_timeout = timeout;
>>> -	return timeout;
>>> +	return active_count ? timeout : 0;
>>>    }
>>>    static void retire_work_handler(struct work_struct *work)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> index fcc30a6e4fe9..83ff5280c06e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>> You were saying that the include of stddef is needed here?
>>
> Yes, HDRTEST [1] complains otherwise.
>
> [1] https://patchwork.freedesktop.org/series/91840/#rev3
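>
> For context: HDRTEST effectively compiles each header as a standalone
> translation unit, roughly the sketch below, so the NULL used by the new
> inline helper has to come from an explicit <linux/stddef.h> include:
>
> 	/* hdrtest-style check, illustrative only */
> 	#include "gt/intel_gt_requests.h"
> 	/* must build with nothing included before it */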
>
>>> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>>>    struct intel_gt;
>>>    struct intel_timeline;
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *remaining_timeout);
>>>    static inline void intel_gt_retire_requests(struct intel_gt *gt)
>>>    {
>>> -	intel_gt_retire_requests_timeout(gt, 0);
>>> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>>>    }
>>>    void intel_engine_init_retire(struct intel_engine_cs *engine);
>>> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>>>    			     struct intel_timeline *tl);
>>>    void intel_engine_fini_retire(struct intel_engine_cs *engine);
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> -
>>>    void intel_gt_init_requests(struct intel_gt *gt);
>>>    void intel_gt_park_requests(struct intel_gt *gt);
>>>    void intel_gt_unpark_requests(struct intel_gt *gt);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 80b88bae5f24..3cc566565224 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -39,6 +39,8 @@ struct intel_guc {
>>>    	spinlock_t irq_lock;
>>>    	unsigned int msg_enabled_mask;
>>> +	atomic_t outstanding_submission_g2h;
>>> +
>>>    	struct {
>>>    		void (*reset)(struct intel_guc *guc);
>>>    		void (*enable)(struct intel_guc *guc);
>>> @@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>    	spin_unlock_irq(&guc->irq_lock);
>>>    }
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>>> +
>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>    			   struct intel_engine_cs *engine);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index c33906ec478d..f1cbed6b9f0a 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>    	INIT_LIST_HEAD(&ct->requests.incoming);
>>>    	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>>>    	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
>>> +	init_waitqueue_head(&ct->wq);
>>>    }
>>>    static inline const char *guc_ct_buffer_type_to_str(u32 type)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 785dfc5c6efb..4b30a562ae63 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -10,6 +10,7 @@
>>>    #include <linux/spinlock.h>
>>>    #include <linux/workqueue.h>
>>>    #include <linux/ktime.h>
>>> +#include <linux/wait.h>
>>>    #include "intel_guc_fwif.h"
>>> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>>>    	struct tasklet_struct receive_tasklet;
>>> +	/** @wq: wait queue for g2h channel */
>>> +	wait_queue_head_t wq;
>>> +
>>>    	struct {
>>>    		u16 last_fence; /* last fence used to send request */
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index f7e34baa9506..088d11e2e497 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>>    	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>>    }
>>> +static int guc_submission_send_busy_loop(struct intel_guc *guc,
>>> +					 const u32 *action,
>>> +					 u32 len,
>>> +					 u32 g2h_len_dw,
>>> +					 bool loop)
>>> +{
>>> +	int err;
>>> +
>>> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
>>> +
>>> +	if (!err && g2h_len_dw)
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>> +
>>> +	return err;
>>> +}
>>> +
>>> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
>>> +				    atomic_t *wait_var,
>>> +				    bool interruptible,
>>> +				    long timeout)
>>> +{
>>> +	const int state = interruptible ?
>>> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	might_sleep();
>>> +	GEM_BUG_ON(timeout < 0);
>>> +
>>> +	if (!atomic_read(wait_var))
>>> +		return 0;
>>> +
>>> +	if (!timeout)
>>> +		return -ETIME;
>>> +
>>> +	for (;;) {
>>> +		prepare_to_wait(&guc->ct.wq, &wait, state);
>>> +
>>> +		if (!atomic_read(wait_var))
>>> +			break;
>>> +
>>> +		if (signal_pending_state(state, current)) {
>>> +			timeout = -EINTR;
>>> +			break;
>>> +		}
>>> +
>>> +		if (!timeout) {
>>> +			timeout = -ETIME;
>>> +			break;
>>> +		}
>>> +
>>> +		timeout = io_schedule_timeout(timeout);
>>> +	}
>>> +	finish_wait(&guc->ct.wq, &wait);
>>> +
>>> +	return (timeout < 0) ? timeout : 0;
>>> +}
>>> +
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>>> +{
>>> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
>>> +					true, timeout);
>>> +}
>>> +
>>>    static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    {
>>>    	int err;
>>> @@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>>>    	if (!enabled && !err) {
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>>    		set_context_enabled(ce);
>>>    	} else if (!enabled) {
>>>    		clr_context_pending_enable(ce);
>>> @@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>>    		offset,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +					     0, true);
>>>    }
>>>    static int register_context(struct intel_context *ce)
>>> @@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>>>    		guc_id,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
>>> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +					     G2H_LEN_DW_DEREGISTER_CONTEXT,
>>> +					     true);
>>>    }
>>>    static int deregister_context(struct intel_context *ce, u32 guc_id)
>>> @@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>>>    	intel_context_get(ce);
>>> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> -				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>>    }
>>>    static u16 prep_context_pending_disable(struct intel_context *ce)
>>> @@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>    	return ce;
>>>    }
>>> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
>>> +{
>>> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
>>> +		wake_up_all(&guc->ct.wq);
>>> +}
>>> +
>>>    int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    					  const u32 *msg,
>>>    					  u32 len)
>>> @@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    		lrc_destroy(&ce->ref);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>> +
>>>    	return 0;
>>>    }
>>> @@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>>>    		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>>    	intel_context_put(ce);
>>>    	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> index 9c954c589edf..c4cef885e984 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
>>>    #undef uc_state_checkers
>>>    #undef __uc_state_checker
>>> +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
>>> +{
>>> +	return intel_guc_wait_for_idle(&uc->guc, timeout);
>>> +}
>>> +
>>>    #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
>>>    static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
>>>    { \
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> index 4d2d59a9942b..2b73ddb11c66 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> @@ -27,6 +27,7 @@
>>>     */
>>>    #include "gem/i915_gem_context.h"
>>> +#include "gt/intel_gt.h"
>> Still not seeing a need for this.
>>
>>>    #include "gt/intel_gt_requests.h"
>>>    #include "i915_drv.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> index c130010a7033..1c721542e277 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> @@ -5,7 +5,7 @@
>>>     */
>>>    #include "i915_drv.h"
>>> -#include "gt/intel_gt_requests.h"
>>> +#include "gt/intel_gt.h"
>> Nor this.
>>
> We need these because intel_gt_wait_for_idle moved from
> "gt/intel_gt_requests.h" to "gt/intel_gt.h".
>
> Matt
Ah, okay. That makes sense.

With the return of stddef.h above...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
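
For anyone skimming the thread later, the resulting wait chain (as I
read the final diff; sketch only, not verbatim code) is:

  intel_gt_wait_for_idle(gt, timeout)
    -> intel_gt_retire_requests_timeout(gt, timeout, &remaining_timeout)
    -> intel_uc_wait_for_idle(&gt->uc, remaining_timeout)
       -> intel_guc_wait_for_idle(&uc->guc, timeout)
          -> guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
                                      true, timeout)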

>
>> John.
>>
>>>    #include "../i915_selftest.h"
>>>    #include "igt_flush_test.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index d189c4bd4bef..4f8180146888 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
>>>    	do {
>>>    		for_each_engine(engine, gt, id)
>>>    			mock_engine_flush(engine);
>>> -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
>>> +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
>>> +						  NULL));
>>>    }
>>>    static void mock_device_release(struct drm_device *dev)



* Re: [Intel-gfx] [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC
@ 2021-07-20 19:49         ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 19:49 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel

On 7/19/2021 18:53, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 06:03:05PM -0700, John Harrison wrote:
>> On 7/16/2021 13:16, Matthew Brost wrote:
>>> When running the GuC the GPU can't be considered idle if the GuC still
>>> has contexts pinned. As such, a call has been added in
>>> intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
>>> the number of unpinned contexts to go to zero.
>>>
>>> v2: rtimeout -> remaining_timeout
>>> v3: Drop unnecessary includes, guc_submission_busy_loop ->
> guc_submission_send_busy_loop, drop negative timeout trick, move a
>>> refactor of guc_context_unpin to earlier path (John H)
>>>
>>> Cc: John Harrison <john.c.harrison@intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_mman.c      |  3 +-
>>>    drivers/gpu/drm/i915/gt/intel_gt.c            | 19 +++++
>>>    drivers/gpu/drm/i915/gt/intel_gt.h            |  2 +
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.c   | 21 ++---
>>>    drivers/gpu/drm/i915/gt/intel_gt_requests.h   |  7 +-
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  4 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  1 +
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h     |  4 +
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 85 +++++++++++++++++--
>>>    drivers/gpu/drm/i915/gt/uc/intel_uc.h         |  5 ++
>>>    drivers/gpu/drm/i915/i915_gem_evict.c         |  1 +
>>>    .../gpu/drm/i915/selftests/igt_live_test.c    |  2 +-
>>>    .../gpu/drm/i915/selftests/mock_gem_device.c  |  3 +-
>>>    13 files changed, 129 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> index a90f796e85c0..6fffd4d377c2 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>> @@ -645,7 +645,8 @@ mmap_offset_attach(struct drm_i915_gem_object *obj,
>>>    		goto insert;
>>>    	/* Attempt to reap some mmap space from dead objects */
>>> -	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT);
>>> +	err = intel_gt_retire_requests_timeout(&i915->gt, MAX_SCHEDULE_TIMEOUT,
>>> +					       NULL);
>>>    	if (err)
>>>    		goto err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> index e714e21c0a4d..acfdd53b2678 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
>>> @@ -585,6 +585,25 @@ static void __intel_gt_disable(struct intel_gt *gt)
>>>    	GEM_BUG_ON(intel_gt_pm_is_awake(gt));
>>>    }
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> +{
>>> +	long remaining_timeout;
>>> +
>>> +	/* If the device is asleep, we have no requests outstanding */
>>> +	if (!intel_gt_pm_is_awake(gt))
>>> +		return 0;
>>> +
>>> +	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout,
>>> +							   &remaining_timeout)) > 0) {
>>> +		cond_resched();
>>> +		if (signal_pending(current))
>>> +			return -EINTR;
>>> +	}
>>> +
>>> +	return timeout ? timeout : intel_uc_wait_for_idle(&gt->uc,
>>> +							  remaining_timeout);
>>> +}
>>> +
>>>    int intel_gt_init(struct intel_gt *gt)
>>>    {
>>>    	int err;
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> index e7aabe0cc5bf..74e771871a9b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt.h
>>> @@ -48,6 +48,8 @@ void intel_gt_driver_release(struct intel_gt *gt);
>>>    void intel_gt_driver_late_release(struct intel_gt *gt);
>>> +int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> +
>>>    void intel_gt_check_and_clear_faults(struct intel_gt *gt);
>>>    void intel_gt_clear_error_registers(struct intel_gt *gt,
>>>    				    intel_engine_mask_t engine_mask);
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> index 647eca9d867a..edb881d75630 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>>> @@ -130,7 +130,8 @@ void intel_engine_fini_retire(struct intel_engine_cs *engine)
>>>    	GEM_BUG_ON(engine->retire);
>>>    }
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *remaining_timeout)
>>>    {
>>>    	struct intel_gt_timelines *timelines = &gt->timelines;
>>>    	struct intel_timeline *tl, *tn;
>>> @@ -195,22 +196,10 @@ out_active:	spin_lock(&timelines->lock);
>>>    	if (flush_submission(gt, timeout)) /* Wait, there's more! */
>>>    		active_count++;
>>> -	return active_count ? timeout : 0;
>>> -}
>>> -
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout)
>>> -{
>>> -	/* If the device is asleep, we have no requests outstanding */
>>> -	if (!intel_gt_pm_is_awake(gt))
>>> -		return 0;
>>> -
>>> -	while ((timeout = intel_gt_retire_requests_timeout(gt, timeout)) > 0) {
>>> -		cond_resched();
>>> -		if (signal_pending(current))
>>> -			return -EINTR;
>>> -	}
>>> +	if (remaining_timeout)
>>> +		*remaining_timeout = timeout;
>>> -	return timeout;
>>> +	return active_count ? timeout : 0;
>>>    }
>>>    static void retire_work_handler(struct work_struct *work)
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> index fcc30a6e4fe9..83ff5280c06e 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
>> You were saying that the include of stddef is needed here?
>>
> Yes, HDRTEST [1] complains otherwise.
>
> [1] https://patchwork.freedesktop.org/series/91840/#rev3
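>
> For context: HDRTEST effectively compiles each header as a standalone
> translation unit, roughly the sketch below, so the NULL used by the new
> inline helper has to come from an explicit <linux/stddef.h> include:
>
> 	/* hdrtest-style check, illustrative only */
> 	#include "gt/intel_gt_requests.h"
> 	/* must build with nothing included before it */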
>
>>> @@ -10,10 +10,11 @@ struct intel_engine_cs;
>>>    struct intel_gt;
>>>    struct intel_timeline;
>>> -long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout);
>>> +long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout,
>>> +				      long *remaining_timeout);
>>>    static inline void intel_gt_retire_requests(struct intel_gt *gt)
>>>    {
>>> -	intel_gt_retire_requests_timeout(gt, 0);
>>> +	intel_gt_retire_requests_timeout(gt, 0, NULL);
>>>    }
>>>    void intel_engine_init_retire(struct intel_engine_cs *engine);
>>> @@ -21,8 +22,6 @@ void intel_engine_add_retire(struct intel_engine_cs *engine,
>>>    			     struct intel_timeline *tl);
>>>    void intel_engine_fini_retire(struct intel_engine_cs *engine);
>>> -int intel_gt_wait_for_idle(struct intel_gt *gt, long timeout);
>>> -
>>>    void intel_gt_init_requests(struct intel_gt *gt);
>>>    void intel_gt_park_requests(struct intel_gt *gt);
>>>    void intel_gt_unpark_requests(struct intel_gt *gt);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 80b88bae5f24..3cc566565224 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -39,6 +39,8 @@ struct intel_guc {
>>>    	spinlock_t irq_lock;
>>>    	unsigned int msg_enabled_mask;
>>> +	atomic_t outstanding_submission_g2h;
>>> +
>>>    	struct {
>>>    		void (*reset)(struct intel_guc *guc);
>>>    		void (*enable)(struct intel_guc *guc);
>>> @@ -238,6 +240,8 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>    	spin_unlock_irq(&guc->irq_lock);
>>>    }
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>>> +
>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>    			   struct intel_engine_cs *engine);
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index c33906ec478d..f1cbed6b9f0a 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -109,6 +109,7 @@ void intel_guc_ct_init_early(struct intel_guc_ct *ct)
>>>    	INIT_LIST_HEAD(&ct->requests.incoming);
>>>    	INIT_WORK(&ct->requests.worker, ct_incoming_request_worker_func);
>>>    	tasklet_setup(&ct->receive_tasklet, ct_receive_tasklet_func);
>>> +	init_waitqueue_head(&ct->wq);
>>>    }
>>>    static inline const char *guc_ct_buffer_type_to_str(u32 type)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> index 785dfc5c6efb..4b30a562ae63 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.h
>>> @@ -10,6 +10,7 @@
>>>    #include <linux/spinlock.h>
>>>    #include <linux/workqueue.h>
>>>    #include <linux/ktime.h>
>>> +#include <linux/wait.h>
>>>    #include "intel_guc_fwif.h"
>>> @@ -68,6 +69,9 @@ struct intel_guc_ct {
>>>    	struct tasklet_struct receive_tasklet;
>>> +	/** @wq: wait queue for g2h channel */
>>> +	wait_queue_head_t wq;
>>> +
>>>    	struct {
>>>    		u16 last_fence; /* last fence used to send request */
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index f7e34baa9506..088d11e2e497 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -254,6 +254,69 @@ static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>>>    	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
>>>    }
>>> +static int guc_submission_send_busy_loop(struct intel_guc *guc,
>>> +					 const u32 *action,
>>> +					 u32 len,
>>> +					 u32 g2h_len_dw,
>>> +					 bool loop)
>>> +{
>>> +	int err;
>>> +
>>> +	err = intel_guc_send_busy_loop(guc, action, len, g2h_len_dw, loop);
>>> +
>>> +	if (!err && g2h_len_dw)
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>> +
>>> +	return err;
>>> +}
>>> +
>>> +static int guc_wait_for_pending_msg(struct intel_guc *guc,
>>> +				    atomic_t *wait_var,
>>> +				    bool interruptible,
>>> +				    long timeout)
>>> +{
>>> +	const int state = interruptible ?
>>> +		TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>> +	DEFINE_WAIT(wait);
>>> +
>>> +	might_sleep();
>>> +	GEM_BUG_ON(timeout < 0);
>>> +
>>> +	if (!atomic_read(wait_var))
>>> +		return 0;
>>> +
>>> +	if (!timeout)
>>> +		return -ETIME;
>>> +
>>> +	for (;;) {
>>> +		prepare_to_wait(&guc->ct.wq, &wait, state);
>>> +
>>> +		if (!atomic_read(wait_var))
>>> +			break;
>>> +
>>> +		if (signal_pending_state(state, current)) {
>>> +			timeout = -EINTR;
>>> +			break;
>>> +		}
>>> +
>>> +		if (!timeout) {
>>> +			timeout = -ETIME;
>>> +			break;
>>> +		}
>>> +
>>> +		timeout = io_schedule_timeout(timeout);
>>> +	}
>>> +	finish_wait(&guc->ct.wq, &wait);
>>> +
>>> +	return (timeout < 0) ? timeout : 0;
>>> +}
>>> +
>>> +int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>>> +{
>>> +	return guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
>>> +					true, timeout);
>>> +}
>>> +
>>>    static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    {
>>>    	int err;
>>> @@ -280,6 +343,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    	err = intel_guc_send_nb(guc, action, len, g2h_len_dw);
>>>    	if (!enabled && !err) {
>>> +		atomic_inc(&guc->outstanding_submission_g2h);
>>>    		set_context_enabled(ce);
>>>    	} else if (!enabled) {
>>>    		clr_context_pending_enable(ce);
>>> @@ -731,7 +795,8 @@ static int __guc_action_register_context(struct intel_guc *guc,
>>>    		offset,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +					     0, true);
>>>    }
>>>    static int register_context(struct intel_context *ce)
>>> @@ -751,8 +816,9 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>>>    		guc_id,
>>>    	};
>>> -	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> -					G2H_LEN_DW_DEREGISTER_CONTEXT, true);
>>> +	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +					     G2H_LEN_DW_DEREGISTER_CONTEXT,
>>> +					     true);
>>>    }
>>>    static int deregister_context(struct intel_context *ce, u32 guc_id)
>>> @@ -893,8 +959,8 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>>>    	intel_context_get(ce);
>>> -	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> -				 G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>>> +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>>>    }
>>>    static u16 prep_context_pending_disable(struct intel_context *ce)
>>> @@ -1440,6 +1506,12 @@ g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>    	return ce;
>>>    }
>>> +static void decr_outstanding_submission_g2h(struct intel_guc *guc)
>>> +{
>>> +	if (atomic_dec_and_test(&guc->outstanding_submission_g2h))
>>> +		wake_up_all(&guc->ct.wq);
>>> +}
>>> +
>>>    int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    					  const u32 *msg,
>>>    					  u32 len)
>>> @@ -1475,6 +1547,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>    		lrc_destroy(&ce->ref);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>> +
>>>    	return 0;
>>>    }
>>> @@ -1523,6 +1597,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>>>    		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>    	}
>>> +	decr_outstanding_submission_g2h(guc);
>>>    	intel_context_put(ce);
>>>    	return 0;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> index 9c954c589edf..c4cef885e984 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
>>> @@ -81,6 +81,11 @@ uc_state_checkers(guc, guc_submission);
>>>    #undef uc_state_checkers
>>>    #undef __uc_state_checker
>>> +static inline int intel_uc_wait_for_idle(struct intel_uc *uc, long timeout)
>>> +{
>>> +	return intel_guc_wait_for_idle(&uc->guc, timeout);
>>> +}
>>> +
>>>    #define intel_uc_ops_function(_NAME, _OPS, _TYPE, _RET) \
>>>    static inline _TYPE intel_uc_##_NAME(struct intel_uc *uc) \
>>>    { \
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> index 4d2d59a9942b..2b73ddb11c66 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
>>> @@ -27,6 +27,7 @@
>>>     */
>>>    #include "gem/i915_gem_context.h"
>>> +#include "gt/intel_gt.h"
>> Still not seeing a need for this.
>>
>>>    #include "gt/intel_gt_requests.h"
>>>    #include "i915_drv.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/igt_live_test.c b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> index c130010a7033..1c721542e277 100644
>>> --- a/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> +++ b/drivers/gpu/drm/i915/selftests/igt_live_test.c
>>> @@ -5,7 +5,7 @@
>>>     */
>>>    #include "i915_drv.h"
>>> -#include "gt/intel_gt_requests.h"
>>> +#include "gt/intel_gt.h"
>> Nor this.
>>
> We need these because intel_gt_wait_for_idle moved from
> "gt/intel_gt_requests.h" to "gt/intel_gt.h".
>
> Matt
Ah, okay. That makes sense.

With the return of stddef.h above...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
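
For anyone skimming the thread later, the resulting wait chain (as I
read the final diff; sketch only, not verbatim code) is:

  intel_gt_wait_for_idle(gt, timeout)
    -> intel_gt_retire_requests_timeout(gt, timeout, &remaining_timeout)
    -> intel_uc_wait_for_idle(&gt->uc, remaining_timeout)
       -> intel_guc_wait_for_idle(&uc->guc, timeout)
          -> guc_wait_for_pending_msg(guc, &guc->outstanding_submission_g2h,
                                      true, timeout)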

>
>> John.
>>
>>>    #include "../i915_selftest.h"
>>>    #include "igt_flush_test.h"
>>> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> index d189c4bd4bef..4f8180146888 100644
>>> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
>>> @@ -52,7 +52,8 @@ void mock_device_flush(struct drm_i915_private *i915)
>>>    	do {
>>>    		for_each_engine(engine, gt, id)
>>>    			mock_engine_flush(engine);
>>> -	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT));
>>> +	} while (intel_gt_retire_requests_timeout(gt, MAX_SCHEDULE_TIMEOUT,
>>> +						  NULL));
>>>    }
>>>    static void mock_device_release(struct drm_device *dev)


* Re: [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
  2021-07-20 19:55     ` [Intel-gfx] " John Harrison
@ 2021-07-20 19:53       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 19:53 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Tue, Jul 20, 2021 at 12:55:08PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > This help the backends clean up when the schedule engine object gets
> help -> helps. Although, I would say it's more like 'this is required to
> allow backend specific cleanup'. It doesn't just make life a bit easier, it
> allows us to not leak stuff and/or dereference null pointers!
> 
> Either way...
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

I'll reword it.

Matt

> 
> > destroyed.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_scheduler.c       | 3 ++-
> >   drivers/gpu/drm/i915/i915_scheduler.h       | 4 +---
> >   drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +++++
> >   3 files changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 3a58a9130309..4fceda96deed 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
> >   	rcu_read_unlock();
> >   }
> > -void i915_sched_engine_free(struct kref *kref)
> > +static void default_destroy(struct kref *kref)
> >   {
> >   	struct i915_sched_engine *sched_engine =
> >   		container_of(kref, typeof(*sched_engine), ref);
> > @@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
> >   	sched_engine->queue = RB_ROOT_CACHED;
> >   	sched_engine->queue_priority_hint = INT_MIN;
> > +	sched_engine->destroy = default_destroy;
> >   	INIT_LIST_HEAD(&sched_engine->requests);
> >   	INIT_LIST_HEAD(&sched_engine->hold);
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> > index 650ab8e0db9f..3c9504e9f409 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> > @@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
> >   struct i915_sched_engine *
> >   i915_sched_engine_create(unsigned int subclass);
> > -void i915_sched_engine_free(struct kref *kref);
> > -
> >   static inline struct i915_sched_engine *
> >   i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> >   {
> > @@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> >   static inline void
> >   i915_sched_engine_put(struct i915_sched_engine *sched_engine)
> >   {
> > -	kref_put(&sched_engine->ref, i915_sched_engine_free);
> > +	kref_put(&sched_engine->ref, sched_engine->destroy);
> >   }
> >   static inline bool
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > index 5935c3152bdc..00384e2c5273 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > @@ -163,6 +163,11 @@ struct i915_sched_engine {
> >   	 */
> >   	void *private_data;
> > +	/**
> > +	 * @destroy: destroy schedule engine / cleanup in backend
> > +	 */
> > +	void	(*destroy)(struct kref *kref);
> > +
> >   	/**
> >   	 * @kick_backend: kick backend after a request's priority has changed
> >   	 */
> 


* Re: [Intel-gfx] [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
@ 2021-07-20 19:53       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 19:53 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Tue, Jul 20, 2021 at 12:55:08PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > This help the backends clean up when the schedule engine object gets
> help -> helps. Although, I would say it's more like 'this is required to
> allow backend specific cleanup'. It doesn't just make life a bit easier, it
> allows us to not leak stuff and/or dereference null pointers!
> 
> Either way...
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

I'll reword it.

Matt

> 
> > destroyed.
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_scheduler.c       | 3 ++-
> >   drivers/gpu/drm/i915/i915_scheduler.h       | 4 +---
> >   drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +++++
> >   3 files changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 3a58a9130309..4fceda96deed 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
> >   	rcu_read_unlock();
> >   }
> > -void i915_sched_engine_free(struct kref *kref)
> > +static void default_destroy(struct kref *kref)
> >   {
> >   	struct i915_sched_engine *sched_engine =
> >   		container_of(kref, typeof(*sched_engine), ref);
> > @@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
> >   	sched_engine->queue = RB_ROOT_CACHED;
> >   	sched_engine->queue_priority_hint = INT_MIN;
> > +	sched_engine->destroy = default_destroy;
> >   	INIT_LIST_HEAD(&sched_engine->requests);
> >   	INIT_LIST_HEAD(&sched_engine->hold);
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> > index 650ab8e0db9f..3c9504e9f409 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> > @@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
> >   struct i915_sched_engine *
> >   i915_sched_engine_create(unsigned int subclass);
> > -void i915_sched_engine_free(struct kref *kref);
> > -
> >   static inline struct i915_sched_engine *
> >   i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> >   {
> > @@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
> >   static inline void
> >   i915_sched_engine_put(struct i915_sched_engine *sched_engine)
> >   {
> > -	kref_put(&sched_engine->ref, i915_sched_engine_free);
> > +	kref_put(&sched_engine->ref, sched_engine->destroy);
> >   }
> >   static inline bool
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > index 5935c3152bdc..00384e2c5273 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > @@ -163,6 +163,11 @@ struct i915_sched_engine {
> >   	 */
> >   	void *private_data;
> > +	/**
> > +	 * @destroy: destroy schedule engine / cleanup in backend
> > +	 */
> > +	void	(*destroy)(struct kref *kref);
> > +
> >   	/**
> >   	 * @kick_backend: kick backend after a request's priority has changed
> >   	 */
> 

* Re: [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 19:55     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 19:55 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> This help the backends clean up when the schedule engine object gets
help -> helps. Although, I would say it's more like 'this is required to 
allow backend specific cleanup'. It doesn't just make life a bit easier, 
it allows us to not leak stuff and/or dereference null pointers!

Either way...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
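
As an aside, I'm assuming the GuC override that this enables will look
something like the below (sketch only; it assumes a guc->sched_engine
back pointer and that the tasklet has migrated into i915_sched_engine
by that point in the series):

	static void guc_sched_engine_destroy(struct kref *kref)
	{
		struct i915_sched_engine *sched_engine =
			container_of(kref, typeof(*sched_engine), ref);
		struct intel_guc *guc = sched_engine->private_data;

		guc->sched_engine = NULL;
		tasklet_kill(&sched_engine->tasklet); /* flush the callback */
		kfree(sched_engine);
	}

	/* in the backend setup path */
	sched_engine->destroy = guc_sched_engine_destroy;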

> destroyed.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_scheduler.c       | 3 ++-
>   drivers/gpu/drm/i915/i915_scheduler.h       | 4 +---
>   drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +++++
>   3 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 3a58a9130309..4fceda96deed 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
>   	rcu_read_unlock();
>   }
>   
> -void i915_sched_engine_free(struct kref *kref)
> +static void default_destroy(struct kref *kref)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		container_of(kref, typeof(*sched_engine), ref);
> @@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
>   
>   	sched_engine->queue = RB_ROOT_CACHED;
>   	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->destroy = default_destroy;
>   
>   	INIT_LIST_HEAD(&sched_engine->requests);
>   	INIT_LIST_HEAD(&sched_engine->hold);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 650ab8e0db9f..3c9504e9f409 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
>   struct i915_sched_engine *
>   i915_sched_engine_create(unsigned int subclass);
>   
> -void i915_sched_engine_free(struct kref *kref);
> -
>   static inline struct i915_sched_engine *
>   i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>   {
> @@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>   static inline void
>   i915_sched_engine_put(struct i915_sched_engine *sched_engine)
>   {
> -	kref_put(&sched_engine->ref, i915_sched_engine_free);
> +	kref_put(&sched_engine->ref, sched_engine->destroy);
>   }
>   
>   static inline bool
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 5935c3152bdc..00384e2c5273 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -163,6 +163,11 @@ struct i915_sched_engine {
>   	 */
>   	void *private_data;
>   
> +	/**
> +	 * @destroy: destroy schedule engine / cleanup in backend
> +	 */
> +	void	(*destroy)(struct kref *kref);
> +
>   	/**
>   	 * @kick_backend: kick backend after a request's priority has changed
>   	 */



* Re: [Intel-gfx] [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc
@ 2021-07-20 19:55     ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 19:55 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/16/2021 13:16, Matthew Brost wrote:
> This help the backends clean up when the schedule engine object gets
help -> helps. Although, I would say it's more like 'this is required to 
allow backend specific cleanup'. It doesn't just make life a bit easier, 
it allows us to not leak stuff and/or dereference null pointers!

Either way...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
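
As an aside, I'm assuming the GuC override that this enables will look
something like the below (sketch only; it assumes a guc->sched_engine
back pointer and that the tasklet has migrated into i915_sched_engine
by that point in the series):

	static void guc_sched_engine_destroy(struct kref *kref)
	{
		struct i915_sched_engine *sched_engine =
			container_of(kref, typeof(*sched_engine), ref);
		struct intel_guc *guc = sched_engine->private_data;

		guc->sched_engine = NULL;
		tasklet_kill(&sched_engine->tasklet); /* flush the callback */
		kfree(sched_engine);
	}

	/* in the backend setup path */
	sched_engine->destroy = guc_sched_engine_destroy;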

> destroyed.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_scheduler.c       | 3 ++-
>   drivers/gpu/drm/i915/i915_scheduler.h       | 4 +---
>   drivers/gpu/drm/i915/i915_scheduler_types.h | 5 +++++
>   3 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 3a58a9130309..4fceda96deed 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -431,7 +431,7 @@ void i915_request_show_with_schedule(struct drm_printer *m,
>   	rcu_read_unlock();
>   }
>   
> -void i915_sched_engine_free(struct kref *kref)
> +static void default_destroy(struct kref *kref)
>   {
>   	struct i915_sched_engine *sched_engine =
>   		container_of(kref, typeof(*sched_engine), ref);
> @@ -453,6 +453,7 @@ i915_sched_engine_create(unsigned int subclass)
>   
>   	sched_engine->queue = RB_ROOT_CACHED;
>   	sched_engine->queue_priority_hint = INT_MIN;
> +	sched_engine->destroy = default_destroy;
>   
>   	INIT_LIST_HEAD(&sched_engine->requests);
>   	INIT_LIST_HEAD(&sched_engine->hold);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 650ab8e0db9f..3c9504e9f409 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -51,8 +51,6 @@ static inline void i915_priolist_free(struct i915_priolist *p)
>   struct i915_sched_engine *
>   i915_sched_engine_create(unsigned int subclass);
>   
> -void i915_sched_engine_free(struct kref *kref);
> -
>   static inline struct i915_sched_engine *
>   i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>   {
> @@ -63,7 +61,7 @@ i915_sched_engine_get(struct i915_sched_engine *sched_engine)
>   static inline void
>   i915_sched_engine_put(struct i915_sched_engine *sched_engine)
>   {
> -	kref_put(&sched_engine->ref, i915_sched_engine_free);
> +	kref_put(&sched_engine->ref, sched_engine->destroy);
>   }
>   
>   static inline bool
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index 5935c3152bdc..00384e2c5273 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -163,6 +163,11 @@ struct i915_sched_engine {
>   	 */
>   	void *private_data;
>   
> +	/**
> +	 * @destroy: destroy schedule engine / cleanup in backend
> +	 */
> +	void	(*destroy)(struct kref *kref);
> +
>   	/**
>   	 * @kick_backend: kick backend after a request's priority has changed
>   	 */


* Re: [PATCH 25/51] drm/i915: Move active request tracking to a vfunc
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 20:05     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 20:05 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Move active request tracking to a backend vfunc rather than assuming all
> backends want to do this in the maner. In the case execlists /
maner -> manner.
In the case *of* execlists

With those fixed...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
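
(For the record, and going by the diff below rather than anything
definitive: the nice side effect of the split is that the GuC path no
longer needs the virtual-engine lock dance, because rq->context is
stable for the lifetime of the request:

	/* abridged from the patch */
	static void remove_from_context(struct i915_request *rq)
	{
		struct intel_context *ce = rq->context;

		spin_lock_irq(&ce->guc_active.lock);
		list_del_init(&rq->sched.link);
		spin_unlock_irq(&ce->guc_active.lock);
	}

while the execlists version still has to chase rq->engine under the
lock.)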


> ring submission the tracking is on the physical engine while with GuC
> submission it is on the context.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |  3 ++
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  7 ++++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
>   .../drm/i915/gt/intel_execlists_submission.c  | 40 ++++++++++++++++++
>   .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++++++++++
>   drivers/gpu/drm/i915/gt/mock_engine.c         | 30 ++++++++++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c           | 41 ++-----------------
>   drivers/gpu/drm/i915/i915_request.h           |  2 +
>   9 files changed, 147 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 251ff7eea22d..bfb05d8697d1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   	spin_lock_init(&ce->guc_state.lock);
>   	INIT_LIST_HEAD(&ce->guc_state.fences);
>   
> +	spin_lock_init(&ce->guc_active.lock);
> +	INIT_LIST_HEAD(&ce->guc_active.requests);
> +
>   	ce->guc_id = GUC_INVALID_LRC_ID;
>   	INIT_LIST_HEAD(&ce->guc_id_link);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 542c98418771..035108c10b2c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -162,6 +162,13 @@ struct intel_context {
>   		struct list_head fences;
>   	} guc_state;
>   
> +	struct {
> +		/** lock: protects everything in guc_active */
> +		spinlock_t lock;
> +		/** requests: active requests on this context */
> +		struct list_head requests;
> +	} guc_active;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 03a81e8d87f4..950fc73ed6af 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -420,6 +420,12 @@ struct intel_engine_cs {
>   
>   	void		(*release)(struct intel_engine_cs *engine);
>   
> +	/*
> +	 * Add / remove request from engine active tracking
> +	 */
> +	void		(*add_active_request)(struct i915_request *rq);
> +	void		(*remove_active_request)(struct i915_request *rq);
> +
>   	struct intel_engine_execlists execlists;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index abe48421fd7a..f9b5f54a5abe 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs *engine)
>   	cancel_timer(&engine->execlists.preempt);
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static bool can_preempt(struct intel_engine_cs *engine)
>   {
>   	if (GRAPHICS_VER(engine->i915) > 8)
> @@ -3206,6 +3242,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
>   	engine->bump_serial = execlist_bump_serial;
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> @@ -3797,6 +3835,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   			 "v%dx%d", ve->base.class, count);
>   		ve->base.context_size = sibling->context_size;
>   
> +		ve->base.add_active_request = sibling->add_active_request;
> +		ve->base.remove_active_request = sibling->remove_active_request;
>   		ve->base.emit_bb_start = sibling->emit_bb_start;
>   		ve->base.emit_flush = sibling->emit_flush;
>   		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 61469c631057..3b1471c50d40 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1052,6 +1052,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
>   	engine->serial++;
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	spin_lock_irq(&rq->engine->sched_engine->lock);
> +	list_del_init(&rq->sched.link);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&rq->engine->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1069,6 +1088,9 @@ static void setup_common(struct intel_engine_cs *engine)
>   	engine->reset.cancel = reset_cancel;
>   	engine->reset.finish = reset_finish;
>   
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
> +
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
>   	engine->bump_serial = ring_bump_serial;
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index fc5a65ab1937..5f86ff79c98c 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -235,6 +235,34 @@ static void mock_submit_request(struct i915_request *request)
>   	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
> +static void mock_add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void mock_remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +}
> +
>   static void mock_reset_prepare(struct intel_engine_cs *engine)
>   {
>   }
> @@ -327,6 +355,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> +	engine->base.add_active_request = mock_add_to_engine;
> +	engine->base.remove_active_request = mock_remove_from_engine;
>   
>   	engine->base.reset.prepare = mock_reset_prepare;
>   	engine->base.reset.rewind = mock_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9f28899ff17f..f8a6077fa3f9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1147,6 +1147,33 @@ static int guc_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void add_to_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
> +static void remove_from_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock_irq(&ce->guc_active.lock);
> +
> +	list_del_init(&rq->sched.link);
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&ce->guc_active.lock);
> +
> +	atomic_dec(&ce->guc_id_ref);
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
> @@ -1567,6 +1594,8 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
>   	engine->bump_serial = guc_bump_serial;
> +	engine->add_active_request = add_to_context;
> +	engine->remove_active_request = remove_from_context;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> @@ -1931,6 +1960,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				 "v%dx%d", ve->base.class, count);
>   			ve->base.context_size = sibling->context_size;
>   
> +			ve->base.add_active_request =
> +				sibling->add_active_request;
> +			ve->base.remove_active_request =
> +				sibling->remove_active_request;
>   			ve->base.emit_bb_start = sibling->emit_bb_start;
>   			ve->base.emit_flush = sibling->emit_flush;
>   			ve->base.emit_init_breadcrumb =
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index b3c792d55321..4eba848b20e3 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -182,7 +182,7 @@ static bool irq_work_imm(struct irq_work *wrk)
>   	return false;
>   }
>   
> -static void __notify_execute_cb_imm(struct i915_request *rq)
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
>   {
>   	__notify_execute_cb(rq, irq_work_imm);
>   }
> @@ -256,37 +256,6 @@ i915_request_active_engine(struct i915_request *rq,
>   	return ret;
>   }
>   
> -
> -static void remove_from_engine(struct i915_request *rq)
> -{
> -	struct intel_engine_cs *engine, *locked;
> -
> -	/*
> -	 * Virtual engines complicate acquiring the engine timeline lock,
> -	 * as their rq->engine pointer is not stable until under that
> -	 * engine lock. The simple ploy we use is to take the lock then
> -	 * check that the rq still belongs to the newly locked engine.
> -	 */
> -	locked = READ_ONCE(rq->engine);
> -	spin_lock_irq(&locked->sched_engine->lock);
> -	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> -		spin_unlock(&locked->sched_engine->lock);
> -		spin_lock(&engine->sched_engine->lock);
> -		locked = engine;
> -	}
> -	list_del_init(&rq->sched.link);
> -
> -	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> -
> -	/* Prevent further __await_execution() registering a cb, then flush */
> -	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> -
> -	spin_unlock_irq(&locked->sched_engine->lock);
> -
> -	__notify_execute_cb_imm(rq);
> -}
> -
>   static void __rq_init_watchdog(struct i915_request *rq)
>   {
>   	rq->watchdog.timer.function = NULL;
> @@ -383,9 +352,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 * after removing the breadcrumb and signaling it, so that we do not
>   	 * inadvertently attach the breadcrumb to a completed request.
>   	 */
> -	if (!list_empty(&rq->sched.link))
> -		remove_from_engine(rq);
> -	atomic_dec(&rq->context->guc_id_ref);
> +	rq->engine->remove_active_request(rq);
>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> @@ -516,7 +483,7 @@ __await_execution(struct i915_request *rq,
>   	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
>   		if (i915_request_is_active(signal) ||
>   		    __request_in_flight(signal))
> -			__notify_execute_cb_imm(signal);
> +			i915_request_notify_execute_cb_imm(signal);
>   	}
>   
>   	return 0;
> @@ -653,7 +620,7 @@ bool __i915_request_submit(struct i915_request *request)
>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> -	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> +	engine->add_active_request(request);
>   active:
>   	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 717e5b292046..128030f43bbf 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -647,4 +647,6 @@ bool
>   i915_request_active_engine(struct i915_request *rq,
>   			   struct intel_engine_cs **active);
>   
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> +
>   #endif /* I915_REQUEST_H */



* Re: [Intel-gfx] [PATCH 25/51] drm/i915: Move active request tracking to a vfunc
@ 2021-07-20 20:05     ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 20:05 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/16/2021 13:16, Matthew Brost wrote:
> Move active request tracking to a backend vfunc rather than assuming all
> backends want to do this in the maner. In the case execlists /
maner -> manner.
In the case *of* execlists

With those fixed...
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>


> ring submission the tracking is on the physical engine while with GuC
> submission it is on the context.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |  3 ++
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  7 ++++
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  6 +++
>   .../drm/i915/gt/intel_execlists_submission.c  | 40 ++++++++++++++++++
>   .../gpu/drm/i915/gt/intel_ring_submission.c   | 22 ++++++++++
>   drivers/gpu/drm/i915/gt/mock_engine.c         | 30 ++++++++++++++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 33 +++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c           | 41 ++-----------------
>   drivers/gpu/drm/i915/i915_request.h           |  2 +
>   9 files changed, 147 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 251ff7eea22d..bfb05d8697d1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -393,6 +393,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   	spin_lock_init(&ce->guc_state.lock);
>   	INIT_LIST_HEAD(&ce->guc_state.fences);
>   
> +	spin_lock_init(&ce->guc_active.lock);
> +	INIT_LIST_HEAD(&ce->guc_active.requests);
> +
>   	ce->guc_id = GUC_INVALID_LRC_ID;
>   	INIT_LIST_HEAD(&ce->guc_id_link);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 542c98418771..035108c10b2c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -162,6 +162,13 @@ struct intel_context {
>   		struct list_head fences;
>   	} guc_state;
>   
> +	struct {
> +		/** lock: protects everything in guc_active */
> +		spinlock_t lock;
> +		/** requests: active requests on this context */
> +		struct list_head requests;
> +	} guc_active;
> +
>   	/* GuC scheduling state flags that do not require a lock. */
>   	atomic_t guc_sched_state_no_lock;
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 03a81e8d87f4..950fc73ed6af 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -420,6 +420,12 @@ struct intel_engine_cs {
>   
>   	void		(*release)(struct intel_engine_cs *engine);
>   
> +	/*
> +	 * Add / remove request from engine active tracking
> +	 */
> +	void		(*add_active_request)(struct i915_request *rq);
> +	void		(*remove_active_request)(struct i915_request *rq);
> +
>   	struct intel_engine_execlists execlists;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index abe48421fd7a..f9b5f54a5abe 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3106,6 +3106,42 @@ static void execlists_park(struct intel_engine_cs *engine)
>   	cancel_timer(&engine->execlists.preempt);
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static bool can_preempt(struct intel_engine_cs *engine)
>   {
>   	if (GRAPHICS_VER(engine->i915) > 8)
> @@ -3206,6 +3242,8 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &execlists_context_ops;
>   	engine->request_alloc = execlists_request_alloc;
>   	engine->bump_serial = execlist_bump_serial;
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
>   
>   	engine->reset.prepare = execlists_reset_prepare;
>   	engine->reset.rewind = execlists_reset_rewind;
> @@ -3797,6 +3835,8 @@ execlists_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   			 "v%dx%d", ve->base.class, count);
>   		ve->base.context_size = sibling->context_size;
>   
> +		ve->base.add_active_request = sibling->add_active_request;
> +		ve->base.remove_active_request = sibling->remove_active_request;
>   		ve->base.emit_bb_start = sibling->emit_bb_start;
>   		ve->base.emit_flush = sibling->emit_flush;
>   		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 61469c631057..3b1471c50d40 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -1052,6 +1052,25 @@ static void ring_bump_serial(struct intel_engine_cs *engine)
>   	engine->serial++;
>   }
>   
> +static void add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void remove_from_engine(struct i915_request *rq)
> +{
> +	spin_lock_irq(&rq->engine->sched_engine->lock);
> +	list_del_init(&rq->sched.link);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&rq->engine->sched_engine->lock);
> +
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static void setup_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1069,6 +1088,9 @@ static void setup_common(struct intel_engine_cs *engine)
>   	engine->reset.cancel = reset_cancel;
>   	engine->reset.finish = reset_finish;
>   
> +	engine->add_active_request = add_to_engine;
> +	engine->remove_active_request = remove_from_engine;
> +
>   	engine->cops = &ring_context_ops;
>   	engine->request_alloc = ring_request_alloc;
>   	engine->bump_serial = ring_bump_serial;
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index fc5a65ab1937..5f86ff79c98c 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -235,6 +235,34 @@ static void mock_submit_request(struct i915_request *request)
>   	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
>   
> +static void mock_add_to_engine(struct i915_request *rq)
> +{
> +	lockdep_assert_held(&rq->engine->sched_engine->lock);
> +	list_move_tail(&rq->sched.link, &rq->engine->sched_engine->requests);
> +}
> +
> +static void mock_remove_from_engine(struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine, *locked;
> +
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +
> +	locked = READ_ONCE(rq->engine);
> +	spin_lock_irq(&locked->sched_engine->lock);
> +	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> +		spin_unlock(&locked->sched_engine->lock);
> +		spin_lock(&engine->sched_engine->lock);
> +		locked = engine;
> +	}
> +	list_del_init(&rq->sched.link);
> +	spin_unlock_irq(&locked->sched_engine->lock);
> +}
> +
>   static void mock_reset_prepare(struct intel_engine_cs *engine)
>   {
>   }
> @@ -327,6 +355,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
>   	engine->base.submit_request = mock_submit_request;
> +	engine->base.add_active_request = mock_add_to_engine;
> +	engine->base.remove_active_request = mock_remove_from_engine;
>   
>   	engine->base.reset.prepare = mock_reset_prepare;
>   	engine->base.reset.rewind = mock_reset_rewind;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 9f28899ff17f..f8a6077fa3f9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1147,6 +1147,33 @@ static int guc_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void add_to_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
> +static void remove_from_context(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock_irq(&ce->guc_active.lock);
> +
> +	list_del_init(&rq->sched.link);
> +	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +	/* Prevent further __await_execution() registering a cb, then flush */
> +	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> +
> +	spin_unlock_irq(&ce->guc_active.lock);
> +
> +	atomic_dec(&ce->guc_id_ref);
> +	i915_request_notify_execute_cb_imm(rq);
> +}
> +
>   static const struct intel_context_ops guc_context_ops = {
>   	.alloc = guc_context_alloc,
>   
> @@ -1567,6 +1594,8 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->cops = &guc_context_ops;
>   	engine->request_alloc = guc_request_alloc;
>   	engine->bump_serial = guc_bump_serial;
> +	engine->add_active_request = add_to_context;
> +	engine->remove_active_request = remove_from_context;
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> @@ -1931,6 +1960,10 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				 "v%dx%d", ve->base.class, count);
>   			ve->base.context_size = sibling->context_size;
>   
> +			ve->base.add_active_request =
> +				sibling->add_active_request;
> +			ve->base.remove_active_request =
> +				sibling->remove_active_request;
>   			ve->base.emit_bb_start = sibling->emit_bb_start;
>   			ve->base.emit_flush = sibling->emit_flush;
>   			ve->base.emit_init_breadcrumb =
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index b3c792d55321..4eba848b20e3 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -182,7 +182,7 @@ static bool irq_work_imm(struct irq_work *wrk)
>   	return false;
>   }
>   
> -static void __notify_execute_cb_imm(struct i915_request *rq)
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq)
>   {
>   	__notify_execute_cb(rq, irq_work_imm);
>   }
> @@ -256,37 +256,6 @@ i915_request_active_engine(struct i915_request *rq,
>   	return ret;
>   }
>   
> -
> -static void remove_from_engine(struct i915_request *rq)
> -{
> -	struct intel_engine_cs *engine, *locked;
> -
> -	/*
> -	 * Virtual engines complicate acquiring the engine timeline lock,
> -	 * as their rq->engine pointer is not stable until under that
> -	 * engine lock. The simple ploy we use is to take the lock then
> -	 * check that the rq still belongs to the newly locked engine.
> -	 */
> -	locked = READ_ONCE(rq->engine);
> -	spin_lock_irq(&locked->sched_engine->lock);
> -	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
> -		spin_unlock(&locked->sched_engine->lock);
> -		spin_lock(&engine->sched_engine->lock);
> -		locked = engine;
> -	}
> -	list_del_init(&rq->sched.link);
> -
> -	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> -	clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
> -
> -	/* Prevent further __await_execution() registering a cb, then flush */
> -	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> -
> -	spin_unlock_irq(&locked->sched_engine->lock);
> -
> -	__notify_execute_cb_imm(rq);
> -}
> -
>   static void __rq_init_watchdog(struct i915_request *rq)
>   {
>   	rq->watchdog.timer.function = NULL;
> @@ -383,9 +352,7 @@ bool i915_request_retire(struct i915_request *rq)
>   	 * after removing the breadcrumb and signaling it, so that we do not
>   	 * inadvertently attach the breadcrumb to a completed request.
>   	 */
> -	if (!list_empty(&rq->sched.link))
> -		remove_from_engine(rq);
> -	atomic_dec(&rq->context->guc_id_ref);
> +	rq->engine->remove_active_request(rq);
>   	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>   
>   	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> @@ -516,7 +483,7 @@ __await_execution(struct i915_request *rq,
>   	if (llist_add(&cb->work.node.llist, &signal->execute_cb)) {
>   		if (i915_request_is_active(signal) ||
>   		    __request_in_flight(signal))
> -			__notify_execute_cb_imm(signal);
> +			i915_request_notify_execute_cb_imm(signal);
>   	}
>   
>   	return 0;
> @@ -653,7 +620,7 @@ bool __i915_request_submit(struct i915_request *request)
>   	result = true;
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> -	list_move_tail(&request->sched.link, &engine->sched_engine->requests);
> +	engine->add_active_request(request);
>   active:
>   	clear_bit(I915_FENCE_FLAG_PQUEUE, &request->fence.flags);
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 717e5b292046..128030f43bbf 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -647,4 +647,6 @@ bool
>   i915_request_active_engine(struct i915_request *rq,
>   			   struct intel_engine_cs **active);
>   
> +void i915_request_notify_execute_cb_imm(struct i915_request *rq);
> +
>   #endif /* I915_REQUEST_H */



* Re: [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 20:19     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 20:19 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:16, Matthew Brost wrote:
> Reset implementation for new GuC interface. This is the legacy reset
> implementation which is called when the i915 owns the engine hang check.
> Future patches will offload the engine hang check to GuC but we will
> continue to maintain this legacy path as a fallback and this code path
> is also required if the GuC dies.
>
> With the new GuC interface it is not possible to reset individual
> engines - it is only possible to reset the GPU entirely. This patch
> forces an entire chip reset if any engine hangs.
>
> v2:
>   (Michal)
>    - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
> v3:
>   (John H)
>    - Split into a series of smaller patches
While the split happened, it doesn't look like any of the other comments 
were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of 
outstanding comments too.

>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
>   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
>   7 files changed, 515 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..463a6ae605a0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   	if (intel_gt_is_wedged(gt))
>   		intel_gt_unset_wedged(gt);
>   
> -	intel_uc_sanitize(&gt->uc);
> -
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.prepare)
>   			engine->reset.prepare(engine);
> @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   			__intel_engine_reset(engine, false);
>   	}
>   
> +	intel_uc_reset(&gt->uc, false);
> +
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.finish)
>   			engine->reset.finish(engine);
> @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
>   		goto err_wedged;
>   	}
>   
> +	intel_uc_reset_finish(&gt->uc);
> +
>   	intel_rps_enable(&gt->rps);
>   	intel_llc_enable(&gt->llc);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72251638d4ea..2987282dff6d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
>   		__intel_engine_reset(engine, stalled_mask & engine->mask);
>   	local_bh_enable();
>   
> +	intel_uc_reset(&gt->uc, true);
> +
>   	intel_ggtt_restore_fences(gt->ggtt);
>   
>   	return err;
> @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
>   		if (awake & engine->mask)
>   			intel_engine_pm_put(engine);
>   	}
> +
> +	intel_uc_reset_finish(&gt->uc);
>   }
>   
>   static void nop_submit_request(struct i915_request *request)
> @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.cancel)
>   			engine->reset.cancel(engine);
> +	intel_uc_cancel_requests(&gt->uc);
>   	local_bh_enable();
>   
>   	reset_finish(gt, awake);
> @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
>   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
>   
> +	if (intel_engine_uses_guc(engine))
> +		return -ENODEV;
> +
>   	if (!intel_engine_pm_get_if_awake(engine))
>   		return 0;
>   
> @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   			   "Resetting %s for %s\n", engine->name, msg);
>   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>   
> -	if (intel_engine_uses_guc(engine))
> -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> -	else
> -		ret = intel_gt_reset_engine(engine);
> +	ret = intel_gt_reset_engine(engine);
>   	if (ret) {
>   		/* If we fail here, we expect to fallback to a global reset */
> -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
>   		goto out;
>   	}
>   
> @@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
>   	 * Try engine reset when available. We fall back to full reset if
>   	 * single reset fails.
>   	 */
> -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
>   		local_bh_disable();
>   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
>   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 6661dcb02239..9b09395b998f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
>   	return 0;
>   }
>   
> -/**
> - * intel_guc_reset_engine() - ask GuC to reset an engine
> - * @guc:	intel_guc structure
> - * @engine:	engine to be reset
> - */
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine)
> -{
> -	/* XXX: to be implemented with submission interface rework */
> -
> -	return -ENODEV;
> -}
> -
>   /**
>    * intel_guc_resume() - notify GuC resuming from suspend state
>    * @guc:	the guc
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 3cc566565224..d75a76882a44 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   
>   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>   
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine);
> -
>   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg, u32 len);
>   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   				     const u32 *msg, u32 len);
>   
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f8a6077fa3f9..0f5823922369 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
>   static inline void
>   set_context_wait_for_deregister_to_register(struct intel_context *ce)
>   {
> -	/* Only should be called from guc_lrc_desc_pin() */
> +	/* Only should be called from guc_lrc_desc_pin() without lock */
>   	ce->guc_state.sched_state |=
>   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>   }
> @@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>   
>   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>   {
> +	guc->lrc_desc_pool_vaddr = NULL;
>   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>   }
>   
> +static inline bool guc_submission_initialized(struct intel_guc *guc)
> +{
> +	return guc->lrc_desc_pool_vaddr != NULL;
> +}
> +
>   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
>   {
> -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +	if (likely(guc_submission_initialized(guc))) {
> +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +		unsigned long flags;
>   
> -	memset(desc, 0, sizeof(*desc));
> -	xa_erase_irq(&guc->context_lookup, id);
> +		memset(desc, 0, sizeof(*desc));
> +
> +		/*
> +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> +		 * the lower level functions directly.
> +		 */
> +		xa_lock_irqsave(&guc->context_lookup, flags);
> +		__xa_erase(&guc->context_lookup, id);
> +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> +	}
>   }
>   
>   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> @@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
>   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   					   struct intel_context *ce)
>   {
> -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	unsigned long flags;
> +
> +	/*
> +	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> +	 * lower level functions directly.
> +	 */
> +	xa_lock_irqsave(&guc->context_lookup, flags);
> +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	xa_unlock_irqrestore(&guc->context_lookup, flags);
>   }
>   
>   static int guc_submission_send_busy_loop(struct intel_guc* guc,
> @@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>   					true, timeout);
>   }
>   
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> +
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	int err;
> @@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	u32 action[3];
>   	int len = 0;
>   	u32 g2h_len_dw = 0;
> -	bool enabled = context_enabled(ce);
> +	bool enabled;
>   
>   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>   	GEM_BUG_ON(context_guc_id_invalid(ce));
>   
> +	/*
> +	 * Corner case where the GuC firmware was blown away and reloaded while
> +	 * this context was pinned.
> +	 */
> +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> +		err = guc_lrc_desc_pin(ce, false);
> +		if (unlikely(err))
> +			goto out;
> +	}
> +	enabled = context_enabled(ce);
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		intel_context_put(ce);
>   	}
>   
> +out:
>   	return err;
>   }
>   
> @@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   	if (submit) {
>   		guc_set_lrc_tail(last);
>   resubmit:
> -		/*
> -		 * We only check for -EBUSY here even though it is possible for
> -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> -		 * died and a full GT reset needs to be done. The hangcheck will
> -		 * eventually detect that the GuC has died and trigger this
> -		 * reset so no need to handle -EDEADLK here.
> -		 */
>   		ret = guc_add_request(guc, last);
> -		if (ret == -EBUSY) {
> +		if (unlikely(ret == -EPIPE))
> +			goto deadlk;
> +		else if (ret == -EBUSY) {
>   			tasklet_schedule(&sched_engine->tasklet);
>   			guc->stalled_request = last;
>   			return false;
> @@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   
>   	guc->stalled_request = NULL;
>   	return submit;
> +
> +deadlk:
> +	sched_engine->tasklet.callback = NULL;
> +	tasklet_disable_nosync(&sched_engine->tasklet);
> +	return false;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
> @@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   		intel_engine_signal_breadcrumbs(engine);
>   }
>   
> -static void guc_reset_prepare(struct intel_engine_cs *engine)
> +static void __guc_context_destroy(struct intel_context *ce);
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> +static void guc_signal_context_fence(struct intel_context *ce);
> +
> +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	unsigned long index, flags;
> +	bool pending_disable, pending_enable, deregister, destroyed;
> +
> +	xa_for_each(&guc->context_lookup, index, ce) {
> +		/* Flush context */
> +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		/*
> +		 * Once we are at this point submission_disabled() is guaranteed
> +		 * to visible to all callers who set the below flags (see above
Still needs to be 'to be visible'.

> +		 * flush and flushes in reset_prepare). If submission_disabled()
> +		 * is set, the caller shouldn't set these flags.
> +		 */
> +
> +		destroyed = context_destroyed(ce);
> +		pending_enable = context_pending_enable(ce);
> +		pending_disable = context_pending_disable(ce);
> +		deregister = context_wait_for_deregister_to_register(ce);
> +		init_sched_state(ce);
> +
> +		if (pending_enable || destroyed || deregister) {
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			if (deregister)
> +				guc_signal_context_fence(ce);
> +			if (destroyed) {
> +				release_guc_id(guc, ce);
> +				__guc_context_destroy(ce);
> +			}
> +			if (pending_enable || deregister)
> +				intel_context_put(ce);
> +		}
> +
> +		/* Not mutually exclusive with above if statement. */
> +		if (pending_disable) {
> +			guc_signal_context_fence(ce);
> +			intel_context_sched_disable_unpin(ce);
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			intel_context_put(ce);
> +		}
> +	}
> +}
> +
> +static inline bool
> +submission_disabled(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	return unlikely(!sched_engine ||
> +			!__tasklet_is_enabled(&sched_engine->tasklet));
> +}
> +
> +static void disable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> +		sched_engine->tasklet.callback = NULL;
> +	}
> +}
> +
> +static void enable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> +	sched_engine->tasklet.callback = guc_submission_tasklet;
> +	wmb();
> +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> +	    __tasklet_enable(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +
> +		/* And kick in case we missed a new request submission. */
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +	}
> +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> +}
> +
> +static void guc_flush_submissions(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
>   {
> -	ENGINE_TRACE(engine, "\n");
> +	int i;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
Still think this should have braces. Maybe not syntactically required, 
but it just looks wrong.
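
For reference, the braced form being asked for would read along these 
lines (an illustrative sketch, not code from the patch):

	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}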

> +
> +	disable_submission(guc);
> +	guc->interrupts.disable(guc);
> +
> +	/* Flush IRQ handler */
> +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> +
> +	guc_flush_submissions(guc);
>   
>   	/*
> -	 * Prevent request submission to the hardware until we have
> -	 * completed the reset in i915_gem_reset_finish(). If a request
> -	 * is completed by one engine, it may then queue a request
> -	 * to a second via its execlists->tasklet *just* as we are
> -	 * calling engine->init_hw() and also writing the ELSP.
> -	 * Turning off the execlists->tasklet until the reset is over
> -	 * prevents the race.
> -	 */
> -	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
> +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> +	 * each pass as interrupts have been disabled. We always scrub for
> +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> +	 * be incremented after the context state update.
> + 	 */
> +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> +		intel_guc_to_host_event_handler(guc);
> +#define wait_for_reset(guc, wait_var) \
> +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> +		do {
> +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> +		} while (!list_empty(&guc->ct.requests.incoming));
> +	}
> +	scrub_guc_desc_for_outstanding_g2h(guc);
> +}
> +
> +static struct intel_engine_cs *
> +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> +{
> +	struct intel_engine_cs *engine;
> +	intel_engine_mask_t tmp, mask = ve->mask;
> +	unsigned int num_siblings = 0;
> +
> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> +		if (num_siblings++ == sibling)
> +			return engine;
> +
> +	return NULL;
> +}
> +
> +static inline struct intel_engine_cs *
> +__context_to_physical_engine(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +
> +	if (intel_engine_is_virtual(engine))
> +		engine = guc_virtual_get_sibling(engine, 0);
> +
> +	return engine;
>   }
>   
> -static void guc_reset_state(struct intel_context *ce,
> -			    struct intel_engine_cs *engine,
> -			    u32 head,
> -			    bool scrub)
> +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
>   {
> +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> +
>   	GEM_BUG_ON(!intel_context_is_pinned(ce));
>   
>   	/*
> @@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
>   	lrc_update_regs(ce, engine, head);
>   }
>   
> -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> +static void guc_reset_nop(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request *rq;
> +}
> +
> +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> +{
> +}
> +
> +static void
> +__unwind_incomplete_requests(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *rn;
> +	struct list_head *pl;
> +	int prio = I915_PRIORITY_INVALID;
> +	struct i915_sched_engine * const sched_engine =
> +		ce->engine->sched_engine;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry_safe(rq, rn,
> +				 &ce->guc_active.requests,
> +				 sched.link) {
> +		if (i915_request_completed(rq))
> +			continue;
> +
> +		list_del_init(&rq->sched.link);
> +		spin_unlock(&ce->guc_active.lock);
> +
> +		__i915_request_unsubmit(rq);
> +
> +		/* Push the request back into the queue for later resubmission. */
> +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +		if (rq_prio(rq) != prio) {
> +			prio = rq_prio(rq);
> +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> +		}
> +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   
> -	/* Push back any incomplete requests for replay after the reset. */
> -	rq = execlists_unwind_incomplete_requests(execlists);
> -	if (!rq)
> -		goto out_unlock;
> +		list_add_tail(&rq->sched.link, pl);
> +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +		spin_lock(&ce->guc_active.lock);
> +	}
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +static struct i915_request *context_find_active_request(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *active = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> +				    sched.link) {
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		active = rq;
> +	}
> +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> +
> +	return active;
> +}
> +
> +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> +{
> +	struct i915_request *rq;
> +	u32 head;
> +
> +	/*
> +	 * GuC will implicitly mark the context as non-schedulable
> +	 * when it sends the reset notification. Make sure our state
> +	 * reflects this change. The context will be marked enabled
> +	 * on resubmission.
> +	 */
> +	clr_context_enabled(ce);
> +
> +	rq = context_find_active_request(ce);
> +	if (!rq) {
> +		head = ce->ring->tail;
> +		stalled = false;
> +		goto out_replay;
> +	}
>   
>   	if (!i915_request_started(rq))
>   		stalled = false;
>   
> +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> +	head = intel_ring_wrap(ce->ring, rq->head);
>   	__i915_request_reset(rq, stalled);
> -	guc_reset_state(rq->context, engine, rq->head, stalled);
>   
> -out_unlock:
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +out_replay:
> +	guc_reset_state(ce, head, stalled);
> +	__unwind_incomplete_requests(ce);
> +}
> +
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> +{
> +	struct intel_context *ce;
> +	unsigned long index;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
And again.

> +
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			__guc_reset_context(ce, stalled);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
> +}
> +
> +static void guc_cancel_context_requests(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> +	struct i915_request *rq;
> +	unsigned long flags;
> +
> +	/* Mark all executing requests as skipped. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> +		i915_request_put(i915_request_mark_eio(rq));
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
> -static void guc_reset_cancel(struct intel_engine_cs *engine)
> +static void
> +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>   	struct i915_request *rq, *rn;
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
>   	/* Can be called during boot if GuC fails to load */
> -	if (!engine->gt)
> +	if (!sched_engine)
>   		return;
>   
> -	ENGINE_TRACE(engine, "\n");
> -
>   	/*
>   	 * Before we call engine->cancel_requests(), we should have exclusive
>   	 * access to the submission state. This is arranged for us by the
> @@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	 */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	/* Mark all executing requests as skipped. */
> -	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
> -		i915_request_set_error_once(rq, -EIO);
> -		i915_request_mark_complete(rq);
> -	}
> -
>   	/* Flush the queued requests to the timeline list (for retiring). */
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			dma_fence_set_error(&rq->fence, -EIO);
> -			i915_request_mark_complete(rq);
> +
> +			i915_request_put(i915_request_mark_eio(rq));
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
> @@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
> -static void guc_reset_finish(struct intel_engine_cs *engine)
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
>   {
> -	if (__tasklet_enable(&engine->sched_engine->tasklet))
> -		/* And kick in case we missed a new request submission. */
> -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> +	struct intel_context *ce;
> +	unsigned long index;
>   
> -	ENGINE_TRACE(engine, "depth->%d\n",
> -		     atomic_read(&engine->sched_engine->tasklet.count));
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			guc_cancel_context_requests(ce);
> +
> +	guc_cancel_sched_engine_requests(guc->sched_engine);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
And again, most of this code is common to 'intel_guc_submission_reset', 
so it could be moved to a shared helper function, e.g. something like 
the sketch below.
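
One possible shape for that helper (the name and the 'cancel' flag are 
hypothetical; xa_destroy() would stay in the callers, since the cancel 
path also flushes the sched_engine before dropping the lookup table):

	static void guc_flush_pinned_contexts(struct intel_guc *guc,
					      bool stalled, bool cancel)
	{
		struct intel_context *ce;
		unsigned long index;

		xa_for_each(&guc->context_lookup, index, ce) {
			if (!intel_context_is_pinned(ce))
				continue;

			/* 'cancel' is a hypothetical flag picking the per-context op */
			if (cancel)
				guc_cancel_context_requests(ce);
			else
				__guc_reset_context(ce, stalled);
		}
	}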

John.

> +}
> +
> +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> +{
> +	/* Reset called during driver load or during wedge? */
> +	if (unlikely(!guc_submission_initialized(guc) ||
> +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> +		return;
> +
> +	/*
> +	 * Technically possible for either of these values to be non-zero here,
> +	 * but very unlikely + harmless. Regardless let's add a warn so we can
> +	 * see in CI if this happens frequently / a precursor to taking down the
> +	 * machine.
> +	 */
> +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> +	atomic_set(&guc->outstanding_submission_g2h, 0);
> +
> +	enable_submission(guc);
>   }
>   
>   /*
> @@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	else
>   		trace_i915_request_guc_submit(rq);
>   
> +	if (unlikely(ret == -EPIPE))
> +		disable_submission(guc);
> +
>   	return ret;
>   }
>   
> @@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
>   	/* Will be called from irq-context when using foreign fences. */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +	if (submission_disabled(guc) || guc->stalled_request ||
> +	    !i915_sched_engine_is_empty(sched_engine))
>   		queue_request(sched_engine, rq, rq_prio(rq));
>   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>   		tasklet_hi_schedule(&sched_engine->tasklet);
> @@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   
>   static int __guc_action_register_context(struct intel_guc *guc,
>   					 u32 guc_id,
> -					 u32 offset)
> +					 u32 offset,
> +					 bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> @@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
>   	};
>   
>   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> -					     0, true);
> +					     0, loop);
>   }
>   
> -static int register_context(struct intel_context *ce)
> +static int register_context(struct intel_context *ce, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> @@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
>   
>   	trace_intel_context_register(ce);
>   
> -	return __guc_action_register_context(guc, ce->guc_id, offset);
> +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
>   }
>   
>   static int __guc_action_deregister_context(struct intel_guc *guc,
> -					   u32 guc_id)
> +					   u32 guc_id,
> +					   bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> @@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>   
>   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>   					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> -					     true);
> +					     loop);
>   }
>   
> -static int deregister_context(struct intel_context *ce, u32 guc_id)
> +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	trace_intel_context_deregister(ce);
>   
> -	return __guc_action_deregister_context(guc, guc_id);
> +	return __guc_action_deregister_context(guc, guc_id, loop);
>   }
>   
>   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> @@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>   }
>   
> -static int guc_lrc_desc_pin(struct intel_context *ce)
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   {
>   	struct intel_runtime_pm *runtime_pm =
>   		&ce->engine->gt->i915->runtime_pm;
> @@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
>   	 */
>   	if (context_registered) {
>   		trace_intel_context_steal_guc_id(ce);
> -		set_context_wait_for_deregister_to_register(ce);
> -		intel_context_get(ce);
> +		if (!loop) {
> +			set_context_wait_for_deregister_to_register(ce);
> +			intel_context_get(ce);
> +		} else {
> +			bool disabled;
> +			unsigned long flags;
> +
> +			/* Seal race with Reset */
> +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> +			disabled = submission_disabled(guc);
> +			if (likely(!disabled)) {
> +				set_context_wait_for_deregister_to_register(ce);
> +				intel_context_get(ce);
> +			}
> +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +			if (unlikely(disabled)) {
> +				reset_lrc_desc(guc, desc_idx);
> +				return 0;	/* Will get registered later */
> +			}
> +		}
>   
>   		/*
>   		 * If stealing the guc_id, this ce has the same guc_id as the
>   		 * context whos guc_id was stole.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = deregister_context(ce, ce->guc_id);
> +			ret = deregister_context(ce, ce->guc_id, loop);
> +		if (unlikely(ret == -EBUSY)) {
> +			clr_context_wait_for_deregister_to_register(ce);
> +			intel_context_put(ce);
> +		} else if (unlikely(ret == -ENODEV))
> +			ret = 0;	/* Will get registered later */
>   	} else {
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = register_context(ce);
> +			ret = register_context(ce, loop);
> +		if (unlikely(ret == -EBUSY))
> +			reset_lrc_desc(guc, desc_idx);
> +		else if (unlikely(ret == -ENODEV))
> +			ret = 0;	/* Will get registered later */
>   	}
>   
>   	return ret;
> @@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
>   
>   	trace_intel_context_sched_disable(ce);
> -	intel_context_get(ce);
>   
>   	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> @@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   
>   	set_context_pending_disable(ce);
>   	clr_context_enabled(ce);
> +	intel_context_get(ce);
>   
>   	return ce->guc_id;
>   }
> @@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	u16 guc_id;
>   	intel_wakeref_t wakeref;
>   
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		clr_context_enabled(ce);
>   		goto unpin;
> @@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	 * a request before we set the 'context_pending_disable' flag here.
>   	 */
>   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   		return;
>   	}
>   	guc_id = prep_context_pending_disable(ce);
> @@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   
>   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   {
> -	struct intel_engine_cs *engine = ce->engine;
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> -	unsigned long flags;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>   	GEM_BUG_ON(context_enabled(ce));
>   
> -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> -	set_context_destroyed(ce);
> -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> -
> -	deregister_context(ce, ce->guc_id);
> +	deregister_context(ce, ce->guc_id, true);
>   }
>   
>   static void __guc_context_destroy(struct intel_context *ce)
> @@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
>   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
>   	intel_wakeref_t wakeref;
>   	unsigned long flags;
> +	bool disabled;
>   
>   	/*
>   	 * If the guc_id is invalid this context has been stolen and we can free
>   	 * it immediately. Also can be freed immediately if the context is not
>   	 * registered with the GuC.
>   	 */
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) ||
> +	    context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		release_guc_id(guc, ce);
>   		__guc_context_destroy(ce);
> @@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
>   		list_del_init(&ce->guc_id_link);
>   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>   
> +	/* Seal race with Reset */
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	disabled = submission_disabled(guc);
> +	if (likely(!disabled))
> +		set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +	if (unlikely(disabled)) {
> +		release_guc_id(guc, ce);
> +		__guc_context_destroy(ce);
> +		return;
> +	}
> +
>   	/*
>   	 * We defer GuC context deregistration until the context is destroyed
>   	 * in order to save on CTBs. With this optimization ideally we only need
> @@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   {
>   	unsigned long flags;
>   
> -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> -
>   	spin_lock_irqsave(&ce->guc_state.lock, flags);
>   	clr_context_wait_for_deregister_to_register(ce);
>   	__guc_signal_context_fence(ce);
> @@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   
>   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> +		!submission_disabled(ce_to_guc(ce));
>   }
>   
>   static int guc_request_alloc(struct i915_request *rq)
> @@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
>   	if (unlikely(ret < 0))
>   		return ret;
>   	if (context_needs_register(ce, !!ret)) {
> -		ret = guc_lrc_desc_pin(ce);
> +		ret = guc_lrc_desc_pin(ce, true);
>   		if (unlikely(ret)) {	/* unwind */
> +			if (ret == -EPIPE) {
> +				disable_submission(guc);
> +				goto out;	/* GPU will be reset */
> +			}
>   			atomic_dec(&ce->guc_id_ref);
>   			unpin_guc_id(guc, ce);
>   			return ret;
> @@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
>   	return 0;
>   }
>   
> -static struct intel_engine_cs *
> -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> -{
> -	struct intel_engine_cs *engine;
> -	intel_engine_mask_t tmp, mask = ve->mask;
> -	unsigned int num_siblings = 0;
> -
> -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> -		if (num_siblings++ == sibling)
> -			return engine;
> -
> -	return NULL;
> -}
> -
>   static int guc_virtual_context_pre_pin(struct intel_context *ce,
>   				       struct i915_gem_ww_ctx *ww,
>   				       void **vaddr)
> @@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
>   {
>   	if (context_guc_id_invalid(ce))
>   		pin_guc_id(guc, ce);
> -	guc_lrc_desc_pin(ce);
> +	guc_lrc_desc_pin(ce, true);
>   }
>   
>   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> @@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> -	engine->reset.prepare = guc_reset_prepare;
> -	engine->reset.rewind = guc_reset_rewind;
> -	engine->reset.cancel = guc_reset_cancel;
> -	engine->reset.finish = guc_reset_finish;
> +	engine->reset.prepare = guc_reset_nop;
> +	engine->reset.rewind = guc_rewind_nop;
> +	engine->reset.cancel = guc_reset_nop;
> +	engine->reset.finish = guc_reset_nop;
>   
>   	engine->emit_flush = gen8_emit_flush_xcs;
>   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> @@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   	intel_engine_set_irq_handler(engine, cs_irq_handler);
>   }
>   
> +static void guc_sched_engine_destroy(struct kref *kref)
> +{
> +	struct i915_sched_engine *sched_engine =
> +		container_of(kref, typeof(*sched_engine), ref);
> +	struct intel_guc *guc = sched_engine->private_data;
> +
> +	guc->sched_engine = NULL;
> +	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
> +	kfree(sched_engine);
> +}
> +
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   		guc->sched_engine->schedule = i915_schedule;
>   		guc->sched_engine->private_data = guc;
> +		guc->sched_engine->destroy = guc_sched_engine_destroy;
>   		tasklet_setup(&guc->sched_engine->tasklet,
>   			      guc_submission_tasklet);
>   	}
> @@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   		 * register this context.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			register_context(ce);
> +			register_context(ce, true);
>   		guc_signal_context_fence(ce);
>   		intel_context_put(ce);
>   	} else if (context_destroyed(ce)) {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 6d8b9233214e..f0b02200aa01 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
>   
> -	if (!intel_guc_is_ready(guc))
> +
> +	/* Nothing to do if GuC isn't supported */
> +	if (!intel_uc_supports_guc(uc))
>   		return;
>   
> +	/* Firmware expected to be running when this function is called */
> +	if (!intel_guc_is_ready(guc))
> +		goto sanitize;
> +
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_prepare(guc);
> +
> +sanitize:
>   	__uc_sanitize(uc);
>   }
>   
> +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset(guc, stalled);
> +}
> +
> +void intel_uc_reset_finish(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware expected to be running when this function is called */
> +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_finish(guc);
> +}
> +
> +void intel_uc_cancel_requests(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_cancel_requests(guc);
> +}
> +
>   void intel_uc_runtime_suspend(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index c4cef885e984..eaa3202192ac 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
>   void intel_uc_driver_remove(struct intel_uc *uc);
>   void intel_uc_init_mmio(struct intel_uc *uc);
>   void intel_uc_reset_prepare(struct intel_uc *uc);
> +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> +void intel_uc_reset_finish(struct intel_uc *uc);
> +void intel_uc_cancel_requests(struct intel_uc *uc);
>   void intel_uc_suspend(struct intel_uc *uc);
>   void intel_uc_runtime_suspend(struct intel_uc *uc);
>   int intel_uc_resume(struct intel_uc *uc);



* Re: [Intel-gfx] [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
@ 2021-07-20 20:19     ` John Harrison
  0 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 20:19 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel

On 7/16/2021 13:16, Matthew Brost wrote:
> Reset implementation for new GuC interface. This is the legacy reset
> implementation which is called when the i915 owns the engine hang check.
> Future patches will offload the engine hang check to GuC, but we will
> continue to maintain this legacy path as a fallback; this code path
> is also required if the GuC dies.
>
> With the new GuC interface it is not possible to reset individual
> engines - it is only possible to reset the GPU entirely. This patch
> forces an entire chip reset if any engine hangs.
>
> v2:
>   (Michal)
>    - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
> v3:
>   (John H)
>    - Split into a series of smaller patches
While the split happened, it doesn't look like any of the other comments 
were addressed. They are repeated below for clarity. Also, Tvrtko has a 
bunch of outstanding comments too.

>
> Cc: John Harrison <john.c.harrison@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
>   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
>   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
>   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
>   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
>   7 files changed, 515 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..463a6ae605a0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   	if (intel_gt_is_wedged(gt))
>   		intel_gt_unset_wedged(gt);
>   
> -	intel_uc_sanitize(&gt->uc);
> -
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.prepare)
>   			engine->reset.prepare(engine);
> @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
>   			__intel_engine_reset(engine, false);
>   	}
>   
> +	intel_uc_reset(&gt->uc, false);
> +
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.finish)
>   			engine->reset.finish(engine);
> @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
>   		goto err_wedged;
>   	}
>   
> +	intel_uc_reset_finish(&gt->uc);
> +
>   	intel_rps_enable(&gt->rps);
>   	intel_llc_enable(&gt->llc);
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72251638d4ea..2987282dff6d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
>   		__intel_engine_reset(engine, stalled_mask & engine->mask);
>   	local_bh_enable();
>   
> +	intel_uc_reset(&gt->uc, true);
> +
>   	intel_ggtt_restore_fences(gt->ggtt);
>   
>   	return err;
> @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
>   		if (awake & engine->mask)
>   			intel_engine_pm_put(engine);
>   	}
> +
> +	intel_uc_reset_finish(&gt->uc);
>   }
>   
>   static void nop_submit_request(struct i915_request *request)
> @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
>   	for_each_engine(engine, gt, id)
>   		if (engine->reset.cancel)
>   			engine->reset.cancel(engine);
> +	intel_uc_cancel_requests(&gt->uc);
>   	local_bh_enable();
>   
>   	reset_finish(gt, awake);
> @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
>   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
>   
> +	if (intel_engine_uses_guc(engine))
> +		return -ENODEV;
> +
>   	if (!intel_engine_pm_get_if_awake(engine))
>   		return 0;
>   
> @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
>   			   "Resetting %s for %s\n", engine->name, msg);
>   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
>   
> -	if (intel_engine_uses_guc(engine))
> -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> -	else
> -		ret = intel_gt_reset_engine(engine);
> +	ret = intel_gt_reset_engine(engine);
>   	if (ret) {
>   		/* If we fail here, we expect to fallback to a global reset */
> -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
>   		goto out;
>   	}
>   
> @@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
>   	 * Try engine reset when available. We fall back to full reset if
>   	 * single reset fails.
>   	 */
> -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
>   		local_bh_disable();
>   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
>   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index 6661dcb02239..9b09395b998f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
>   	return 0;
>   }
>   
> -/**
> - * intel_guc_reset_engine() - ask GuC to reset an engine
> - * @guc:	intel_guc structure
> - * @engine:	engine to be reset
> - */
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine)
> -{
> -	/* XXX: to be implemented with submission interface rework */
> -
> -	return -ENODEV;
> -}
> -
>   /**
>    * intel_guc_resume() - notify GuC resuming from suspend state
>    * @guc:	the guc
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 3cc566565224..d75a76882a44 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>   
>   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
>   
> -int intel_guc_reset_engine(struct intel_guc *guc,
> -			   struct intel_engine_cs *engine);
> -
>   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg, u32 len);
>   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   				     const u32 *msg, u32 len);
>   
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> +
>   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>   
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index f8a6077fa3f9..0f5823922369 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
>   static inline void
>   set_context_wait_for_deregister_to_register(struct intel_context *ce)
>   {
> -	/* Only should be called from guc_lrc_desc_pin() */
> +	/* Should only be called from guc_lrc_desc_pin() without the lock */
>   	ce->guc_state.sched_state |=
>   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>   }
> @@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
>   
>   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
>   {
> +	guc->lrc_desc_pool_vaddr = NULL;
>   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
>   }
>   
> +static inline bool guc_submission_initialized(struct intel_guc *guc)
> +{
> +	return guc->lrc_desc_pool_vaddr != NULL;
> +}
> +
>   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
>   {
> -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +	if (likely(guc_submission_initialized(guc))) {
> +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> +		unsigned long flags;
>   
> -	memset(desc, 0, sizeof(*desc));
> -	xa_erase_irq(&guc->context_lookup, id);
> +		memset(desc, 0, sizeof(*desc));
> +
> +		/*
> +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> +		 * the lower level functions directly.
> +		 */
> +		xa_lock_irqsave(&guc->context_lookup, flags);
> +		__xa_erase(&guc->context_lookup, id);
> +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> +	}
>   }
>   
>   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> @@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
>   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
>   					   struct intel_context *ce)
>   {
> -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	unsigned long flags;
> +
> +	/*
> +	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
> +	 * lower level functions directly.
> +	 */
> +	xa_lock_irqsave(&guc->context_lookup, flags);
> +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> +	xa_unlock_irqrestore(&guc->context_lookup, flags);
>   }
>   
>   static int guc_submission_send_busy_loop(struct intel_guc* guc,
> @@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
>   					true, timeout);
>   }
>   
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> +
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
>   	int err;
> @@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	u32 action[3];
>   	int len = 0;
>   	u32 g2h_len_dw = 0;
> -	bool enabled = context_enabled(ce);
> +	bool enabled;
>   
>   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>   	GEM_BUG_ON(context_guc_id_invalid(ce));
>   
> +	/*
> +	 * Corner case where the GuC firmware was blown away and reloaded while
> +	 * this context was pinned.
> +	 */
> +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> +		err = guc_lrc_desc_pin(ce, false);
> +		if (unlikely(err))
> +			goto out;
> +	}
> +	enabled = context_enabled(ce);
> +
>   	if (!enabled) {
>   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>   		action[len++] = ce->guc_id;
> @@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		intel_context_put(ce);
>   	}
>   
> +out:
>   	return err;
>   }
>   
> @@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   	if (submit) {
>   		guc_set_lrc_tail(last);
>   resubmit:
> -		/*
> -		 * We only check for -EBUSY here even though it is possible for
> -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> -		 * died and a full GT reset needs to be done. The hangcheck will
> -		 * eventually detect that the GuC has died and trigger this
> -		 * reset so no need to handle -EDEADLK here.
> -		 */
>   		ret = guc_add_request(guc, last);
> -		if (ret == -EBUSY) {
> +		if (unlikely(ret == -EPIPE))
> +			goto deadlk;
> +		else if (ret == -EBUSY) {
>   			tasklet_schedule(&sched_engine->tasklet);
>   			guc->stalled_request = last;
>   			return false;
> @@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   
>   	guc->stalled_request = NULL;
>   	return submit;
> +
> +deadlk:
> +	sched_engine->tasklet.callback = NULL;
> +	tasklet_disable_nosync(&sched_engine->tasklet);
> +	return false;
>   }
>   
>   static void guc_submission_tasklet(struct tasklet_struct *t)
> @@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   		intel_engine_signal_breadcrumbs(engine);
>   }
>   
> -static void guc_reset_prepare(struct intel_engine_cs *engine)
> +static void __guc_context_destroy(struct intel_context *ce);
> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> +static void guc_signal_context_fence(struct intel_context *ce);
> +
> +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> +{
> +	struct intel_context *ce;
> +	unsigned long index, flags;
> +	bool pending_disable, pending_enable, deregister, destroyed;
> +
> +	xa_for_each(&guc->context_lookup, index, ce) {
> +		/* Flush context */
> +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		/*
> +		 * Once we are at this point submission_disabled() is guaranteed
> +		 * to visible to all callers who set the below flags (see above
Still needs to be 'to be visible'

> +		 * flush and flushes in reset_prepare). If submission_disabled()
> +		 * is set, the caller shouldn't set these flags.
> +		 */
> +
> +		destroyed = context_destroyed(ce);
> +		pending_enable = context_pending_enable(ce);
> +		pending_disable = context_pending_disable(ce);
> +		deregister = context_wait_for_deregister_to_register(ce);
> +		init_sched_state(ce);
> +
> +		if (pending_enable || destroyed || deregister) {
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			if (deregister)
> +				guc_signal_context_fence(ce);
> +			if (destroyed) {
> +				release_guc_id(guc, ce);
> +				__guc_context_destroy(ce);
> +			}
> +			if (pending_enable || deregister)
> +				intel_context_put(ce);
> +		}
> +
> +		/* Not mutually exclusive with the above if statement. */
> +		if (pending_disable) {
> +			guc_signal_context_fence(ce);
> +			intel_context_sched_disable_unpin(ce);
> +			atomic_dec(&guc->outstanding_submission_g2h);
> +			intel_context_put(ce);
> +		}
> +	}
> +}
> +
> +static inline bool
> +submission_disabled(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	return unlikely(!sched_engine ||
> +			!__tasklet_is_enabled(&sched_engine->tasklet));
> +}
> +
> +static void disable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +
> +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> +		sched_engine->tasklet.callback = NULL;
> +	}
> +}
> +
> +static void enable_submission(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> +	sched_engine->tasklet.callback = guc_submission_tasklet;
> +	wmb();
> +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> +	    __tasklet_enable(&sched_engine->tasklet)) {
> +		GEM_BUG_ON(!guc->ct.enabled);
> +
> +		/* And kick in case we missed a new request submission. */
> +		tasklet_hi_schedule(&sched_engine->tasklet);
> +	}
> +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> +}
> +
> +static void guc_flush_submissions(struct intel_guc *guc)
> +{
> +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
>   {
> -	ENGINE_TRACE(engine, "\n");
> +	int i;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
I still think this should have braces. They may not be syntactically 
required, but it just looks wrong without them.
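
i.e. something like this (same statements as the patch, just braced):

	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}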

> +
> +	disable_submission(guc);
> +	guc->interrupts.disable(guc);
> +
> +	/* Flush IRQ handler */
> +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> +
> +	guc_flush_submissions(guc);
>   
>   	/*
> -	 * Prevent request submission to the hardware until we have
> -	 * completed the reset in i915_gem_reset_finish(). If a request
> -	 * is completed by one engine, it may then queue a request
> -	 * to a second via its execlists->tasklet *just* as we are
> -	 * calling engine->init_hw() and also writing the ELSP.
> -	 * Turning off the execlists->tasklet until the reset is over
> -	 * prevents the race.
> -	 */
> -	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
> +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> +	 * each pass as interrupts have been disabled. We always scrub for
> +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> +	 * be incremented after the context state update.
> +	 */
> +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> +		intel_guc_to_host_event_handler(guc);
> +#define wait_for_reset(guc, wait_var) \
> +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> +		do {
> +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> +		} while (!list_empty(&guc->ct.requests.incoming));
> +	}
> +	scrub_guc_desc_for_outstanding_g2h(guc);
> +}
> +
> +static struct intel_engine_cs *
> +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> +{
> +	struct intel_engine_cs *engine;
> +	intel_engine_mask_t tmp, mask = ve->mask;
> +	unsigned int num_siblings = 0;
> +
> +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> +		if (num_siblings++ == sibling)
> +			return engine;
> +
> +	return NULL;
> +}
> +
> +static inline struct intel_engine_cs *
> +__context_to_physical_engine(struct intel_context *ce)
> +{
> +	struct intel_engine_cs *engine = ce->engine;
> +
> +	if (intel_engine_is_virtual(engine))
> +		engine = guc_virtual_get_sibling(engine, 0);
> +
> +	return engine;
>   }
>   
> -static void guc_reset_state(struct intel_context *ce,
> -			    struct intel_engine_cs *engine,
> -			    u32 head,
> -			    bool scrub)
> +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
>   {
> +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> +
>   	GEM_BUG_ON(!intel_context_is_pinned(ce));
>   
>   	/*
> @@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
>   	lrc_update_regs(ce, engine, head);
>   }
>   
> -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> +static void guc_reset_nop(struct intel_engine_cs *engine)
>   {
> -	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct i915_request *rq;
> +}
> +
> +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> +{
> +}
> +
> +static void
> +__unwind_incomplete_requests(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *rn;
> +	struct list_head *pl;
> +	int prio = I915_PRIORITY_INVALID;
> +	struct i915_sched_engine * const sched_engine =
> +		ce->engine->sched_engine;
>   	unsigned long flags;
>   
> -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry_safe(rq, rn,
> +				 &ce->guc_active.requests,
> +				 sched.link) {
> +		if (i915_request_completed(rq))
> +			continue;
> +
> +		list_del_init(&rq->sched.link);
> +		spin_unlock(&ce->guc_active.lock);
> +
> +		__i915_request_unsubmit(rq);
> +
> +		/* Push the request back into the queue for later resubmission. */
> +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +		if (rq_prio(rq) != prio) {
> +			prio = rq_prio(rq);
> +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> +		}
> +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
>   
> -	/* Push back any incomplete requests for replay after the reset. */
> -	rq = execlists_unwind_incomplete_requests(execlists);
> -	if (!rq)
> -		goto out_unlock;
> +		list_add_tail(&rq->sched.link, pl);
> +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +
> +		spin_lock(&ce->guc_active.lock);
> +	}
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +static struct i915_request *context_find_active_request(struct intel_context *ce)
> +{
> +	struct i915_request *rq, *active = NULL;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> +				    sched.link) {
> +		if (i915_request_completed(rq))
> +			break;
> +
> +		active = rq;
> +	}
> +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> +
> +	return active;
> +}
> +
> +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> +{
> +	struct i915_request *rq;
> +	u32 head;
> +
> +	/*
> +	 * GuC will implicitly mark the context as non-schedulable
> +	 * when it sends the reset notification. Make sure our state
> +	 * reflects this change. The context will be marked enabled
> +	 * on resubmission.
> +	 */
> +	clr_context_enabled(ce);
> +
> +	rq = context_find_active_request(ce);
> +	if (!rq) {
> +		head = ce->ring->tail;
> +		stalled = false;
> +		goto out_replay;
> +	}
>   
>   	if (!i915_request_started(rq))
>   		stalled = false;
>   
> +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> +	head = intel_ring_wrap(ce->ring, rq->head);
>   	__i915_request_reset(rq, stalled);
> -	guc_reset_state(rq->context, engine, rq->head, stalled);
>   
> -out_unlock:
> -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> +out_replay:
> +	guc_reset_state(ce, head, stalled);
> +	__unwind_incomplete_requests(ce);
> +}
> +
> +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> +{
> +	struct intel_context *ce;
> +	unsigned long index;
> +
> +	if (unlikely(!guc_submission_initialized(guc)))
> +		/* Reset called during driver load? GuC not yet initialised! */
> +		return;
And again - this wants braces too.

> +
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			__guc_reset_context(ce, stalled);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
> +}
> +
> +static void guc_cancel_context_requests(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> +	struct i915_request *rq;
> +	unsigned long flags;
> +
> +	/* Mark all executing requests as skipped. */
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	spin_lock(&ce->guc_active.lock);
> +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> +		i915_request_put(i915_request_mark_eio(rq));
> +	spin_unlock(&ce->guc_active.lock);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
> -static void guc_reset_cancel(struct intel_engine_cs *engine)
> +static void
> +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
>   {
> -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
>   	struct i915_request *rq, *rn;
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
>   	/* Can be called during boot if GuC fails to load */
> -	if (!engine->gt)
> +	if (!sched_engine)
>   		return;
>   
> -	ENGINE_TRACE(engine, "\n");
> -
>   	/*
>   	 * Before we call engine->cancel_requests(), we should have exclusive
>   	 * access to the submission state. This is arranged for us by the
> @@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	 */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	/* Mark all executing requests as skipped. */
> -	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
> -		i915_request_set_error_once(rq, -EIO);
> -		i915_request_mark_complete(rq);
> -	}
> -
>   	/* Flush the queued requests to the timeline list (for retiring). */
>   	while ((rb = rb_first_cached(&sched_engine->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   
>   		priolist_for_each_request_consume(rq, rn, p) {
>   			list_del_init(&rq->sched.link);
> +
>   			__i915_request_submit(rq);
> -			dma_fence_set_error(&rq->fence, -EIO);
> -			i915_request_mark_complete(rq);
> +
> +			i915_request_put(i915_request_mark_eio(rq));
>   		}
>   
>   		rb_erase_cached(&p->node, &sched_engine->queue);
> @@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
>   	spin_unlock_irqrestore(&sched_engine->lock, flags);
>   }
>   
> -static void guc_reset_finish(struct intel_engine_cs *engine)
> +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
>   {
> -	if (__tasklet_enable(&engine->sched_engine->tasklet))
> -		/* And kick in case we missed a new request submission. */
> -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> +	struct intel_context *ce;
> +	unsigned long index;
>   
> -	ENGINE_TRACE(engine, "depth->%d\n",
> -		     atomic_read(&engine->sched_engine->tasklet.count));
> +	xa_for_each(&guc->context_lookup, index, ce)
> +		if (intel_context_is_pinned(ce))
> +			guc_cancel_context_requests(ce);
> +
> +	guc_cancel_sched_engine_requests(guc->sched_engine);
> +
> +	/* GuC is blown away, drop all references to contexts */
> +	xa_destroy(&guc->context_lookup);
And again, most of this code is common to 'intel_guc_submission_reset', 
so it could be moved to a shared helper function - something like the 
sketch below.
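
For illustration only (the helper name and signature are made up, not 
part of the patch; the cancel path would still flush the sched_engine 
queue itself):

	static void guc_scrub_pinned_contexts(struct intel_guc *guc,
					      bool stalled, bool cancel)
	{
		struct intel_context *ce;
		unsigned long index;

		xa_for_each(&guc->context_lookup, index, ce) {
			if (!intel_context_is_pinned(ce))
				continue;

			if (cancel)
				guc_cancel_context_requests(ce);
			else
				__guc_reset_context(ce, stalled);
		}

		/* GuC is blown away, drop all references to contexts */
		xa_destroy(&guc->context_lookup);
	}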

John.

> +}
> +
> +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> +{
> +	/* Reset called during driver load or during wedge? */
> +	if (unlikely(!guc_submission_initialized(guc) ||
> +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> +		return;
> +
> +	/*
> +	 * Technically possible for either of these values to be non-zero here,
> +	 * but very unlikely + harmless. Regardless let's add a warn so we can
> +	 * see in CI if this happens frequently or is a precursor to taking
> +	 * down the machine.
> +	 */
> +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> +	atomic_set(&guc->outstanding_submission_g2h, 0);
> +
> +	enable_submission(guc);
>   }
>   
>   /*
> @@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	else
>   		trace_i915_request_guc_submit(rq);
>   
> +	if (unlikely(ret == -EPIPE))
> +		disable_submission(guc);
> +
>   	return ret;
>   }
>   
> @@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
>   	/* Will be called from irq-context when using foreign fences. */
>   	spin_lock_irqsave(&sched_engine->lock, flags);
>   
> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> +	if (submission_disabled(guc) || guc->stalled_request ||
> +	    !i915_sched_engine_is_empty(sched_engine))
>   		queue_request(sched_engine, rq, rq_prio(rq));
>   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>   		tasklet_hi_schedule(&sched_engine->tasklet);
> @@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>   
>   static int __guc_action_register_context(struct intel_guc *guc,
>   					 u32 guc_id,
> -					 u32 offset)
> +					 u32 offset,
> +					 bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> @@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
>   	};
>   
>   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> -					     0, true);
> +					     0, loop);
>   }
>   
> -static int register_context(struct intel_context *ce)
> +static int register_context(struct intel_context *ce, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> @@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
>   
>   	trace_intel_context_register(ce);
>   
> -	return __guc_action_register_context(guc, ce->guc_id, offset);
> +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
>   }
>   
>   static int __guc_action_deregister_context(struct intel_guc *guc,
> -					   u32 guc_id)
> +					   u32 guc_id,
> +					   bool loop)
>   {
>   	u32 action[] = {
>   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> @@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
>   
>   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>   					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> -					     true);
> +					     loop);
>   }
>   
> -static int deregister_context(struct intel_context *ce, u32 guc_id)
> +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	trace_intel_context_deregister(ce);
>   
> -	return __guc_action_deregister_context(guc, guc_id);
> +	return __guc_action_deregister_context(guc, guc_id, loop);
>   }
>   
>   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> @@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>   }
>   
> -static int guc_lrc_desc_pin(struct intel_context *ce)
> +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   {
>   	struct intel_runtime_pm *runtime_pm =
>   		&ce->engine->gt->i915->runtime_pm;
> @@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
>   	 */
>   	if (context_registered) {
>   		trace_intel_context_steal_guc_id(ce);
> -		set_context_wait_for_deregister_to_register(ce);
> -		intel_context_get(ce);
> +		if (!loop) {
> +			set_context_wait_for_deregister_to_register(ce);
> +			intel_context_get(ce);
> +		} else {
> +			bool disabled;
> +			unsigned long flags;
> +
> +			/* Seal race with Reset */
> +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> +			disabled = submission_disabled(guc);
> +			if (likely(!disabled)) {
> +				set_context_wait_for_deregister_to_register(ce);
> +				intel_context_get(ce);
> +			}
> +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +			if (unlikely(disabled)) {
> +				reset_lrc_desc(guc, desc_idx);
> +				return 0;	/* Will get registered later */
> +			}
> +		}
>   
>   		/*
>   		 * If stealing the guc_id, this ce has the same guc_id as the
>   		 * context whos guc_id was stole.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = deregister_context(ce, ce->guc_id);
> +			ret = deregister_context(ce, ce->guc_id, loop);
> +		if (unlikely(ret == -EBUSY)) {
> +			clr_context_wait_for_deregister_to_register(ce);
> +			intel_context_put(ce);
> +		} else if (unlikely(ret == -ENODEV))
> +			ret = 0;	/* Will get registered later */
>   	} else {
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			ret = register_context(ce);
> +			ret = register_context(ce, loop);
> +		if (unlikely(ret == -EBUSY))
> +			reset_lrc_desc(guc, desc_idx);
> +		else if (unlikely(ret == -ENODEV))
> +			ret = 0;	/* Will get registered later */
>   	}
>   
>   	return ret;
> @@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
>   
>   	trace_intel_context_sched_disable(ce);
> -	intel_context_get(ce);
>   
>   	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
>   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> @@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   
>   	set_context_pending_disable(ce);
>   	clr_context_enabled(ce);
> +	intel_context_get(ce);
>   
>   	return ce->guc_id;
>   }
> @@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	u16 guc_id;
>   	intel_wakeref_t wakeref;
>   
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		clr_context_enabled(ce);
>   		goto unpin;
> @@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	 * a request before we set the 'context_pending_disable' flag here.
>   	 */
>   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   		return;
>   	}
>   	guc_id = prep_context_pending_disable(ce);
> @@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   
>   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   {
> -	struct intel_engine_cs *engine = ce->engine;
> -	struct intel_guc *guc = &engine->gt->uc.guc;
> -	unsigned long flags;
> +	struct intel_guc *guc = ce_to_guc(ce);
>   
>   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>   	GEM_BUG_ON(context_enabled(ce));
>   
> -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> -	set_context_destroyed(ce);
> -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> -
> -	deregister_context(ce, ce->guc_id);
> +	deregister_context(ce, ce->guc_id, true);
>   }
>   
>   static void __guc_context_destroy(struct intel_context *ce)
> @@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
>   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
>   	intel_wakeref_t wakeref;
>   	unsigned long flags;
> +	bool disabled;
>   
>   	/*
>   	 * If the guc_id is invalid this context has been stolen and we can free
>   	 * it immediately. Also can be freed immediately if the context is not
>   	 * registered with the GuC.
>   	 */
> -	if (context_guc_id_invalid(ce) ||
> +	if (submission_disabled(guc) ||
> +	    context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
>   		release_guc_id(guc, ce);
>   		__guc_context_destroy(ce);
> @@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
>   		list_del_init(&ce->guc_id_link);
>   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>   
> +	/* Seal race with Reset */
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	disabled = submission_disabled(guc);
> +	if (likely(!disabled))
> +		set_context_destroyed(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +	if (unlikely(disabled)) {
> +		release_guc_id(guc, ce);
> +		__guc_context_destroy(ce);
> +		return;
> +	}
> +
>   	/*
>   	 * We defer GuC context deregistration until the context is destroyed
>   	 * in order to save on CTBs. With this optimization ideally we only need
> @@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   {
>   	unsigned long flags;
>   
> -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> -
>   	spin_lock_irqsave(&ce->guc_state.lock, flags);
>   	clr_context_wait_for_deregister_to_register(ce);
>   	__guc_signal_context_fence(ce);
> @@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
>   
>   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>   {
> -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> +		!submission_disabled(ce_to_guc(ce));
>   }
>   
>   static int guc_request_alloc(struct i915_request *rq)
> @@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
>   	if (unlikely(ret < 0))
>   		return ret;
>   	if (context_needs_register(ce, !!ret)) {
> -		ret = guc_lrc_desc_pin(ce);
> +		ret = guc_lrc_desc_pin(ce, true);
>   		if (unlikely(ret)) {	/* unwind */
> +			if (ret == -EPIPE) {
> +				disable_submission(guc);
> +				goto out;	/* GPU will be reset */
> +			}
>   			atomic_dec(&ce->guc_id_ref);
>   			unpin_guc_id(guc, ce);
>   			return ret;
> @@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
>   	return 0;
>   }
>   
> -static struct intel_engine_cs *
> -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> -{
> -	struct intel_engine_cs *engine;
> -	intel_engine_mask_t tmp, mask = ve->mask;
> -	unsigned int num_siblings = 0;
> -
> -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> -		if (num_siblings++ == sibling)
> -			return engine;
> -
> -	return NULL;
> -}
> -
>   static int guc_virtual_context_pre_pin(struct intel_context *ce,
>   				       struct i915_gem_ww_ctx *ww,
>   				       void **vaddr)
> @@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
>   {
>   	if (context_guc_id_invalid(ce))
>   		pin_guc_id(guc, ce);
> -	guc_lrc_desc_pin(ce);
> +	guc_lrc_desc_pin(ce, true);
>   }
>   
>   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> @@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
>   
>   	engine->sched_engine->schedule = i915_schedule;
>   
> -	engine->reset.prepare = guc_reset_prepare;
> -	engine->reset.rewind = guc_reset_rewind;
> -	engine->reset.cancel = guc_reset_cancel;
> -	engine->reset.finish = guc_reset_finish;
> +	engine->reset.prepare = guc_reset_nop;
> +	engine->reset.rewind = guc_rewind_nop;
> +	engine->reset.cancel = guc_reset_nop;
> +	engine->reset.finish = guc_reset_nop;
>   
>   	engine->emit_flush = gen8_emit_flush_xcs;
>   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> @@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
>   	intel_engine_set_irq_handler(engine, cs_irq_handler);
>   }
>   
> +static void guc_sched_engine_destroy(struct kref *kref)
> +{
> +	struct i915_sched_engine *sched_engine =
> +		container_of(kref, typeof(*sched_engine), ref);
> +	struct intel_guc *guc = sched_engine->private_data;
> +
> +	guc->sched_engine = NULL;
> +	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
> +	kfree(sched_engine);
> +}
> +
>   int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> @@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   		guc->sched_engine->schedule = i915_schedule;
>   		guc->sched_engine->private_data = guc;
> +		guc->sched_engine->destroy = guc_sched_engine_destroy;
>   		tasklet_setup(&guc->sched_engine->tasklet,
>   			      guc_submission_tasklet);
>   	}
> @@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   		 * register this context.
>   		 */
>   		with_intel_runtime_pm(runtime_pm, wakeref)
> -			register_context(ce);
> +			register_context(ce, true);
>   		guc_signal_context_fence(ce);
>   		intel_context_put(ce);
>   	} else if (context_destroyed(ce)) {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index 6d8b9233214e..f0b02200aa01 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
>   
> -	if (!intel_guc_is_ready(guc))
> +
> +	/* Nothing to do if GuC isn't supported */
> +	if (!intel_uc_supports_guc(uc))
>   		return;
>   
> +	/* Firmware expected to be running when this function is called */
> +	if (!intel_guc_is_ready(guc))
> +		goto sanitize;
> +
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_prepare(guc);
> +
> +sanitize:
>   	__uc_sanitize(uc);
>   }
>   
> +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset(guc, stalled);
> +}
> +
> +void intel_uc_reset_finish(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware expected to be running when this function is called */
> +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_reset_finish(guc);
> +}
> +
> +void intel_uc_cancel_requests(struct intel_uc *uc)
> +{
> +	struct intel_guc *guc = &uc->guc;
> +
> +	/* Firmware can not be running when this function is called  */
> +	if (intel_uc_uses_guc_submission(uc))
> +		intel_guc_submission_cancel_requests(guc);
> +}
> +
>   void intel_uc_runtime_suspend(struct intel_uc *uc)
>   {
>   	struct intel_guc *guc = &uc->guc;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> index c4cef885e984..eaa3202192ac 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
>   void intel_uc_driver_remove(struct intel_uc *uc);
>   void intel_uc_init_mmio(struct intel_uc *uc);
>   void intel_uc_reset_prepare(struct intel_uc *uc);
> +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> +void intel_uc_reset_finish(struct intel_uc *uc);
> +void intel_uc_cancel_requests(struct intel_uc *uc);
>   void intel_uc_suspend(struct intel_uc *uc);
>   void intel_uc_runtime_suspend(struct intel_uc *uc);
>   int intel_uc_resume(struct intel_uc *uc);


* Re: [PATCH 30/51] drm/i915/guc: Handle context reset notification
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 20:29     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 20:29 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:17, Matthew Brost wrote:
> GuC will issue a reset on detecting an engine hang and will notify
> the driver via a G2H message. The driver will service the notification
> by resetting the guilty context to a simple state or banning it
> completely.
>
> v2:
>   (John Harrison)
>    - Move msg[0] lookup after length check
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++++
>   drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
>   4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index b3cfc52fe0bc..f23a3a618550 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>   					  const u32 *msg, u32 len);
>   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   				     const u32 *msg, u32 len);
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len);
>   
>   void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 503a78517610..c4f9b44b9f86 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>   	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
>   		ret = intel_guc_sched_done_process_msg(guc, payload, len);
>   		break;
> +	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
> +		ret = intel_guc_context_reset_process_msg(guc, payload, len);
> +		break;
>   	default:
>   		ret = -EOPNOTSUPP;
>   		break;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index fdb17279095c..feaf1ca61eaa 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   	return 0;
>   }
>   
> +static void guc_context_replay(struct intel_context *ce)
> +{
> +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +
> +	__guc_reset_context(ce, true);
> +	tasklet_hi_schedule(&sched_engine->tasklet);
> +}
> +
> +static void guc_handle_context_reset(struct intel_guc *guc,
> +				     struct intel_context *ce)
> +{
> +	trace_intel_context_reset(ce);
> +	guc_context_replay(ce);
> +}
> +
> +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> +					const u32 *msg, u32 len)
> +{
> +	struct intel_context *ce;
> +	int desc_idx;
> +
> +	if (unlikely(len != 1)) {
> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
I think we decided that these should be drm_err rather than drm_dbg?
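
i.e. presumably just (drm_err takes the same arguments here):

	drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);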

With that updated:
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> +		return -EPROTO;
> +	}
> +
> +	desc_idx = msg[0];
> +	ce = g2h_context_lookup(guc, desc_idx);
> +	if (unlikely(!ce))
> +		return -EPROTO;
> +
> +	guc_handle_context_reset(guc, ce);
> +
> +	return 0;
> +}
> +
>   void intel_guc_submission_print_info(struct intel_guc *guc,
>   				     struct drm_printer *p)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 97c2e83984ed..c095c4d39456 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
>   		      __entry->guc_sched_state_no_lock)
>   );
>   
> +DEFINE_EVENT(intel_context, intel_context_reset,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
> +);
> +
>   DEFINE_EVENT(intel_context, intel_context_register,
>   	     TP_PROTO(struct intel_context *ce),
>   	     TP_ARGS(ce)
> @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
>   {
>   }
>   
> +static inline void
> +trace_intel_context_reset(struct intel_context *ce)
> +{
> +}
> +
>   static inline void
>   trace_intel_context_register(struct intel_context *ce)
>   {




* Re: [PATCH 30/51] drm/i915/guc: Handle context reset notification
  2021-07-20 20:29     ` [Intel-gfx] " John Harrison
@ 2021-07-20 20:38       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 20:38 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Tue, Jul 20, 2021 at 01:29:26PM -0700, John Harrison wrote:
> On 7/16/2021 13:17, Matthew Brost wrote:
> > GuC will issue a reset on detecting an engine hang and will notify
> > the driver via a G2H message. The driver will service the notification
> > by resetting the guilty context to a simple state or banning it
> > completely.
> > 
> > v2:
> >   (John Harrison)
> >    - Move msg[0] lookup after length check
> > 
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: John Harrison <John.C.Harrison@Intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  2 ++
> >   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |  3 ++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 36 +++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_trace.h             | 10 ++++++
> >   4 files changed, 51 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index b3cfc52fe0bc..f23a3a618550 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -262,6 +262,8 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg, u32 len);
> >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   				     const u32 *msg, u32 len);
> > +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> > +					const u32 *msg, u32 len);
> >   void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> >   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > index 503a78517610..c4f9b44b9f86 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> > @@ -981,6 +981,9 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
> >   	case INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE:
> >   		ret = intel_guc_sched_done_process_msg(guc, payload, len);
> >   		break;
> > +	case INTEL_GUC_ACTION_CONTEXT_RESET_NOTIFICATION:
> > +		ret = intel_guc_context_reset_process_msg(guc, payload, len);
> > +		break;
> >   	default:
> >   		ret = -EOPNOTSUPP;
> >   		break;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index fdb17279095c..feaf1ca61eaa 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -2196,6 +2196,42 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   	return 0;
> >   }
> > +static void guc_context_replay(struct intel_context *ce)
> > +{
> > +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> > +
> > +	__guc_reset_context(ce, true);
> > +	tasklet_hi_schedule(&sched_engine->tasklet);
> > +}
> > +
> > +static void guc_handle_context_reset(struct intel_guc *guc,
> > +				     struct intel_context *ce)
> > +{
> > +	trace_intel_context_reset(ce);
> > +	guc_context_replay(ce);
> > +}
> > +
> > +int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> > +					const u32 *msg, u32 len)
> > +{
> > +	struct intel_context *ce;
> > +	int desc_idx;
> > +
> > +	if (unlikely(len != 1)) {
> > +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> I think we decided that these should be drm_err rather than drm_dbg?
> 

Yes, we did. Already fixed this message and all subsequent ones in my branch.

Matt

> With that updated:
> Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
> 
> > +		return -EPROTO;
> > +	}
> > +
> > +	desc_idx = msg[0];
> > +	ce = g2h_context_lookup(guc, desc_idx);
> > +	if (unlikely(!ce))
> > +		return -EPROTO;
> > +
> > +	guc_handle_context_reset(guc, ce);
> > +
> > +	return 0;
> > +}
> > +
> >   void intel_guc_submission_print_info(struct intel_guc *guc,
> >   				     struct drm_printer *p)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 97c2e83984ed..c095c4d39456 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -929,6 +929,11 @@ DECLARE_EVENT_CLASS(intel_context,
> >   		      __entry->guc_sched_state_no_lock)
> >   );
> > +DEFINE_EVENT(intel_context, intel_context_reset,
> > +	     TP_PROTO(struct intel_context *ce),
> > +	     TP_ARGS(ce)
> > +);
> > +
> >   DEFINE_EVENT(intel_context, intel_context_register,
> >   	     TP_PROTO(struct intel_context *ce),
> >   	     TP_ARGS(ce)
> > @@ -1026,6 +1031,11 @@ trace_i915_request_out(struct i915_request *rq)
> >   {
> >   }
> > +static inline void
> > +trace_intel_context_reset(struct intel_context *ce)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_intel_context_register(struct intel_context *ce)
> >   {
> 


* Re: [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
  2021-07-20 20:19     ` [Intel-gfx] " John Harrison
@ 2021-07-20 20:59       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 20:59 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, daniele.ceraolospurio, dri-devel

On Tue, Jul 20, 2021 at 01:19:48PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Reset implementation for new GuC interface. This is the legacy reset
> > implementation which is called when the i915 owns the engine hang check.
> > Future patches will offload the engine hang check to GuC but we will
> > continue to maintain this legacy path as a fallback and this code path
> > is also required if the GuC dies.
> > 
> > With the new GuC interface it is not possible to reset individual
> > engines - it is only possible to reset the GPU entirely. This patch
> > forces an entire chip reset if any engine hangs.
> > 
> > v2:
> >   (Michal)
> >    - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
> > v3:
> >   (John H)
> >    - Split into a series of smaller patches
> While the split happened, it doesn't look like any of the other comments
> were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of

Oops, I thought you asked for the split and didn't realize you had comments.

I thought I addressed Tvrtko's comments, but I'll circle back to them.

> outstanding comments too.
> 
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
> >   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
> >   7 files changed, 515 insertions(+), 134 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index aef3084e8b16..463a6ae605a0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   	if (intel_gt_is_wedged(gt))
> >   		intel_gt_unset_wedged(gt);
> > -	intel_uc_sanitize(&gt->uc);
> > -
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.prepare)
> >   			engine->reset.prepare(engine);
> > @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   			__intel_engine_reset(engine, false);
> >   	}
> > +	intel_uc_reset(&gt->uc, false);
> > +
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.finish)
> >   			engine->reset.finish(engine);
> > @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
> >   		goto err_wedged;
> >   	}
> > +	intel_uc_reset_finish(&gt->uc);
> > +
> >   	intel_rps_enable(&gt->rps);
> >   	intel_llc_enable(&gt->llc);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 72251638d4ea..2987282dff6d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
> >   		__intel_engine_reset(engine, stalled_mask & engine->mask);
> >   	local_bh_enable();
> > +	intel_uc_reset(&gt->uc, true);
> > +
> >   	intel_ggtt_restore_fences(gt->ggtt);
> >   	return err;
> > @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> >   		if (awake & engine->mask)
> >   			intel_engine_pm_put(engine);
> >   	}
> > +
> > +	intel_uc_reset_finish(&gt->uc);
> >   }
> >   static void nop_submit_request(struct i915_request *request)
> > @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.cancel)
> >   			engine->reset.cancel(engine);
> > +	intel_uc_cancel_requests(&gt->uc);
> >   	local_bh_enable();
> >   	reset_finish(gt, awake);
> > @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> >   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
> > +	if (intel_engine_uses_guc(engine))
> > +		return -ENODEV;
> > +
> >   	if (!intel_engine_pm_get_if_awake(engine))
> >   		return 0;
> > @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   			   "Resetting %s for %s\n", engine->name, msg);
> >   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > -	if (intel_engine_uses_guc(engine))
> > -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> > -	else
> > -		ret = intel_gt_reset_engine(engine);
> > +	ret = intel_gt_reset_engine(engine);
> >   	if (ret) {
> >   		/* If we fail here, we expect to fallback to a global reset */
> > -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> > +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
> >   		goto out;
> >   	}
> > @@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
> >   	 * Try engine reset when available. We fall back to full reset if
> >   	 * single reset fails.
> >   	 */
> > -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> > +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> >   		local_bh_disable();
> >   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
> >   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 6661dcb02239..9b09395b998f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
> >   	return 0;
> >   }
> > -/**
> > - * intel_guc_reset_engine() - ask GuC to reset an engine
> > - * @guc:	intel_guc structure
> > - * @engine:	engine to be reset
> > - */
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine)
> > -{
> > -	/* XXX: to be implemented with submission interface rework */
> > -
> > -	return -ENODEV;
> > -}
> > -
> >   /**
> >    * intel_guc_resume() - notify GuC resuming from suspend state
> >    * @guc:	the guc
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 3cc566565224..d75a76882a44 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine);
> > -
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg, u32 len);
> >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   				     const u32 *msg, u32 len);
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index f8a6077fa3f9..0f5823922369 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
> >   static inline void
> >   set_context_wait_for_deregister_to_register(struct intel_context *ce)
> >   {
> > -	/* Only should be called from guc_lrc_desc_pin() */
> > +	/* Only should be called from guc_lrc_desc_pin() without lock */
> >   	ce->guc_state.sched_state |=
> >   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> >   }
> > @@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> >   {
> > +	guc->lrc_desc_pool_vaddr = NULL;
> >   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >   }
> > +static inline bool guc_submission_initialized(struct intel_guc *guc)
> > +{
> > +	return guc->lrc_desc_pool_vaddr != NULL;
> > +}
> > +
> >   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> >   {
> > -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +	if (likely(guc_submission_initialized(guc))) {
> > +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +		unsigned long flags;
> > -	memset(desc, 0, sizeof(*desc));
> > -	xa_erase_irq(&guc->context_lookup, id);
> > +		memset(desc, 0, sizeof(*desc));
> > +
> > +		/*
> > +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> > +		 * the lower level functions directly.
> > +		 */
> > +		xa_lock_irqsave(&guc->context_lookup, flags);
> > +		__xa_erase(&guc->context_lookup, id);
> > +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> > +	}
> >   }
> >   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > @@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> >   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   					   struct intel_context *ce)
> >   {
> > -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * xarray API doesn't have xa_save_irqsave wrapper, so calling the
> > +	 * lower level functions directly.
> > +	 */
> > +	xa_lock_irqsave(&guc->context_lookup, flags);
> > +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	xa_unlock_irqrestore(&guc->context_lookup, flags);
> >   }
> >   static int guc_submission_send_busy_loop(struct intel_guc* guc,
> > @@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> >   					true, timeout);
> >   }
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	u32 action[3];
> >   	int len = 0;
> >   	u32 g2h_len_dw = 0;
> > -	bool enabled = context_enabled(ce);
> > +	bool enabled;
> >   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> >   	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +	/*
> > +	 * Corner case where the GuC firmware was blown away and reloaded while
> > +	 * this context was pinned.
> > +	 */
> > +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> > +		err = guc_lrc_desc_pin(ce, false);
> > +		if (unlikely(err))
> > +			goto out;
> > +	}
> > +	enabled = context_enabled(ce);
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   		intel_context_put(ce);
> >   	}
> > +out:
> >   	return err;
> >   }
> > @@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	if (submit) {
> >   		guc_set_lrc_tail(last);
> >   resubmit:
> > -		/*
> > -		 * We only check for -EBUSY here even though it is possible for
> > -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > -		 * died and a full GT reset needs to be done. The hangcheck will
> > -		 * eventually detect that the GuC has died and trigger this
> > -		 * reset so no need to handle -EDEADLK here.
> > -		 */
> >   		ret = guc_add_request(guc, last);
> > -		if (ret == -EBUSY) {
> > +		if (unlikely(ret == -EPIPE))
> > +			goto deadlk;
> > +		else if (ret == -EBUSY) {
> >   			tasklet_schedule(&sched_engine->tasklet);
> >   			guc->stalled_request = last;
> >   			return false;
> > @@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	guc->stalled_request = NULL;
> >   	return submit;
> > +
> > +deadlk:
> > +	sched_engine->tasklet.callback = NULL;
> > +	tasklet_disable_nosync(&sched_engine->tasklet);
> > +	return false;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> > @@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   		intel_engine_signal_breadcrumbs(engine);
> >   }
> > -static void guc_reset_prepare(struct intel_engine_cs *engine)
> > +static void __guc_context_destroy(struct intel_context *ce);
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> > +static void guc_signal_context_fence(struct intel_context *ce);
> > +
> > +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	unsigned long index, flags;
> > +	bool pending_disable, pending_enable, deregister, destroyed;
> > +
> > +	xa_for_each(&guc->context_lookup, index, ce) {
> > +		/* Flush context */
> > +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +		/*
> > +		 * Once we are at this point submission_disabled() is guaranteed
> > +		 * to visible to all callers who set the below flags (see above
> Still needs to be 'to be visible'
> 

Fixed.

> > +		 * flush and flushes in reset_prepare). If submission_disabled()
> > +		 * is set, the caller shouldn't set these flags.
> > +		 */
> > +
> > +		destroyed = context_destroyed(ce);
> > +		pending_enable = context_pending_enable(ce);
> > +		pending_disable = context_pending_disable(ce);
> > +		deregister = context_wait_for_deregister_to_register(ce);
> > +		init_sched_state(ce);
> > +
> > +		if (pending_enable || destroyed || deregister) {
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			if (deregister)
> > +				guc_signal_context_fence(ce);
> > +			if (destroyed) {
> > +				release_guc_id(guc, ce);
> > +				__guc_context_destroy(ce);
> > +			}
> > +			if (pending_enable || deregister)
> > +				intel_context_put(ce);
> > +		}
> > +
> > +		/* Not mutually exclusive with above if statement. */
> > +		if (pending_disable) {
> > +			guc_signal_context_fence(ce);
> > +			intel_context_sched_disable_unpin(ce);
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			intel_context_put(ce);
> > +		}
> > +	}
> > +}
> > +
> > +static inline bool
> > +submission_disabled(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	return unlikely(!sched_engine ||
> > +			!__tasklet_is_enabled(&sched_engine->tasklet));
> > +}
> > +
> > +static void disable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> > +		sched_engine->tasklet.callback = NULL;
> > +	}
> > +}
> > +
> > +static void enable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> > +	sched_engine->tasklet.callback = guc_submission_tasklet;
> > +	wmb();
> > +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> > +	    __tasklet_enable(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +
> > +		/* And kick in case we missed a new request submission. */
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	}
> > +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> > +}
> > +
> > +static void guc_flush_submissions(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> >   {
> > -	ENGINE_TRACE(engine, "\n");
> > +	int i;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> Still think this should have braces. Maybe not syntactically required but
> just looks wrong.
> 

Sure.
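
For reference, the braced form John is asking for would read:

if (unlikely(!guc_submission_initialized(guc))) {
	/* Reset called during driver load? GuC not yet initialised! */
	return;
}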

> > +
> > +	disable_submission(guc);
> > +	guc->interrupts.disable(guc);
> > +
> > +	/* Flush IRQ handler */
> > +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> > +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> > +
> > +	guc_flush_submissions(guc);
> >   	/*
> > -	 * Prevent request submission to the hardware until we have
> > -	 * completed the reset in i915_gem_reset_finish(). If a request
> > -	 * is completed by one engine, it may then queue a request
> > -	 * to a second via its execlists->tasklet *just* as we are
> > -	 * calling engine->init_hw() and also writing the ELSP.
> > -	 * Turning off the execlists->tasklet until the reset is over
> > -	 * prevents the race.
> > -	 */
> > -	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
> > +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> > +	 * each pass as interrupts have been disabled. We always scrub for
> > +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> > +	 * be incremented after the context state update.
> > +	 */
> > +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> > +		intel_guc_to_host_event_handler(guc);
> > +#define wait_for_reset(guc, wait_var) \
> > +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> > +		do {
> > +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > +		} while (!list_empty(&guc->ct.requests.incoming));
> > +	}
> > +	scrub_guc_desc_for_outstanding_g2h(guc);
> > +}
> > +
> > +static struct intel_engine_cs *
> > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +	unsigned int num_siblings = 0;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (num_siblings++ == sibling)
> > +			return engine;
> > +
> > +	return NULL;
> > +}
> > +
> > +static inline struct intel_engine_cs *
> > +__context_to_physical_engine(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +
> > +	if (intel_engine_is_virtual(engine))
> > +		engine = guc_virtual_get_sibling(engine, 0);
> > +
> > +	return engine;
> >   }
> > -static void guc_reset_state(struct intel_context *ce,
> > -			    struct intel_engine_cs *engine,
> > -			    u32 head,
> > -			    bool scrub)
> > +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
> >   {
> > +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > +
> >   	GEM_BUG_ON(!intel_context_is_pinned(ce));
> >   	/*
> > @@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
> >   	lrc_update_regs(ce, engine, head);
> >   }
> > -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> > +static void guc_reset_nop(struct intel_engine_cs *engine)
> >   {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request *rq;
> > +}
> > +
> > +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> > +{
> > +}
> > +
> > +static void
> > +__unwind_incomplete_requests(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *rn;
> > +	struct list_head *pl;
> > +	int prio = I915_PRIORITY_INVALID;
> > +	struct i915_sched_engine * const sched_engine =
> > +		ce->engine->sched_engine;
> >   	unsigned long flags;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry_safe(rq, rn,
> > +				 &ce->guc_active.requests,
> > +				 sched.link) {
> > +		if (i915_request_completed(rq))
> > +			continue;
> > +
> > +		list_del_init(&rq->sched.link);
> > +		spin_unlock(&ce->guc_active.lock);
> > +
> > +		__i915_request_unsubmit(rq);
> > +
> > +		/* Push the request back into the queue for later resubmission. */
> > +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > +		if (rq_prio(rq) != prio) {
> > +			prio = rq_prio(rq);
> > +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> > +		}
> > +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> > -	/* Push back any incomplete requests for replay after the reset. */
> > -	rq = execlists_unwind_incomplete_requests(execlists);
> > -	if (!rq)
> > -		goto out_unlock;
> > +		list_add_tail(&rq->sched.link, pl);
> > +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +
> > +		spin_lock(&ce->guc_active.lock);
> > +	}
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +static struct i915_request *context_find_active_request(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *active = NULL;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > +				    sched.link) {
> > +		if (i915_request_completed(rq))
> > +			break;
> > +
> > +		active = rq;
> > +	}
> > +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > +
> > +	return active;
> > +}
> > +
> > +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > +{
> > +	struct i915_request *rq;
> > +	u32 head;
> > +
> > +	/*
> > +	 * GuC will implicitly mark the context as non-schedulable
> > +	 * when it sends the reset notification. Make sure our state
> > +	 * reflects this change. The context will be marked enabled
> > +	 * on resubmission.
> > +	 */
> > +	clr_context_enabled(ce);
> > +
> > +	rq = context_find_active_request(ce);
> > +	if (!rq) {
> > +		head = ce->ring->tail;
> > +		stalled = false;
> > +		goto out_replay;
> > +	}
> >   	if (!i915_request_started(rq))
> >   		stalled = false;
> > +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > +	head = intel_ring_wrap(ce->ring, rq->head);
> >   	__i915_request_reset(rq, stalled);
> > -	guc_reset_state(rq->context, engine, rq->head, stalled);
> > -out_unlock:
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +out_replay:
> > +	guc_reset_state(ce, head, stalled);
> > +	__unwind_incomplete_requests(ce);
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> > +{
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> And again.
> 

Yep.

> > +
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			__guc_reset_context(ce, stalled);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> > +}
> > +
> > +static void guc_cancel_context_requests(struct intel_context *ce)
> > +{
> > +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> > +	struct i915_request *rq;
> > +	unsigned long flags;
> > +
> > +	/* Mark all executing requests as skipped. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> > +		i915_request_put(i915_request_mark_eio(rq));
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> > -static void guc_reset_cancel(struct intel_engine_cs *engine)
> > +static void
> > +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >   	struct i915_request *rq, *rn;
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> >   	/* Can be called during boot if GuC fails to load */
> > -	if (!engine->gt)
> > +	if (!sched_engine)
> >   		return;
> > -	ENGINE_TRACE(engine, "\n");
> > -
> >   	/*
> >   	 * Before we call engine->cancel_requests(), we should have exclusive
> >   	 * access to the submission state. This is arranged for us by the
> > @@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	 */
> >   	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	/* Mark all executing requests as skipped. */
> > -	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
> > -		i915_request_set_error_once(rq, -EIO);
> > -		i915_request_mark_complete(rq);
> > -	}
> > -
> >   	/* Flush the queued requests to the timeline list (for retiring). */
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >   		struct i915_priolist *p = to_priolist(rb);
> >   		priolist_for_each_request_consume(rq, rn, p) {
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			dma_fence_set_error(&rq->fence, -EIO);
> > -			i915_request_mark_complete(rq);
> > +
> > +			i915_request_put(i915_request_mark_eio(rq));
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> > @@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> > -static void guc_reset_finish(struct intel_engine_cs *engine)
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
> >   {
> > -	if (__tasklet_enable(&engine->sched_engine->tasklet))
> > -		/* And kick in case we missed a new request submission. */
> > -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > -	ENGINE_TRACE(engine, "depth->%d\n",
> > -		     atomic_read(&engine->sched_engine->tasklet.count));
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			guc_cancel_context_requests(ce);
> > +
> > +	guc_cancel_sched_engine_requests(guc->sched_engine);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> And again, most of this code is common to 'intel_guc_submission_reset' so
> could be moved to a shared helper function.
> 

Similar code structure but different.

We use the same xa_for_each loop + pinned check but call
__guc_reset_context vs guc_cancel_context_requests. Also here I don't
think we can clear &guc->context_lookup until
guc_cancel_sched_engine_requests completes. Not sure how to structure a
helper that makes sense.
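
For illustration only, a callback-based helper could be shaped like this (guc_for_each_pinned_context is a hypothetical name, not in the series; the cancel path would need a small wrapper that ignores the stalled argument):

static void guc_for_each_pinned_context(struct intel_guc *guc,
					void (*fn)(struct intel_context *ce,
						   bool stalled),
					bool stalled)
{
	struct intel_context *ce;
	unsigned long index;

	/* Common loop: apply fn to every pinned context in the lookup */
	xa_for_each(&guc->context_lookup, index, ce)
		if (intel_context_is_pinned(ce))
			fn(ce, stalled);
}

As noted above, the xa_destroy() would still have to stay with each caller to preserve the ordering against guc_cancel_sched_engine_requests(), so the helper may not buy much over the duplication.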

Matt

> John.
> 
> > +}
> > +
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> > +{
> > +	/* Reset called during driver load or during wedge? */
> > +	if (unlikely(!guc_submission_initialized(guc) ||
> > +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> > +		return;
> > +
> > +	/*
> > +	 * Technically possible for either of these values to be non-zero here,
> > +	 * but very unlikely + harmless. Regardless let's add a warn so we can
> > +	 * see in CI if this happens frequently / a precursor to taking down the
> > +	 * machine.
> > +	 */
> > +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> > +	atomic_set(&guc->outstanding_submission_g2h, 0);
> > +
> > +	enable_submission(guc);
> >   }
> >   /*
> > @@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >   	else
> >   		trace_i915_request_guc_submit(rq);
> > +	if (unlikely(ret == -EPIPE))
> > +		disable_submission(guc);
> > +
> >   	return ret;
> >   }
> > @@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
> >   	/* Will be called from irq-context when using foreign fences. */
> >   	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +	if (submission_disabled(guc) || guc->stalled_request ||
> > +	    !i915_sched_engine_is_empty(sched_engine))
> >   		queue_request(sched_engine, rq, rq_prio(rq));
> >   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >   		tasklet_hi_schedule(&sched_engine->tasklet);
> > @@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >   static int __guc_action_register_context(struct intel_guc *guc,
> >   					 u32 guc_id,
> > -					 u32 offset)
> > +					 u32 offset,
> > +					 bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > @@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   	};
> >   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -					     0, true);
> > +					     0, loop);
> >   }
> > -static int register_context(struct intel_context *ce)
> > +static int register_context(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > @@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
> >   	trace_intel_context_register(ce);
> > -	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> >   }
> >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > -					   u32 guc_id)
> > +					   u32 guc_id,
> > +					   bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > @@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> >   					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> > -					     true);
> > +					     loop);
> >   }
> > -static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	trace_intel_context_deregister(ce);
> > -	return __guc_action_deregister_context(guc, guc_id);
> > +	return __guc_action_deregister_context(guc, guc_id, loop);
> >   }
> >   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > @@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> >   }
> > -static int guc_lrc_desc_pin(struct intel_context *ce)
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_runtime_pm *runtime_pm =
> >   		&ce->engine->gt->i915->runtime_pm;
> > @@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> >   	 */
> >   	if (context_registered) {
> >   		trace_intel_context_steal_guc_id(ce);
> > -		set_context_wait_for_deregister_to_register(ce);
> > -		intel_context_get(ce);
> > +		if (!loop) {
> > +			set_context_wait_for_deregister_to_register(ce);
> > +			intel_context_get(ce);
> > +		} else {
> > +			bool disabled;
> > +			unsigned long flags;
> > +
> > +			/* Seal race with Reset */
> > +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +			disabled = submission_disabled(guc);
> > +			if (likely(!disabled)) {
> > +				set_context_wait_for_deregister_to_register(ce);
> > +				intel_context_get(ce);
> > +			}
> > +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +			if (unlikely(disabled)) {
> > +				reset_lrc_desc(guc, desc_idx);
> > +				return 0;	/* Will get registered later */
> > +			}
> > +		}
> >   		/*
> >   		 * If stealing the guc_id, this ce has the same guc_id as the
> >   		 * context whose guc_id was stolen.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = deregister_context(ce, ce->guc_id);
> > +			ret = deregister_context(ce, ce->guc_id, loop);
> > +		if (unlikely(ret == -EBUSY)) {
> > +			clr_context_wait_for_deregister_to_register(ce);
> > +			intel_context_put(ce);
> > +		} else if (unlikely(ret == -ENODEV))
> > +			ret = 0;	/* Will get registered later */
> >   	} else {
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = register_context(ce);
> > +			ret = register_context(ce, loop);
> > +		if (unlikely(ret == -EBUSY))
> > +			reset_lrc_desc(guc, desc_idx);
> > +		else if (unlikely(ret == -ENODEV))
> > +			ret = 0;	/* Will get registered later */
> >   	}
> >   	return ret;
> > @@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
> >   	trace_intel_context_sched_disable(ce);
> > -	intel_context_get(ce);
> >   	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> >   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > @@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> >   	set_context_pending_disable(ce);
> >   	clr_context_enabled(ce);
> > +	intel_context_get(ce);
> >   	return ce->guc_id;
> >   }
> > @@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	u16 guc_id;
> >   	intel_wakeref_t wakeref;
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		clr_context_enabled(ce);
> >   		goto unpin;
> > @@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	 * a request before we set the 'context_pending_disable' flag here.
> >   	 */
> >   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   		return;
> >   	}
> >   	guc_id = prep_context_pending_disable(ce);
> > @@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   {
> > -	struct intel_engine_cs *engine = ce->engine;
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> >   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >   	GEM_BUG_ON(context_enabled(ce));
> > -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > -	set_context_destroyed(ce);
> > -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -
> > -	deregister_context(ce, ce->guc_id);
> > +	deregister_context(ce, ce->guc_id, true);
> >   }
> >   static void __guc_context_destroy(struct intel_context *ce)
> > @@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
> >   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> >   	intel_wakeref_t wakeref;
> >   	unsigned long flags;
> > +	bool disabled;
> >   	/*
> >   	 * If the guc_id is invalid this context has been stolen and we can free
> >   	 * it immediately. Also can be freed immediately if the context is not
> >   	 * registered with the GuC.
> >   	 */
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) ||
> > +	    context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		release_guc_id(guc, ce);
> >   		__guc_context_destroy(ce);
> > @@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
> >   		list_del_init(&ce->guc_id_link);
> >   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +	/* Seal race with Reset */
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	disabled = submission_disabled(guc);
> > +	if (likely(!disabled))
> > +		set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +	if (unlikely(disabled)) {
> > +		release_guc_id(guc, ce);
> > +		__guc_context_destroy(ce);
> > +		return;
> > +	}
> > +
> >   	/*
> >   	 * We defer GuC context deregistration until the context is destroyed
> >   	 * in order to save on CTBs. With this optimization ideally we only need
> > @@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   {
> >   	unsigned long flags;
> > -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> > -
> >   	spin_lock_irqsave(&ce->guc_state.lock, flags);
> >   	clr_context_wait_for_deregister_to_register(ce);
> >   	__guc_signal_context_fence(ce);
> > @@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> > +		!submission_disabled(ce_to_guc(ce));
> >   }
> >   static int guc_request_alloc(struct i915_request *rq)
> > @@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	if (unlikely(ret < 0))
> >   		return ret;
> >   	if (context_needs_register(ce, !!ret)) {
> > -		ret = guc_lrc_desc_pin(ce);
> > +		ret = guc_lrc_desc_pin(ce, true);
> >   		if (unlikely(ret)) {	/* unwind */
> > +			if (ret == -EPIPE) {
> > +				disable_submission(guc);
> > +				goto out;	/* GPU will be reset */
> > +			}
> >   			atomic_dec(&ce->guc_id_ref);
> >   			unpin_guc_id(guc, ce);
> >   			return ret;
> > @@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	return 0;
> >   }
> > -static struct intel_engine_cs *
> > -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > -{
> > -	struct intel_engine_cs *engine;
> > -	intel_engine_mask_t tmp, mask = ve->mask;
> > -	unsigned int num_siblings = 0;
> > -
> > -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > -		if (num_siblings++ == sibling)
> > -			return engine;
> > -
> > -	return NULL;
> > -}
> > -
> >   static int guc_virtual_context_pre_pin(struct intel_context *ce,
> >   				       struct i915_gem_ww_ctx *ww,
> >   				       void **vaddr)
> > @@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
> >   {
> >   	if (context_guc_id_invalid(ce))
> >   		pin_guc_id(guc, ce);
> > -	guc_lrc_desc_pin(ce);
> > +	guc_lrc_desc_pin(ce, true);
> >   }
> >   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > @@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->sched_engine->schedule = i915_schedule;
> > -	engine->reset.prepare = guc_reset_prepare;
> > -	engine->reset.rewind = guc_reset_rewind;
> > -	engine->reset.cancel = guc_reset_cancel;
> > -	engine->reset.finish = guc_reset_finish;
> > +	engine->reset.prepare = guc_reset_nop;
> > +	engine->reset.rewind = guc_rewind_nop;
> > +	engine->reset.cancel = guc_reset_nop;
> > +	engine->reset.finish = guc_reset_nop;
> >   	engine->emit_flush = gen8_emit_flush_xcs;
> >   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> > @@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
> >   	intel_engine_set_irq_handler(engine, cs_irq_handler);
> >   }
> > +static void guc_sched_engine_destroy(struct kref *kref)
> > +{
> > +	struct i915_sched_engine *sched_engine =
> > +		container_of(kref, typeof(*sched_engine), ref);
> > +	struct intel_guc *guc = sched_engine->private_data;
> > +
> > +	guc->sched_engine = NULL;
> > +	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
> > +	kfree(sched_engine);
> > +}
> > +
> >   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   		guc->sched_engine->schedule = i915_schedule;
> >   		guc->sched_engine->private_data = guc;
> > +		guc->sched_engine->destroy = guc_sched_engine_destroy;
> >   		tasklet_setup(&guc->sched_engine->tasklet,
> >   			      guc_submission_tasklet);
> >   	}
> > @@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		 * register this context.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			register_context(ce);
> > +			register_context(ce, true);
> >   		guc_signal_context_fence(ce);
> >   		intel_context_put(ce);
> >   	} else if (context_destroyed(ce)) {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 6d8b9233214e..f0b02200aa01 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > -	if (!intel_guc_is_ready(guc))
> > +
> > +	/* Nothing to do if GuC isn't supported */
> > +	if (!intel_uc_supports_guc(uc))
> >   		return;
> > +	/* Firmware expected to be running when this function is called */
> > +	if (!intel_guc_is_ready(guc))
> > +		goto sanitize;
> > +
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_prepare(guc);
> > +
> > +sanitize:
> >   	__uc_sanitize(uc);
> >   }
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware can not be running when this function is called  */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset(guc, stalled);
> > +}
> > +
> > +void intel_uc_reset_finish(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware expected to be running when this function is called */
> > +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_finish(guc);
> > +}
> > +
> > +void intel_uc_cancel_requests(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware can not be running when this function is called  */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_cancel_requests(guc);
> > +}
> > +
> >   void intel_uc_runtime_suspend(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index c4cef885e984..eaa3202192ac 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
> >   void intel_uc_driver_remove(struct intel_uc *uc);
> >   void intel_uc_init_mmio(struct intel_uc *uc);
> >   void intel_uc_reset_prepare(struct intel_uc *uc);
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> > +void intel_uc_reset_finish(struct intel_uc *uc);
> > +void intel_uc_cancel_requests(struct intel_uc *uc);
> >   void intel_uc_suspend(struct intel_uc *uc);
> >   void intel_uc_runtime_suspend(struct intel_uc *uc);
> >   int intel_uc_resume(struct intel_uc *uc);
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface
@ 2021-07-20 20:59       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-20 20:59 UTC (permalink / raw)
  To: John Harrison; +Cc: intel-gfx, dri-devel

On Tue, Jul 20, 2021 at 01:19:48PM -0700, John Harrison wrote:
> On 7/16/2021 13:16, Matthew Brost wrote:
> > Reset implementation for new GuC interface. This is the legacy reset
> > implementation which is called when the i915 owns the engine hang check.
> > Future patches will offload the engine hang check to GuC but we will
> > continue to maintain this legacy path as a fallback and this code path
> > is also required if the GuC dies.
> > 
> > With the new GuC interface it is not possible to reset individual
> > engines - it is only possible to reset the GPU entirely. This patch
> > forces an entire chip reset if any engine hangs.
> > 
> > v2:
> >   (Michal)
> >    - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
> > v3:
> >   (John H)
> >    - Split into a series of smaller patches
> While the split happened, it doesn't look like any of the other comments
> were addressed. Repeated below for clarity. Also, Tvrtko has a bunch of

Oops, I thought you asked for the split and didn't realize you had comments.

I thought I addressed Tvrtko's comments, but I'll circle back to them.

> outstanding comments too.
> 
> > 
> > Cc: John Harrison <john.c.harrison@intel.com>
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.c         |   6 +-
> >   drivers/gpu/drm/i915/gt/intel_reset.c         |  18 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.c        |  13 -
> >   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   8 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 562 ++++++++++++++----
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.c         |  39 +-
> >   drivers/gpu/drm/i915/gt/uc/intel_uc.h         |   3 +
> >   7 files changed, 515 insertions(+), 134 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index aef3084e8b16..463a6ae605a0 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -174,8 +174,6 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   	if (intel_gt_is_wedged(gt))
> >   		intel_gt_unset_wedged(gt);
> > -	intel_uc_sanitize(&gt->uc);
> > -
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.prepare)
> >   			engine->reset.prepare(engine);
> > @@ -191,6 +189,8 @@ static void gt_sanitize(struct intel_gt *gt, bool force)
> >   			__intel_engine_reset(engine, false);
> >   	}
> > +	intel_uc_reset(&gt->uc, false);
> > +
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.finish)
> >   			engine->reset.finish(engine);
> > @@ -243,6 +243,8 @@ int intel_gt_resume(struct intel_gt *gt)
> >   		goto err_wedged;
> >   	}
> > +	intel_uc_reset_finish(&gt->uc);
> > +
> >   	intel_rps_enable(&gt->rps);
> >   	intel_llc_enable(&gt->llc);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 72251638d4ea..2987282dff6d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -826,6 +826,8 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
> >   		__intel_engine_reset(engine, stalled_mask & engine->mask);
> >   	local_bh_enable();
> > +	intel_uc_reset(&gt->uc, true);
> > +
> >   	intel_ggtt_restore_fences(gt->ggtt);
> >   	return err;
> > @@ -850,6 +852,8 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> >   		if (awake & engine->mask)
> >   			intel_engine_pm_put(engine);
> >   	}
> > +
> > +	intel_uc_reset_finish(&gt->uc);
> >   }
> >   static void nop_submit_request(struct i915_request *request)
> > @@ -903,6 +907,7 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
> >   	for_each_engine(engine, gt, id)
> >   		if (engine->reset.cancel)
> >   			engine->reset.cancel(engine);
> > +	intel_uc_cancel_requests(&gt->uc);
> >   	local_bh_enable();
> >   	reset_finish(gt, awake);
> > @@ -1191,6 +1196,9 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   	ENGINE_TRACE(engine, "flags=%lx\n", gt->reset.flags);
> >   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags));
> > +	if (intel_engine_uses_guc(engine))
> > +		return -ENODEV;
> > +
> >   	if (!intel_engine_pm_get_if_awake(engine))
> >   		return 0;
> > @@ -1201,13 +1209,10 @@ int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
> >   			   "Resetting %s for %s\n", engine->name, msg);
> >   	atomic_inc(&engine->i915->gpu_error.reset_engine_count[engine->uabi_class]);
> > -	if (intel_engine_uses_guc(engine))
> > -		ret = intel_guc_reset_engine(&engine->gt->uc.guc, engine);
> > -	else
> > -		ret = intel_gt_reset_engine(engine);
> > +	ret = intel_gt_reset_engine(engine);
> >   	if (ret) {
> >   		/* If we fail here, we expect to fallback to a global reset */
> > -		ENGINE_TRACE(engine, "Failed to reset, err: %d\n", ret);
> > +		ENGINE_TRACE(engine, "Failed to reset %s, err: %d\n", engine->name, ret);
> >   		goto out;
> >   	}
> > @@ -1341,7 +1346,8 @@ void intel_gt_handle_error(struct intel_gt *gt,
> >   	 * Try engine reset when available. We fall back to full reset if
> >   	 * single reset fails.
> >   	 */
> > -	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> > +	if (!intel_uc_uses_guc_submission(&gt->uc) &&
> > +	    intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
> >   		local_bh_disable();
> >   		for_each_engine_masked(engine, gt, engine_mask, tmp) {
> >   			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > index 6661dcb02239..9b09395b998f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> > @@ -572,19 +572,6 @@ int intel_guc_suspend(struct intel_guc *guc)
> >   	return 0;
> >   }
> > -/**
> > - * intel_guc_reset_engine() - ask GuC to reset an engine
> > - * @guc:	intel_guc structure
> > - * @engine:	engine to be reset
> > - */
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine)
> > -{
> > -	/* XXX: to be implemented with submission interface rework */
> > -
> > -	return -ENODEV;
> > -}
> > -
> >   /**
> >    * intel_guc_resume() - notify GuC resuming from suspend state
> >    * @guc:	the guc
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > index 3cc566565224..d75a76882a44 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > @@ -242,14 +242,16 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
> >   int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout);
> > -int intel_guc_reset_engine(struct intel_guc *guc,
> > -			   struct intel_engine_cs *engine);
> > -
> >   int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   					  const u32 *msg, u32 len);
> >   int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   				     const u32 *msg, u32 len);
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc);
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc);
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc);
> > +
> >   void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index f8a6077fa3f9..0f5823922369 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -141,7 +141,7 @@ context_wait_for_deregister_to_register(struct intel_context *ce)
> >   static inline void
> >   set_context_wait_for_deregister_to_register(struct intel_context *ce)
> >   {
> > -	/* Only should be called from guc_lrc_desc_pin() */
> > +	/* Should only be called from guc_lrc_desc_pin() without lock */
> >   	ce->guc_state.sched_state |=
> >   		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> >   }
> > @@ -241,15 +241,31 @@ static int guc_lrc_desc_pool_create(struct intel_guc *guc)
> >   static void guc_lrc_desc_pool_destroy(struct intel_guc *guc)
> >   {
> > +	guc->lrc_desc_pool_vaddr = NULL;
> >   	i915_vma_unpin_and_release(&guc->lrc_desc_pool, I915_VMA_RELEASE_MAP);
> >   }
> > +static inline bool guc_submission_initialized(struct intel_guc *guc)
> > +{
> > +	return guc->lrc_desc_pool_vaddr != NULL;
> > +}
> > +
> >   static inline void reset_lrc_desc(struct intel_guc *guc, u32 id)
> >   {
> > -	struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +	if (likely(guc_submission_initialized(guc))) {
> > +		struct guc_lrc_desc *desc = __get_lrc_desc(guc, id);
> > +		unsigned long flags;
> > -	memset(desc, 0, sizeof(*desc));
> > -	xa_erase_irq(&guc->context_lookup, id);
> > +		memset(desc, 0, sizeof(*desc));
> > +
> > +		/*
> > +		 * xarray API doesn't have xa_erase_irqsave wrapper, so calling
> > +		 * the lower level functions directly.
> > +		 */
> > +		xa_lock_irqsave(&guc->context_lookup, flags);
> > +		__xa_erase(&guc->context_lookup, id);
> > +		xa_unlock_irqrestore(&guc->context_lookup, flags);
> > +	}
> >   }
> >   static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> > @@ -260,7 +276,15 @@ static inline bool lrc_desc_registered(struct intel_guc *guc, u32 id)
> >   static inline void set_lrc_desc_registered(struct intel_guc *guc, u32 id,
> >   					   struct intel_context *ce)
> >   {
> > -	xa_store_irq(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * xarray API doesn't have xa_store_irqsave wrapper, so calling the
> > +	 * lower level functions directly.
> > +	 */
> > +	xa_lock_irqsave(&guc->context_lookup, flags);
> > +	__xa_store(&guc->context_lookup, id, ce, GFP_ATOMIC);
> > +	xa_unlock_irqrestore(&guc->context_lookup, flags);
> >   }
> >   static int guc_submission_send_busy_loop(struct intel_guc* guc,
> > @@ -326,6 +350,8 @@ int intel_guc_wait_for_idle(struct intel_guc *guc, long timeout)
> >   					true, timeout);
> >   }
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
> > +
> >   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   {
> >   	int err;
> > @@ -333,11 +359,22 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   	u32 action[3];
> >   	int len = 0;
> >   	u32 g2h_len_dw = 0;
> > -	bool enabled = context_enabled(ce);
> > +	bool enabled;
> >   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> >   	GEM_BUG_ON(context_guc_id_invalid(ce));
> > +	/*
> > +	 * Corner case where the GuC firmware was blown away and reloaded while
> > +	 * this context was pinned.
> > +	 */
> > +	if (unlikely(!lrc_desc_registered(guc, ce->guc_id))) {
> > +		err = guc_lrc_desc_pin(ce, false);
> > +		if (unlikely(err))
> > +			goto out;
> > +	}
> > +	enabled = context_enabled(ce);
> > +
> >   	if (!enabled) {
> >   		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >   		action[len++] = ce->guc_id;
> > @@ -360,6 +397,7 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   		intel_context_put(ce);
> >   	}
> > +out:
> >   	return err;
> >   }
> > @@ -414,15 +452,10 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	if (submit) {
> >   		guc_set_lrc_tail(last);
> >   resubmit:
> > -		/*
> > -		 * We only check for -EBUSY here even though it is possible for
> > -		 * -EDEADLK to be returned. If -EDEADLK is returned, the GuC has
> > -		 * died and a full GT reset needs to be done. The hangcheck will
> > -		 * eventually detect that the GuC has died and trigger this
> > -		 * reset so no need to handle -EDEADLK here.
> > -		 */
> >   		ret = guc_add_request(guc, last);
> > -		if (ret == -EBUSY) {
> > +		if (unlikely(ret == -EPIPE))
> > +			goto deadlk;
> > +		else if (ret == -EBUSY) {
> >   			tasklet_schedule(&sched_engine->tasklet);
> >   			guc->stalled_request = last;
> >   			return false;
> > @@ -432,6 +465,11 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
> >   	guc->stalled_request = NULL;
> >   	return submit;
> > +
> > +deadlk:
> > +	sched_engine->tasklet.callback = NULL;
> > +	tasklet_disable_nosync(&sched_engine->tasklet);
> > +	return false;
> >   }
> >   static void guc_submission_tasklet(struct tasklet_struct *t)
> > @@ -458,27 +496,166 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
> >   		intel_engine_signal_breadcrumbs(engine);
> >   }
> > -static void guc_reset_prepare(struct intel_engine_cs *engine)
> > +static void __guc_context_destroy(struct intel_context *ce);
> > +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> > +static void guc_signal_context_fence(struct intel_context *ce);
> > +
> > +static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> > +{
> > +	struct intel_context *ce;
> > +	unsigned long index, flags;
> > +	bool pending_disable, pending_enable, deregister, destroyed;
> > +
> > +	xa_for_each(&guc->context_lookup, index, ce) {
> > +		/* Flush context */
> > +		spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +		/*
> > +		 * Once we are at this point submission_disabled() is guaranteed
> > +		 * to visible to all callers who set the below flags (see above
> Still needs to be 'to be visible'
> 

Fixed.

> > +		 * flush and flushes in reset_prepare). If submission_disabled()
> > +		 * is set, the caller shouldn't set these flags.
> > +		 */
> > +
> > +		destroyed = context_destroyed(ce);
> > +		pending_enable = context_pending_enable(ce);
> > +		pending_disable = context_pending_disable(ce);
> > +		deregister = context_wait_for_deregister_to_register(ce);
> > +		init_sched_state(ce);
> > +
> > +		if (pending_enable || destroyed || deregister) {
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			if (deregister)
> > +				guc_signal_context_fence(ce);
> > +			if (destroyed) {
> > +				release_guc_id(guc, ce);
> > +				__guc_context_destroy(ce);
> > +			}
> > +			if (pending_enable || deregister)
> > +				intel_context_put(ce);
> > +		}
> > +
> > +		/* Not mutually exclusive with the above if statement. */
> > +		if (pending_disable) {
> > +			guc_signal_context_fence(ce);
> > +			intel_context_sched_disable_unpin(ce);
> > +			atomic_dec(&guc->outstanding_submission_g2h);
> > +			intel_context_put(ce);
> > +		}
> > +	}
> > +}
> > +
> > +static inline bool
> > +submission_disabled(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	return unlikely(!sched_engine ||
> > +			!__tasklet_is_enabled(&sched_engine->tasklet));
> > +}
> > +
> > +static void disable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +
> > +	if (__tasklet_is_enabled(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +		__tasklet_disable_sync_once(&sched_engine->tasklet);
> > +		sched_engine->tasklet.callback = NULL;
> > +	}
> > +}
> > +
> > +static void enable_submission(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&guc->sched_engine->lock, flags);
> > +	sched_engine->tasklet.callback = guc_submission_tasklet;
> > +	wmb();
> > +	if (!__tasklet_is_enabled(&sched_engine->tasklet) &&
> > +	    __tasklet_enable(&sched_engine->tasklet)) {
> > +		GEM_BUG_ON(!guc->ct.enabled);
> > +
> > +		/* And kick in case we missed a new request submission. */
> > +		tasklet_hi_schedule(&sched_engine->tasklet);
> > +	}
> > +	spin_unlock_irqrestore(&guc->sched_engine->lock, flags);
> > +}
> > +
> > +static void guc_flush_submissions(struct intel_guc *guc)
> > +{
> > +	struct i915_sched_engine * const sched_engine = guc->sched_engine;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +void intel_guc_submission_reset_prepare(struct intel_guc *guc)
> >   {
> > -	ENGINE_TRACE(engine, "\n");
> > +	int i;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> Still think this should have braces. Maybe not syntactically required but
> just looks wrong.
> 

Sure.
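
For reference, the braced form being asked for would look like this
(just adding braces, no functional change):

	if (unlikely(!guc_submission_initialized(guc))) {
		/* Reset called during driver load? GuC not yet initialised! */
		return;
	}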

> > +
> > +	disable_submission(guc);
> > +	guc->interrupts.disable(guc);
> > +
> > +	/* Flush IRQ handler */
> > +	spin_lock_irq(&guc_to_gt(guc)->irq_lock);
> > +	spin_unlock_irq(&guc_to_gt(guc)->irq_lock);
> > +
> > +	guc_flush_submissions(guc);
> >   	/*
> > -	 * Prevent request submission to the hardware until we have
> > -	 * completed the reset in i915_gem_reset_finish(). If a request
> > -	 * is completed by one engine, it may then queue a request
> > -	 * to a second via its execlists->tasklet *just* as we are
> > -	 * calling engine->init_hw() and also writing the ELSP.
> > -	 * Turning off the execlists->tasklet until the reset is over
> > -	 * prevents the race.
> > -	 */
> > -	__tasklet_disable_sync_once(&engine->sched_engine->tasklet);
> > +	 * Handle any outstanding G2Hs before reset. Call IRQ handler directly
> > +	 * each pass as interrupts have been disabled. We always scrub for
> > +	 * outstanding G2H as it is possible for outstanding_submission_g2h to
> > +	 * be incremented after the context state update.
> > +	 */
> > +	for (i = 0; i < 4 && atomic_read(&guc->outstanding_submission_g2h); ++i) {
> > +		intel_guc_to_host_event_handler(guc);
> > +#define wait_for_reset(guc, wait_var) \
> > +		guc_wait_for_pending_msg(guc, wait_var, false, (HZ / 20))
> > +		do {
> > +			wait_for_reset(guc, &guc->outstanding_submission_g2h);
> > +		} while (!list_empty(&guc->ct.requests.incoming));
> > +	}
> > +	scrub_guc_desc_for_outstanding_g2h(guc);
> > +}
> > +
> > +static struct intel_engine_cs *
> > +guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > +{
> > +	struct intel_engine_cs *engine;
> > +	intel_engine_mask_t tmp, mask = ve->mask;
> > +	unsigned int num_siblings = 0;
> > +
> > +	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > +		if (num_siblings++ == sibling)
> > +			return engine;
> > +
> > +	return NULL;
> > +}
> > +
> > +static inline struct intel_engine_cs *
> > +__context_to_physical_engine(struct intel_context *ce)
> > +{
> > +	struct intel_engine_cs *engine = ce->engine;
> > +
> > +	if (intel_engine_is_virtual(engine))
> > +		engine = guc_virtual_get_sibling(engine, 0);
> > +
> > +	return engine;
> >   }
> > -static void guc_reset_state(struct intel_context *ce,
> > -			    struct intel_engine_cs *engine,
> > -			    u32 head,
> > -			    bool scrub)
> > +static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
> >   {
> > +	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
> > +
> >   	GEM_BUG_ON(!intel_context_is_pinned(ce));
> >   	/*
> > @@ -496,42 +673,147 @@ static void guc_reset_state(struct intel_context *ce,
> >   	lrc_update_regs(ce, engine, head);
> >   }
> > -static void guc_reset_rewind(struct intel_engine_cs *engine, bool stalled)
> > +static void guc_reset_nop(struct intel_engine_cs *engine)
> >   {
> > -	struct intel_engine_execlists * const execlists = &engine->execlists;
> > -	struct i915_request *rq;
> > +}
> > +
> > +static void guc_rewind_nop(struct intel_engine_cs *engine, bool stalled)
> > +{
> > +}
> > +
> > +static void
> > +__unwind_incomplete_requests(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *rn;
> > +	struct list_head *pl;
> > +	int prio = I915_PRIORITY_INVALID;
> > +	struct i915_sched_engine * const sched_engine =
> > +		ce->engine->sched_engine;
> >   	unsigned long flags;
> > -	spin_lock_irqsave(&engine->sched_engine->lock, flags);
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry_safe(rq, rn,
> > +				 &ce->guc_active.requests,
> > +				 sched.link) {
> > +		if (i915_request_completed(rq))
> > +			continue;
> > +
> > +		list_del_init(&rq->sched.link);
> > +		spin_unlock(&ce->guc_active.lock);
> > +
> > +		__i915_request_unsubmit(rq);
> > +
> > +		/* Push the request back into the queue for later resubmission. */
> > +		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > +		if (rq_prio(rq) != prio) {
> > +			prio = rq_prio(rq);
> > +			pl = i915_sched_lookup_priolist(sched_engine, prio);
> > +		}
> > +		GEM_BUG_ON(i915_sched_engine_is_empty(sched_engine));
> > -	/* Push back any incomplete requests for replay after the reset. */
> > -	rq = execlists_unwind_incomplete_requests(execlists);
> > -	if (!rq)
> > -		goto out_unlock;
> > +		list_add_tail(&rq->sched.link, pl);
> > +		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +
> > +		spin_lock(&ce->guc_active.lock);
> > +	}
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +static struct i915_request *context_find_active_request(struct intel_context *ce)
> > +{
> > +	struct i915_request *rq, *active = NULL;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ce->guc_active.lock, flags);
> > +	list_for_each_entry_reverse(rq, &ce->guc_active.requests,
> > +				    sched.link) {
> > +		if (i915_request_completed(rq))
> > +			break;
> > +
> > +		active = rq;
> > +	}
> > +	spin_unlock_irqrestore(&ce->guc_active.lock, flags);
> > +
> > +	return active;
> > +}
> > +
> > +static void __guc_reset_context(struct intel_context *ce, bool stalled)
> > +{
> > +	struct i915_request *rq;
> > +	u32 head;
> > +
> > +	/*
> > +	 * GuC will implicitly mark the context as non-schedulable
> > +	 * when it sends the reset notification. Make sure our state
> > +	 * reflects this change. The context will be marked enabled
> > +	 * on resubmission.
> > +	 */
> > +	clr_context_enabled(ce);
> > +
> > +	rq = context_find_active_request(ce);
> > +	if (!rq) {
> > +		head = ce->ring->tail;
> > +		stalled = false;
> > +		goto out_replay;
> > +	}
> >   	if (!i915_request_started(rq))
> >   		stalled = false;
> > +	GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > +	head = intel_ring_wrap(ce->ring, rq->head);
> >   	__i915_request_reset(rq, stalled);
> > -	guc_reset_state(rq->context, engine, rq->head, stalled);
> > -out_unlock:
> > -	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> > +out_replay:
> > +	guc_reset_state(ce, head, stalled);
> > +	__unwind_incomplete_requests(ce);
> > +}
> > +
> > +void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> > +{
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > +
> > +	if (unlikely(!guc_submission_initialized(guc)))
> > +		/* Reset called during driver load? GuC not yet initialised! */
> > +		return;
> And again.
> 

Yep.

> > +
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			__guc_reset_context(ce, stalled);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> > +}
> > +
> > +static void guc_cancel_context_requests(struct intel_context *ce)
> > +{
> > +	struct i915_sched_engine *sched_engine = ce_to_guc(ce)->sched_engine;
> > +	struct i915_request *rq;
> > +	unsigned long flags;
> > +
> > +	/* Mark all executing requests as skipped. */
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	spin_lock(&ce->guc_active.lock);
> > +	list_for_each_entry(rq, &ce->guc_active.requests, sched.link)
> > +		i915_request_put(i915_request_mark_eio(rq));
> > +	spin_unlock(&ce->guc_active.lock);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> > -static void guc_reset_cancel(struct intel_engine_cs *engine)
> > +static void
> > +guc_cancel_sched_engine_requests(struct i915_sched_engine *sched_engine)
> >   {
> > -	struct i915_sched_engine * const sched_engine = engine->sched_engine;
> >   	struct i915_request *rq, *rn;
> >   	struct rb_node *rb;
> >   	unsigned long flags;
> >   	/* Can be called during boot if GuC fails to load */
> > -	if (!engine->gt)
> > +	if (!sched_engine)
> >   		return;
> > -	ENGINE_TRACE(engine, "\n");
> > -
> >   	/*
> >   	 * Before we call engine->cancel_requests(), we should have exclusive
> >   	 * access to the submission state. This is arranged for us by the
> > @@ -548,21 +830,16 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	 */
> >   	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	/* Mark all executing requests as skipped. */
> > -	list_for_each_entry(rq, &sched_engine->requests, sched.link) {
> > -		i915_request_set_error_once(rq, -EIO);
> > -		i915_request_mark_complete(rq);
> > -	}
> > -
> >   	/* Flush the queued requests to the timeline list (for retiring). */
> >   	while ((rb = rb_first_cached(&sched_engine->queue))) {
> >   		struct i915_priolist *p = to_priolist(rb);
> >   		priolist_for_each_request_consume(rq, rn, p) {
> >   			list_del_init(&rq->sched.link);
> > +
> >   			__i915_request_submit(rq);
> > -			dma_fence_set_error(&rq->fence, -EIO);
> > -			i915_request_mark_complete(rq);
> > +
> > +			i915_request_put(i915_request_mark_eio(rq));
> >   		}
> >   		rb_erase_cached(&p->node, &sched_engine->queue);
> > @@ -577,14 +854,38 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
> >   	spin_unlock_irqrestore(&sched_engine->lock, flags);
> >   }
> > -static void guc_reset_finish(struct intel_engine_cs *engine)
> > +void intel_guc_submission_cancel_requests(struct intel_guc *guc)
> >   {
> > -	if (__tasklet_enable(&engine->sched_engine->tasklet))
> > -		/* And kick in case we missed a new request submission. */
> > -		tasklet_hi_schedule(&engine->sched_engine->tasklet);
> > +	struct intel_context *ce;
> > +	unsigned long index;
> > -	ENGINE_TRACE(engine, "depth->%d\n",
> > -		     atomic_read(&engine->sched_engine->tasklet.count));
> > +	xa_for_each(&guc->context_lookup, index, ce)
> > +		if (intel_context_is_pinned(ce))
> > +			guc_cancel_context_requests(ce);
> > +
> > +	guc_cancel_sched_engine_requests(guc->sched_engine);
> > +
> > +	/* GuC is blown away, drop all references to contexts */
> > +	xa_destroy(&guc->context_lookup);
> And again, most of this code is common to 'intel_guc_submission_reset' so
> could be moved to a shared helper function.
> 

The code structure is similar, but the details differ.

Both paths use the same xa_for_each loop + pinned check, but one calls
__guc_reset_context while the other calls guc_cancel_context_requests.
Also, here I don't think we can clear &guc->context_lookup until
guc_cancel_sched_engine_requests completes. Not sure how to structure a
helper that makes sense; a rough sketch of the obvious factoring is
below.
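
Purely for illustration, the common loop could be pulled out along these
lines (untested sketch, helper name and callback shape invented here):

static void guc_for_each_pinned_context(struct intel_guc *guc,
					void (*fn)(struct intel_context *ce,
						   void *data),
					void *data)
{
	struct intel_context *ce;
	unsigned long index;

	/* Visit every registered context, applying fn to the pinned ones */
	xa_for_each(&guc->context_lookup, index, ce)
		if (intel_context_is_pinned(ce))
			fn(ce, data);
}

But __guc_reset_context() takes an extra 'stalled' argument (hence the
void *data cookie), and the xa_destroy() has to stay in the callers
because of the ordering above, so the helper doesn't buy much.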

Matt

> John.
> 
> > +}
> > +
> > +void intel_guc_submission_reset_finish(struct intel_guc *guc)
> > +{
> > +	/* Reset called during driver load or during wedge? */
> > +	if (unlikely(!guc_submission_initialized(guc) ||
> > +		     test_bit(I915_WEDGED, &guc_to_gt(guc)->reset.flags)))
> > +		return;
> > +
> > +	/*
> > +	 * Technically possible for either of these values to be non-zero here,
> > +	 * but very unlikely + harmless. Regardless, let's add a warn so we can
> > +	 * see in CI if this happens frequently or is a precursor to taking
> > +	 * down the machine.
> > +	 */
> > +	GEM_WARN_ON(atomic_read(&guc->outstanding_submission_g2h));
> > +	atomic_set(&guc->outstanding_submission_g2h, 0);
> > +
> > +	enable_submission(guc);
> >   }
> >   /*
> > @@ -651,6 +952,9 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >   	else
> >   		trace_i915_request_guc_submit(rq);
> > +	if (unlikely(ret == -EPIPE))
> > +		disable_submission(guc);
> > +
> >   	return ret;
> >   }
> > @@ -663,7 +967,8 @@ static void guc_submit_request(struct i915_request *rq)
> >   	/* Will be called from irq-context when using foreign fences. */
> >   	spin_lock_irqsave(&sched_engine->lock, flags);
> > -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> > +	if (submission_disabled(guc) || guc->stalled_request ||
> > +	    !i915_sched_engine_is_empty(sched_engine))
> >   		queue_request(sched_engine, rq, rq_prio(rq));
> >   	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >   		tasklet_hi_schedule(&sched_engine->tasklet);
> > @@ -800,7 +1105,8 @@ static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >   static int __guc_action_register_context(struct intel_guc *guc,
> >   					 u32 guc_id,
> > -					 u32 offset)
> > +					 u32 offset,
> > +					 bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_REGISTER_CONTEXT,
> > @@ -809,10 +1115,10 @@ static int __guc_action_register_context(struct intel_guc *guc,
> >   	};
> >   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > -					     0, true);
> > +					     0, loop);
> >   }
> > -static int register_context(struct intel_context *ce)
> > +static int register_context(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> > @@ -820,11 +1126,12 @@ static int register_context(struct intel_context *ce)
> >   	trace_intel_context_register(ce);
> > -	return __guc_action_register_context(guc, ce->guc_id, offset);
> > +	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> >   }
> >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > -					   u32 guc_id)
> > +					   u32 guc_id,
> > +					   bool loop)
> >   {
> >   	u32 action[] = {
> >   		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> > @@ -833,16 +1140,16 @@ static int __guc_action_deregister_context(struct intel_guc *guc,
> >   	return guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> >   					     G2H_LEN_DW_DEREGISTER_CONTEXT,
> > -					     true);
> > +					     loop);
> >   }
> > -static int deregister_context(struct intel_context *ce, u32 guc_id)
> > +static int deregister_context(struct intel_context *ce, u32 guc_id, bool loop)
> >   {
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	trace_intel_context_deregister(ce);
> > -	return __guc_action_deregister_context(guc, guc_id);
> > +	return __guc_action_deregister_context(guc, guc_id, loop);
> >   }
> >   static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
> > @@ -871,7 +1178,7 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> >   }
> > -static int guc_lrc_desc_pin(struct intel_context *ce)
> > +static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_runtime_pm *runtime_pm =
> >   		&ce->engine->gt->i915->runtime_pm;
> > @@ -917,18 +1224,45 @@ static int guc_lrc_desc_pin(struct intel_context *ce)
> >   	 */
> >   	if (context_registered) {
> >   		trace_intel_context_steal_guc_id(ce);
> > -		set_context_wait_for_deregister_to_register(ce);
> > -		intel_context_get(ce);
> > +		if (!loop) {
> > +			set_context_wait_for_deregister_to_register(ce);
> > +			intel_context_get(ce);
> > +		} else {
> > +			bool disabled;
> > +			unsigned long flags;
> > +
> > +			/* Seal race with Reset */
> > +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +			disabled = submission_disabled(guc);
> > +			if (likely(!disabled)) {
> > +				set_context_wait_for_deregister_to_register(ce);
> > +				intel_context_get(ce);
> > +			}
> > +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +			if (unlikely(disabled)) {
> > +				reset_lrc_desc(guc, desc_idx);
> > +				return 0;	/* Will get registered later */
> > +			}
> > +		}
> >   		/*
> >   		 * If stealing the guc_id, this ce has the same guc_id as the
> >   		 * context whose guc_id was stolen.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = deregister_context(ce, ce->guc_id);
> > +			ret = deregister_context(ce, ce->guc_id, loop);
> > +		if (unlikely(ret == -EBUSY)) {
> > +			clr_context_wait_for_deregister_to_register(ce);
> > +			intel_context_put(ce);
> > +		} else if (unlikely(ret == -ENODEV))
> > +			ret = 0;	/* Will get registered later */
> >   	} else {
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			ret = register_context(ce);
> > +			ret = register_context(ce, loop);
> > +		if (unlikely(ret == -EBUSY))
> > +			reset_lrc_desc(guc, desc_idx);
> > +		else if (unlikely(ret == -ENODEV))
> > +			ret = 0;	/* Will get registered later */
> >   	}
> >   	return ret;
> > @@ -991,7 +1325,6 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   	GEM_BUG_ON(guc_id == GUC_INVALID_LRC_ID);
> >   	trace_intel_context_sched_disable(ce);
> > -	intel_context_get(ce);
> >   	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> >   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > @@ -1003,6 +1336,7 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
> >   	set_context_pending_disable(ce);
> >   	clr_context_enabled(ce);
> > +	intel_context_get(ce);
> >   	return ce->guc_id;
> >   }
> > @@ -1015,7 +1349,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	u16 guc_id;
> >   	intel_wakeref_t wakeref;
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		clr_context_enabled(ce);
> >   		goto unpin;
> > @@ -1036,6 +1370,7 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   	 * a request before we set the 'context_pending_disable' flag here.
> >   	 */
> >   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   		return;
> >   	}
> >   	guc_id = prep_context_pending_disable(ce);
> > @@ -1052,19 +1387,13 @@ static void guc_context_sched_disable(struct intel_context *ce)
> >   static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   {
> > -	struct intel_engine_cs *engine = ce->engine;
> > -	struct intel_guc *guc = &engine->gt->uc.guc;
> > -	unsigned long flags;
> > +	struct intel_guc *guc = ce_to_guc(ce);
> >   	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> >   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >   	GEM_BUG_ON(context_enabled(ce));
> > -	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > -	set_context_destroyed(ce);
> > -	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > -
> > -	deregister_context(ce, ce->guc_id);
> > +	deregister_context(ce, ce->guc_id, true);
> >   }
> >   static void __guc_context_destroy(struct intel_context *ce)
> > @@ -1092,13 +1421,15 @@ static void guc_context_destroy(struct kref *kref)
> >   	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> >   	intel_wakeref_t wakeref;
> >   	unsigned long flags;
> > +	bool disabled;
> >   	/*
> >   	 * If the guc_id is invalid this context has been stolen and we can free
> >   	 * it immediately. Also can be freed immediately if the context is not
> >   	 * registered with the GuC.
> >   	 */
> > -	if (context_guc_id_invalid(ce) ||
> > +	if (submission_disabled(guc) ||
> > +	    context_guc_id_invalid(ce) ||
> >   	    !lrc_desc_registered(guc, ce->guc_id)) {
> >   		release_guc_id(guc, ce);
> >   		__guc_context_destroy(ce);
> > @@ -1125,6 +1456,18 @@ static void guc_context_destroy(struct kref *kref)
> >   		list_del_init(&ce->guc_id_link);
> >   	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > +	/* Seal race with Reset */
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	disabled = submission_disabled(guc);
> > +	if (likely(!disabled))
> > +		set_context_destroyed(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +	if (unlikely(disabled)) {
> > +		release_guc_id(guc, ce);
> > +		__guc_context_destroy(ce);
> > +		return;
> > +	}
> > +
> >   	/*
> >   	 * We defer GuC context deregistration until the context is destroyed
> >   	 * in order to save on CTBs. With this optimization ideally we only need
> > @@ -1212,8 +1555,6 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   {
> >   	unsigned long flags;
> > -	GEM_BUG_ON(!context_wait_for_deregister_to_register(ce));
> > -
> >   	spin_lock_irqsave(&ce->guc_state.lock, flags);
> >   	clr_context_wait_for_deregister_to_register(ce);
> >   	__guc_signal_context_fence(ce);
> > @@ -1222,8 +1563,9 @@ static void guc_signal_context_fence(struct intel_context *ce)
> >   static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >   {
> > -	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > -		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> > +	return (new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> > +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id)) &&
> > +		!submission_disabled(ce_to_guc(ce));
> >   }
> >   static int guc_request_alloc(struct i915_request *rq)
> > @@ -1281,8 +1623,12 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	if (unlikely(ret < 0))
> >   		return ret;
> >   	if (context_needs_register(ce, !!ret)) {
> > -		ret = guc_lrc_desc_pin(ce);
> > +		ret = guc_lrc_desc_pin(ce, true);
> >   		if (unlikely(ret)) {	/* unwind */
> > +			if (ret == -EPIPE) {
> > +				disable_submission(guc);
> > +				goto out;	/* GPU will be reset */
> > +			}
> >   			atomic_dec(&ce->guc_id_ref);
> >   			unpin_guc_id(guc, ce);
> >   			return ret;
> > @@ -1319,20 +1665,6 @@ static int guc_request_alloc(struct i915_request *rq)
> >   	return 0;
> >   }
> > -static struct intel_engine_cs *
> > -guc_virtual_get_sibling(struct intel_engine_cs *ve, unsigned int sibling)
> > -{
> > -	struct intel_engine_cs *engine;
> > -	intel_engine_mask_t tmp, mask = ve->mask;
> > -	unsigned int num_siblings = 0;
> > -
> > -	for_each_engine_masked(engine, ve->gt, mask, tmp)
> > -		if (num_siblings++ == sibling)
> > -			return engine;
> > -
> > -	return NULL;
> > -}
> > -
> >   static int guc_virtual_context_pre_pin(struct intel_context *ce,
> >   				       struct i915_gem_ww_ctx *ww,
> >   				       void **vaddr)
> > @@ -1528,7 +1860,7 @@ static inline void guc_kernel_context_pin(struct intel_guc *guc,
> >   {
> >   	if (context_guc_id_invalid(ce))
> >   		pin_guc_id(guc, ce);
> > -	guc_lrc_desc_pin(ce);
> > +	guc_lrc_desc_pin(ce, true);
> >   }
> >   static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> > @@ -1599,10 +1931,10 @@ static void guc_default_vfuncs(struct intel_engine_cs *engine)
> >   	engine->sched_engine->schedule = i915_schedule;
> > -	engine->reset.prepare = guc_reset_prepare;
> > -	engine->reset.rewind = guc_reset_rewind;
> > -	engine->reset.cancel = guc_reset_cancel;
> > -	engine->reset.finish = guc_reset_finish;
> > +	engine->reset.prepare = guc_reset_nop;
> > +	engine->reset.rewind = guc_rewind_nop;
> > +	engine->reset.cancel = guc_reset_nop;
> > +	engine->reset.finish = guc_reset_nop;
> >   	engine->emit_flush = gen8_emit_flush_xcs;
> >   	engine->emit_init_breadcrumb = gen8_emit_init_breadcrumb;
> > @@ -1651,6 +1983,17 @@ static inline void guc_default_irqs(struct intel_engine_cs *engine)
> >   	intel_engine_set_irq_handler(engine, cs_irq_handler);
> >   }
> > +static void guc_sched_engine_destroy(struct kref *kref)
> > +{
> > +	struct i915_sched_engine *sched_engine =
> > +		container_of(kref, typeof(*sched_engine), ref);
> > +	struct intel_guc *guc = sched_engine->private_data;
> > +
> > +	guc->sched_engine = NULL;
> > +	tasklet_kill(&sched_engine->tasklet); /* flush the callback */
> > +	kfree(sched_engine);
> > +}
> > +
> >   int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   {
> >   	struct drm_i915_private *i915 = engine->i915;
> > @@ -1669,6 +2012,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   		guc->sched_engine->schedule = i915_schedule;
> >   		guc->sched_engine->private_data = guc;
> > +		guc->sched_engine->destroy = guc_sched_engine_destroy;
> >   		tasklet_setup(&guc->sched_engine->tasklet,
> >   			      guc_submission_tasklet);
> >   	}
> > @@ -1775,7 +2119,7 @@ int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >   		 * register this context.
> >   		 */
> >   		with_intel_runtime_pm(runtime_pm, wakeref)
> > -			register_context(ce);
> > +			register_context(ce, true);
> >   		guc_signal_context_fence(ce);
> >   		intel_context_put(ce);
> >   	} else if (context_destroyed(ce)) {
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > index 6d8b9233214e..f0b02200aa01 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> > @@ -565,12 +565,49 @@ void intel_uc_reset_prepare(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > -	if (!intel_guc_is_ready(guc))
> > +
> > +	/* Nothing to do if GuC isn't supported */
> > +	if (!intel_uc_supports_guc(uc))
> >   		return;
> > +	/* Firmware expected to be running when this function is called */
> > +	if (!intel_guc_is_ready(guc))
> > +		goto sanitize;
> > +
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_prepare(guc);
> > +
> > +sanitize:
> >   	__uc_sanitize(uc);
> >   }
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware cannot be running when this function is called */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset(guc, stalled);
> > +}
> > +
> > +void intel_uc_reset_finish(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware expected to be running when this function is called */
> > +	if (intel_guc_is_fw_running(guc) && intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_reset_finish(guc);
> > +}
> > +
> > +void intel_uc_cancel_requests(struct intel_uc *uc)
> > +{
> > +	struct intel_guc *guc = &uc->guc;
> > +
> > +	/* Firmware cannot be running when this function is called */
> > +	if (intel_uc_uses_guc_submission(uc))
> > +		intel_guc_submission_cancel_requests(guc);
> > +}
> > +
> >   void intel_uc_runtime_suspend(struct intel_uc *uc)
> >   {
> >   	struct intel_guc *guc = &uc->guc;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.h b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > index c4cef885e984..eaa3202192ac 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.h
> > @@ -37,6 +37,9 @@ void intel_uc_driver_late_release(struct intel_uc *uc);
> >   void intel_uc_driver_remove(struct intel_uc *uc);
> >   void intel_uc_init_mmio(struct intel_uc *uc);
> >   void intel_uc_reset_prepare(struct intel_uc *uc);
> > +void intel_uc_reset(struct intel_uc *uc, bool stalled);
> > +void intel_uc_reset_finish(struct intel_uc *uc);
> > +void intel_uc_cancel_requests(struct intel_uc *uc);
> >   void intel_uc_suspend(struct intel_uc *uc);
> >   void intel_uc_runtime_suspend(struct intel_uc *uc);
> >   int intel_uc_resume(struct intel_uc *uc);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 21:41     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 21:41 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:17, Matthew Brost wrote:
> When using GuC submission, if a context gets banned, disable scheduling
> and mark all inflight requests as complete.
>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |   2 +-
>   drivers/gpu/drm/i915/gt/intel_context.h       |  13 ++
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   2 +
>   drivers/gpu/drm/i915/gt/intel_reset.c         |  32 +---
>   .../gpu/drm/i915/gt/intel_ring_submission.c   |  20 +++
>   drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   2 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 151 ++++++++++++++++--
>   drivers/gpu/drm/i915/i915_trace.h             |  10 ++
>   8 files changed, 195 insertions(+), 37 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 28c62f7ccfc7..d87a4c6da5bc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1084,7 +1084,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban)
>   	for_each_gem_engine(ce, engines, it) {
>   		struct intel_engine_cs *engine;
>   
> -		if (ban && intel_context_set_banned(ce))
> +		if (ban && intel_context_ban(ce, NULL))
>   			continue;
>   
>   		/*
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 2ed9bf5f91a5..814d9277096a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -16,6 +16,7 @@
>   #include "intel_engine_types.h"
>   #include "intel_ring_types.h"
>   #include "intel_timeline_types.h"
> +#include "i915_trace.h"
>   
>   #define CE_TRACE(ce, fmt, ...) do {					\
>   	const struct intel_context *ce__ = (ce);			\
> @@ -243,6 +244,18 @@ static inline bool intel_context_set_banned(struct intel_context *ce)
>   	return test_and_set_bit(CONTEXT_BANNED, &ce->flags);
>   }
>   
> +static inline bool intel_context_ban(struct intel_context *ce,
> +				     struct i915_request *rq)
> +{
> +	bool ret = intel_context_set_banned(ce);
> +
> +	trace_intel_context_ban(ce);
> +	if (ce->ops->ban)
> +		ce->ops->ban(ce, rq);
> +
> +	return ret;
> +}
> +
>   static inline bool
>   intel_context_force_single_submission(const struct intel_context *ce)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 035108c10b2c..57c19ee3e313 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -35,6 +35,8 @@ struct intel_context_ops {
>   
>   	int (*alloc)(struct intel_context *ce);
>   
> +	void (*ban)(struct intel_context *ce, struct i915_request *rq);
> +
>   	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
>   	int (*pin)(struct intel_context *ce, void *vaddr);
>   	void (*unpin)(struct intel_context *ce);
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index f3cdbf4ba5c8..3ed694cab5af 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -22,7 +22,6 @@
>   #include "intel_reset.h"
>   
>   #include "uc/intel_guc.h"
> -#include "uc/intel_guc_submission.h"
>   
>   #define RESET_MAX_RETRIES 3
>   
> @@ -39,21 +38,6 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
>   	intel_uncore_rmw_fw(uncore, reg, clr, 0);
>   }
>   
> -static void skip_context(struct i915_request *rq)
> -{
> -	struct intel_context *hung_ctx = rq->context;
> -
> -	list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
> -		if (!i915_request_is_active(rq))
> -			return;
> -
> -		if (rq->context == hung_ctx) {
> -			i915_request_set_error_once(rq, -EIO);
> -			__i915_request_skip(rq);
> -		}
> -	}
> -}
> -
>   static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
>   {
>   	struct drm_i915_file_private *file_priv = ctx->file_priv;
> @@ -88,10 +72,8 @@ static bool mark_guilty(struct i915_request *rq)
>   	bool banned;
>   	int i;
>   
> -	if (intel_context_is_closed(rq->context)) {
> -		intel_context_set_banned(rq->context);
> +	if (intel_context_is_closed(rq->context))
>   		return true;
> -	}
>   
>   	rcu_read_lock();
>   	ctx = rcu_dereference(rq->context->gem_context);
> @@ -123,11 +105,9 @@ static bool mark_guilty(struct i915_request *rq)
>   	banned = !i915_gem_context_is_recoverable(ctx);
>   	if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
>   		banned = true;
> -	if (banned) {
> +	if (banned)
>   		drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
>   			ctx->name, atomic_read(&ctx->guilty_count));
> -		intel_context_set_banned(rq->context);
> -	}
>   
>   	client_mark_guilty(ctx, banned);
>   
> @@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
>   
>   void __i915_request_reset(struct i915_request *rq, bool guilty)
>   {
> +	bool banned = false;
> +
>   	RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
>   	GEM_BUG_ON(__i915_request_is_complete(rq));
>   
> @@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
>   	if (guilty) {
>   		i915_request_set_error_once(rq, -EIO);
>   		__i915_request_skip(rq);
> -		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
> -			skip_context(rq);
> +		banned = mark_guilty(rq);
>   	} else {
>   		i915_request_set_error_once(rq, -EAGAIN);
>   		mark_innocent(rq);
>   	}
>   	rcu_read_unlock();
> +
> +	if (banned)
> +		intel_context_ban(rq->context, rq);
>   }
>   
>   static bool i915_in_reset(struct pci_dev *pdev)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 3b1471c50d40..03939be4297e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -586,9 +586,29 @@ static void ring_context_reset(struct intel_context *ce)
>   	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
>   }
>   
> +static void ring_context_ban(struct intel_context *ce,
> +			     struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine;
> +
> +	if (!rq || !i915_request_is_active(rq))
> +		return;
> +
> +	engine = rq->engine;
> +	lockdep_assert_held(&engine->sched_engine->lock);
> +	list_for_each_entry_continue(rq, &engine->sched_engine->requests,
> +				     sched.link)
> +		if (rq->context == ce) {
> +			i915_request_set_error_once(rq, -EIO);
> +			__i915_request_skip(rq);
> +		}
> +}
> +
>   static const struct intel_context_ops ring_context_ops = {
>   	.alloc = ring_context_alloc,
>   
> +	.ban = ring_context_ban,
> +
>   	.pre_pin = ring_context_pre_pin,
>   	.pin = ring_context_pin,
>   	.unpin = ring_context_unpin,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index dc18ac510ac8..eb6062f95d3b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -274,6 +274,8 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine);
>   
>   int intel_guc_global_policies_update(struct intel_guc *guc);
>   
> +void intel_guc_context_ban(struct intel_context *ce, struct i915_request *rq);
> +
>   void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
>   void intel_guc_submission_reset_finish(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 6536bd6807a0..149990196e3a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -125,6 +125,7 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
>   #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
>   #define SCHED_STATE_DESTROYED				BIT(1)
>   #define SCHED_STATE_PENDING_DISABLE			BIT(2)
> +#define SCHED_STATE_BANNED				BIT(3)
>   static inline void init_sched_state(struct intel_context *ce)
>   {
>   	/* Only should be called from guc_lrc_desc_pin() */
> @@ -187,6 +188,23 @@ static inline void clr_context_pending_disable(struct intel_context *ce)
>   		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
>   }
>   
> +static inline bool context_banned(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_BANNED);
> +}
> +
> +static inline void set_context_banned(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_BANNED;
> +}
> +
> +static inline void clr_context_banned(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state &= ~SCHED_STATE_BANNED;
> +}
> +
>   static inline bool context_guc_id_invalid(struct intel_context *ce)
>   {
>   	return (ce->guc_id == GUC_INVALID_LRC_ID);
> @@ -356,13 +374,23 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
>   
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	int err;
> +	int err = 0;
>   	struct intel_context *ce = rq->context;
>   	u32 action[3];
>   	int len = 0;
>   	u32 g2h_len_dw = 0;
>   	bool enabled;
>   
> +	/*
> +	 * Corner case where requests were sitting in the priority list or a
> +	 * request resubmitted after the context was banned.
> +	 */
> +	if (unlikely(intel_context_is_banned(ce))) {
> +		i915_request_put(i915_request_mark_eio(rq));
> +		intel_engine_signal_breadcrumbs(ce->engine);
> +		goto out;
> +	}
> +
>   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>   	GEM_BUG_ON(context_guc_id_invalid(ce));
>   
> @@ -398,6 +426,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		clr_context_pending_enable(ce);
>   		intel_context_put(ce);
>   	}
> +	if (likely(!err))
> +		trace_i915_request_guc_submit(rq);
>   
>   out:
>   	return err;
> @@ -462,7 +492,6 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   			guc->stalled_request = last;
>   			return false;
>   		}
> -		trace_i915_request_guc_submit(last);
>   	}
>   
>   	guc->stalled_request = NULL;
> @@ -501,12 +530,13 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   static void __guc_context_destroy(struct intel_context *ce);
>   static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
>   static void guc_signal_context_fence(struct intel_context *ce);
> +static void guc_cancel_context_requests(struct intel_context *ce);
>   
>   static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   {
>   	struct intel_context *ce;
>   	unsigned long index, flags;
> -	bool pending_disable, pending_enable, deregister, destroyed;
> +	bool pending_disable, pending_enable, deregister, destroyed, banned;
>   
>   	xa_for_each(&guc->context_lookup, index, ce) {
>   		/* Flush context */
> @@ -524,6 +554,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   		pending_enable = context_pending_enable(ce);
>   		pending_disable = context_pending_disable(ce);
>   		deregister = context_wait_for_deregister_to_register(ce);
> +		banned = context_banned(ce);
>   		init_sched_state(ce);
>   
>   		if (pending_enable || destroyed || deregister) {
> @@ -541,6 +572,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   		/* Not mutualy exclusive with above if statement. */
>   		if (pending_disable) {
>   			guc_signal_context_fence(ce);
> +			if (banned) {
> +				guc_cancel_context_requests(ce);
> +				intel_engine_signal_breadcrumbs(ce->engine);
> +			}
>   			intel_context_sched_disable_unpin(ce);
>   			atomic_dec(&guc->outstanding_submission_g2h);
>   			intel_context_put(ce);
> @@ -659,6 +694,9 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
>   {
>   	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
>   
> +	if (intel_context_is_banned(ce))
> +		return;
> +
>   	GEM_BUG_ON(!intel_context_is_pinned(ce));
>   
>   	/*
> @@ -729,6 +767,8 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
>   	struct i915_request *rq;
>   	u32 head;
>   
> +	intel_context_get(ce);
> +
>   	/*
>   	 * GuC will implicitly mark the context as non-schedulable
>   	 * when it sends the reset notification. Make sure our state
> @@ -754,6 +794,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
>   out_replay:
>   	guc_reset_state(ce, head, stalled);
>   	__unwind_incomplete_requests(ce);
> +	intel_context_put(ce);
>   }
>   
>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> @@ -936,8 +977,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	ret = guc_add_request(guc, rq);
>   	if (ret == -EBUSY)
>   		guc->stalled_request = rq;
> -	else
> -		trace_i915_request_guc_submit(rq);
>   
>   	if (unlikely(ret == -EPIPE))
>   		disable_submission(guc);
> @@ -1329,13 +1368,77 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   	return ce->guc_id;
>   }
>   
> +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> +						 u16 guc_id,
> +						 u32 preemption_timeout)
> +{
> +	u32 action [] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> +		guc_id,
> +		preemption_timeout
> +	};
> +
> +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +}
> +
> +static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	guc_flush_submissions(guc);
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_banned(ce);
> +
> +	if (submission_disabled(guc) || (!context_enabled(ce) &&
> +	    !context_pending_disable(ce))) {
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		guc_cancel_context_requests(ce);
> +		intel_engine_signal_breadcrumbs(ce->engine);
> +	} else if (!context_pending_disable(ce)) {
> +		u16 guc_id;
> +
> +		/*
> +		 * We add +2 here as the schedule disable complete CTB handler
> +		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
> +		 */
> +		atomic_add(2, &ce->pin_count);
> +
> +		guc_id = prep_context_pending_disable(ce);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		/*
> +		 * In addition to disabling scheduling, set the preemption
> +		 * timeout to the minimum value (1 us) so the banned context
> +		 * gets kicked off the HW ASAP.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref) {
> +			__guc_context_set_preemption_timeout(guc, guc_id, 1);
> +			__guc_context_sched_disable(guc, ce, guc_id);
> +		}
> +	} else {
> +		if (!context_guc_id_invalid(ce))
> +			with_intel_runtime_pm(runtime_pm, wakeref)
> +				__guc_context_set_preemption_timeout(guc,
> +								     ce->guc_id,
> +								     1);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +	}
> +}
> +
>   static void guc_context_sched_disable(struct intel_context *ce)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
> -	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>   	unsigned long flags;
> -	u16 guc_id;
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>   	intel_wakeref_t wakeref;
> +	u16 guc_id;
> +	bool enabled;
>   
>   	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
> @@ -1349,14 +1452,22 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	spin_lock_irqsave(&ce->guc_state.lock, flags);
>   
>   	/*
> -	 * We have to check if the context has been pinned again as another pin
> -	 * operation is allowed to pass this function. Checking the pin count,
> -	 * within ce->guc_state.lock, synchronizes this function with
> +	 * We have to check if the context has been disabled by another thread.
> +	 * We also have to check if the context has been pinned again as another
> +	 * pin operation is allowed to pass this function. Checking the pin
> +	 * count, within ce->guc_state.lock, synchronizes this function with
>   	 * guc_request_alloc ensuring a request doesn't slip through the
>   	 * 'context_pending_disable' fence. Checking within the spin lock (can't
>   	 * sleep) ensures another process doesn't pin this context and generate
>   	 * a request before we set the 'context_pending_disable' flag here.
>   	 */
> +	enabled = context_enabled(ce);
> +	if (unlikely(!enabled || submission_disabled(guc))) {
> +		if (enabled)
> +			clr_context_enabled(ce);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +		goto unpin;
> +	}
>   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   		return;
> @@ -1513,6 +1624,8 @@ static const struct intel_context_ops guc_context_ops = {
>   	.unpin = guc_context_unpin,
>   	.post_unpin = guc_context_post_unpin,
>   
> +	.ban = guc_context_ban,
> +
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
> @@ -1706,6 +1819,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   	.unpin = guc_context_unpin,
>   	.post_unpin = guc_context_post_unpin,
>   
> +	.ban = guc_context_ban,
> +
>   	.enter = guc_virtual_context_enter,
>   	.exit = guc_virtual_context_exit,
>   
> @@ -2159,6 +2274,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   	if (context_pending_enable(ce)) {
>   		clr_context_pending_enable(ce);
>   	} else if (context_pending_disable(ce)) {
> +		bool banned;
> +
>   		/*
>   		 * Unpin must be done before __guc_signal_context_fence,
>   		 * otherwise a race exists between the requests getting
> @@ -2169,9 +2286,16 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   		intel_context_sched_disable_unpin(ce);
>   
>   		spin_lock_irqsave(&ce->guc_state.lock, flags);
> +		banned = context_banned(ce);
> +		clr_context_banned(ce);
>   		clr_context_pending_disable(ce);
>   		__guc_signal_context_fence(ce);
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		if (banned) {
> +			guc_cancel_context_requests(ce);
> +			intel_engine_signal_breadcrumbs(ce->engine);
> +		}
>   	}
>   
>   	decr_outstanding_submission_g2h(guc);
> @@ -2206,8 +2330,11 @@ static void guc_handle_context_reset(struct intel_guc *guc,
>   				     struct intel_context *ce)
>   {
>   	trace_intel_context_reset(ce);
> -	capture_error_state(guc, ce);
> -	guc_context_replay(ce);
> +
> +	if (likely(!intel_context_is_banned(ce))) {
> +		capture_error_state(guc, ce);
> +		guc_context_replay(ce);
> +	}
>   }
>   
>   int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index c095c4d39456..937d3706af9b 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -934,6 +934,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
>   	     TP_ARGS(ce)
>   );
>   
> +DEFINE_EVENT(intel_context, intel_context_ban,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
> +);
> +
>   DEFINE_EVENT(intel_context, intel_context_register,
>   	     TP_PROTO(struct intel_context *ce),
>   	     TP_ARGS(ce)
> @@ -1036,6 +1041,11 @@ trace_intel_context_reset(struct intel_context *ce)
>   {
>   }
>   
> +static inline void
> +trace_intel_context_ban(struct intel_context *ce)
> +{
> +}
> +
>   static inline void
>   trace_intel_context_register(struct intel_context *ce)
>   {


^ permalink raw reply	[flat|nested] 221+ messages in thread

>   	if (time_before(jiffies, prev_hang + CONTEXT_FAST_HANG_JIFFIES))
>   		banned = true;
> -	if (banned) {
> +	if (banned)
>   		drm_dbg(&ctx->i915->drm, "context %s: guilty %d, banned\n",
>   			ctx->name, atomic_read(&ctx->guilty_count));
> -		intel_context_set_banned(rq->context);
> -	}
>   
>   	client_mark_guilty(ctx, banned);
>   
> @@ -149,6 +129,8 @@ static void mark_innocent(struct i915_request *rq)
>   
>   void __i915_request_reset(struct i915_request *rq, bool guilty)
>   {
> +	bool banned = false;
> +
>   	RQ_TRACE(rq, "guilty? %s\n", yesno(guilty));
>   	GEM_BUG_ON(__i915_request_is_complete(rq));
>   
> @@ -156,13 +138,15 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
>   	if (guilty) {
>   		i915_request_set_error_once(rq, -EIO);
>   		__i915_request_skip(rq);
> -		if (mark_guilty(rq) && !intel_engine_uses_guc(rq->engine))
> -			skip_context(rq);
> +		banned = mark_guilty(rq);
>   	} else {
>   		i915_request_set_error_once(rq, -EAGAIN);
>   		mark_innocent(rq);
>   	}
>   	rcu_read_unlock();
> +
> +	if (banned)
> +		intel_context_ban(rq->context, rq);
>   }
>   
>   static bool i915_in_reset(struct pci_dev *pdev)
> diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> index 3b1471c50d40..03939be4297e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> @@ -586,9 +586,29 @@ static void ring_context_reset(struct intel_context *ce)
>   	clear_bit(CONTEXT_VALID_BIT, &ce->flags);
>   }
>   
> +static void ring_context_ban(struct intel_context *ce,
> +			     struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine;
> +
> +	if (!rq || !i915_request_is_active(rq))
> +		return;
> +
> +	engine = rq->engine;
> +	lockdep_assert_held(&engine->sched_engine->lock);
> +	list_for_each_entry_continue(rq, &engine->sched_engine->requests,
> +				     sched.link)
> +		if (rq->context == ce) {
> +			i915_request_set_error_once(rq, -EIO);
> +			__i915_request_skip(rq);
> +		}
> +}
> +
>   static const struct intel_context_ops ring_context_ops = {
>   	.alloc = ring_context_alloc,
>   
> +	.ban = ring_context_ban,
> +
>   	.pre_pin = ring_context_pre_pin,
>   	.pin = ring_context_pin,
>   	.unpin = ring_context_unpin,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index dc18ac510ac8..eb6062f95d3b 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -274,6 +274,8 @@ void intel_guc_find_hung_context(struct intel_engine_cs *engine);
>   
>   int intel_guc_global_policies_update(struct intel_guc *guc);
>   
> +void intel_guc_context_ban(struct intel_context *ce, struct i915_request *rq);
> +
>   void intel_guc_submission_reset_prepare(struct intel_guc *guc);
>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled);
>   void intel_guc_submission_reset_finish(struct intel_guc *guc);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 6536bd6807a0..149990196e3a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -125,6 +125,7 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
>   #define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
>   #define SCHED_STATE_DESTROYED				BIT(1)
>   #define SCHED_STATE_PENDING_DISABLE			BIT(2)
> +#define SCHED_STATE_BANNED				BIT(3)
>   static inline void init_sched_state(struct intel_context *ce)
>   {
>   	/* Only should be called from guc_lrc_desc_pin() */
> @@ -187,6 +188,23 @@ static inline void clr_context_pending_disable(struct intel_context *ce)
>   		(ce->guc_state.sched_state & ~SCHED_STATE_PENDING_DISABLE);
>   }
>   
> +static inline bool context_banned(struct intel_context *ce)
> +{
> +	return (ce->guc_state.sched_state & SCHED_STATE_BANNED);
> +}
> +
> +static inline void set_context_banned(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state |= SCHED_STATE_BANNED;
> +}
> +
> +static inline void clr_context_banned(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	ce->guc_state.sched_state &= ~SCHED_STATE_BANNED;
> +}
> +
>   static inline bool context_guc_id_invalid(struct intel_context *ce)
>   {
>   	return (ce->guc_id == GUC_INVALID_LRC_ID);
> @@ -356,13 +374,23 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop);
>   
>   static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   {
> -	int err;
> +	int err = 0;
>   	struct intel_context *ce = rq->context;
>   	u32 action[3];
>   	int len = 0;
>   	u32 g2h_len_dw = 0;
>   	bool enabled;
>   
> +	/*
> +	 * Corner case where requests were sitting in the priority list or a
> +	 * request resubmitted after the context was banned.
> +	 */
> +	if (unlikely(intel_context_is_banned(ce))) {
> +		i915_request_put(i915_request_mark_eio(rq));
> +		intel_engine_signal_breadcrumbs(ce->engine);
> +		goto out;
> +	}
> +
>   	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>   	GEM_BUG_ON(context_guc_id_invalid(ce));
>   
> @@ -398,6 +426,8 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		clr_context_pending_enable(ce);
>   		intel_context_put(ce);
>   	}
> +	if (likely(!err))
> +		trace_i915_request_guc_submit(rq);
>   
>   out:
>   	return err;
> @@ -462,7 +492,6 @@ static int guc_dequeue_one_context(struct intel_guc *guc)
>   			guc->stalled_request = last;
>   			return false;
>   		}
> -		trace_i915_request_guc_submit(last);
>   	}
>   
>   	guc->stalled_request = NULL;
> @@ -501,12 +530,13 @@ static void cs_irq_handler(struct intel_engine_cs *engine, u16 iir)
>   static void __guc_context_destroy(struct intel_context *ce);
>   static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
>   static void guc_signal_context_fence(struct intel_context *ce);
> +static void guc_cancel_context_requests(struct intel_context *ce);
>   
>   static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   {
>   	struct intel_context *ce;
>   	unsigned long index, flags;
> -	bool pending_disable, pending_enable, deregister, destroyed;
> +	bool pending_disable, pending_enable, deregister, destroyed, banned;
>   
>   	xa_for_each(&guc->context_lookup, index, ce) {
>   		/* Flush context */
> @@ -524,6 +554,7 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   		pending_enable = context_pending_enable(ce);
>   		pending_disable = context_pending_disable(ce);
>   		deregister = context_wait_for_deregister_to_register(ce);
> +		banned = context_banned(ce);
>   		init_sched_state(ce);
>   
>   		if (pending_enable || destroyed || deregister) {
> @@ -541,6 +572,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   		/* Not mutualy exclusive with above if statement. */
>   		if (pending_disable) {
>   			guc_signal_context_fence(ce);
> +			if (banned) {
> +				guc_cancel_context_requests(ce);
> +				intel_engine_signal_breadcrumbs(ce->engine);
> +			}
>   			intel_context_sched_disable_unpin(ce);
>   			atomic_dec(&guc->outstanding_submission_g2h);
>   			intel_context_put(ce);
> @@ -659,6 +694,9 @@ static void guc_reset_state(struct intel_context *ce, u32 head, bool scrub)
>   {
>   	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
>   
> +	if (intel_context_is_banned(ce))
> +		return;
> +
>   	GEM_BUG_ON(!intel_context_is_pinned(ce));
>   
>   	/*
> @@ -729,6 +767,8 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
>   	struct i915_request *rq;
>   	u32 head;
>   
> +	intel_context_get(ce);
> +
>   	/*
>   	 * GuC will implicitly mark the context as non-schedulable
>   	 * when it sends the reset notification. Make sure our state
> @@ -754,6 +794,7 @@ static void __guc_reset_context(struct intel_context *ce, bool stalled)
>   out_replay:
>   	guc_reset_state(ce, head, stalled);
>   	__unwind_incomplete_requests(ce);
> +	intel_context_put(ce);
>   }
>   
>   void intel_guc_submission_reset(struct intel_guc *guc, bool stalled)
> @@ -936,8 +977,6 @@ static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>   	ret = guc_add_request(guc, rq);
>   	if (ret == -EBUSY)
>   		guc->stalled_request = rq;
> -	else
> -		trace_i915_request_guc_submit(rq);
>   
>   	if (unlikely(ret == -EPIPE))
>   		disable_submission(guc);
> @@ -1329,13 +1368,77 @@ static u16 prep_context_pending_disable(struct intel_context *ce)
>   	return ce->guc_id;
>   }
>   
> +static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> +						 u16 guc_id,
> +						 u32 preemption_timeout)
> +{
> +	u32 action [] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PREEMPTION_TIMEOUT,
> +		guc_id,
> +		preemption_timeout
> +	};
> +
> +	intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +}
> +
> +static void guc_context_ban(struct intel_context *ce, struct i915_request *rq)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	struct intel_runtime_pm *runtime_pm =
> +		&ce->engine->gt->i915->runtime_pm;
> +	intel_wakeref_t wakeref;
> +	unsigned long flags;
> +
> +	guc_flush_submissions(guc);
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	set_context_banned(ce);
> +
> +	if (submission_disabled(guc) || (!context_enabled(ce) &&
> +	    !context_pending_disable(ce))) {
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		guc_cancel_context_requests(ce);
> +		intel_engine_signal_breadcrumbs(ce->engine);
> +	} else if (!context_pending_disable(ce)) {
> +		u16 guc_id;
> +
> +		/*
> +		 * We add +2 here as the schedule disable complete CTB handler
> +		 * calls intel_context_sched_disable_unpin (-2 to pin_count).
> +		 */
> +		atomic_add(2, &ce->pin_count);
> +
> +		guc_id = prep_context_pending_disable(ce);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		/*
> +		 * In addition to disabling scheduling, set the preemption
> +		 * timeout to the minimum value (1 us) so the banned context
> +		 * gets kicked off the HW ASAP.
> +		 */
> +		with_intel_runtime_pm(runtime_pm, wakeref) {
> +			__guc_context_set_preemption_timeout(guc, guc_id, 1);
> +			__guc_context_sched_disable(guc, ce, guc_id);
> +		}
> +	} else {
> +		if (!context_guc_id_invalid(ce))
> +			with_intel_runtime_pm(runtime_pm, wakeref)
> +				__guc_context_set_preemption_timeout(guc,
> +								     ce->guc_id,
> +								     1);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +	}
> +}
> +
>   static void guc_context_sched_disable(struct intel_context *ce)
>   {
>   	struct intel_guc *guc = ce_to_guc(ce);
> -	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>   	unsigned long flags;
> -	u16 guc_id;
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>   	intel_wakeref_t wakeref;
> +	u16 guc_id;
> +	bool enabled;
>   
>   	if (submission_disabled(guc) || context_guc_id_invalid(ce) ||
>   	    !lrc_desc_registered(guc, ce->guc_id)) {
> @@ -1349,14 +1452,22 @@ static void guc_context_sched_disable(struct intel_context *ce)
>   	spin_lock_irqsave(&ce->guc_state.lock, flags);
>   
>   	/*
> -	 * We have to check if the context has been pinned again as another pin
> -	 * operation is allowed to pass this function. Checking the pin count,
> -	 * within ce->guc_state.lock, synchronizes this function with
> +	 * We have to check if the context has been disabled by another thread.
> +	 * We also have to check if the context has been pinned again as another
> +	 * pin operation is allowed to pass this function. Checking the pin
> +	 * count, within ce->guc_state.lock, synchronizes this function with
>   	 * guc_request_alloc ensuring a request doesn't slip through the
>   	 * 'context_pending_disable' fence. Checking within the spin lock (can't
>   	 * sleep) ensures another process doesn't pin this context and generate
>   	 * a request before we set the 'context_pending_disable' flag here.
>   	 */
> +	enabled = context_enabled(ce);
> +	if (unlikely(!enabled || submission_disabled(guc))) {
> +		if (enabled)
> +			clr_context_enabled(ce);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +		goto unpin;
> +	}
>   	if (unlikely(atomic_add_unless(&ce->pin_count, -2, 2))) {
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   		return;
> @@ -1513,6 +1624,8 @@ static const struct intel_context_ops guc_context_ops = {
>   	.unpin = guc_context_unpin,
>   	.post_unpin = guc_context_post_unpin,
>   
> +	.ban = guc_context_ban,
> +
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
> @@ -1706,6 +1819,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   	.unpin = guc_context_unpin,
>   	.post_unpin = guc_context_post_unpin,
>   
> +	.ban = guc_context_ban,
> +
>   	.enter = guc_virtual_context_enter,
>   	.exit = guc_virtual_context_exit,
>   
> @@ -2159,6 +2274,8 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   	if (context_pending_enable(ce)) {
>   		clr_context_pending_enable(ce);
>   	} else if (context_pending_disable(ce)) {
> +		bool banned;
> +
>   		/*
>   		 * Unpin must be done before __guc_signal_context_fence,
>   		 * otherwise a race exists between the requests getting
> @@ -2169,9 +2286,16 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   		intel_context_sched_disable_unpin(ce);
>   
>   		spin_lock_irqsave(&ce->guc_state.lock, flags);
> +		banned = context_banned(ce);
> +		clr_context_banned(ce);
>   		clr_context_pending_disable(ce);
>   		__guc_signal_context_fence(ce);
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +		if (banned) {
> +			guc_cancel_context_requests(ce);
> +			intel_engine_signal_breadcrumbs(ce->engine);
> +		}
>   	}
>   
>   	decr_outstanding_submission_g2h(guc);
> @@ -2206,8 +2330,11 @@ static void guc_handle_context_reset(struct intel_guc *guc,
>   				     struct intel_context *ce)
>   {
>   	trace_intel_context_reset(ce);
> -	capture_error_state(guc, ce);
> -	guc_context_replay(ce);
> +
> +	if (likely(!intel_context_is_banned(ce))) {
> +		capture_error_state(guc, ce);
> +		guc_context_replay(ce);
> +	}
>   }
>   
>   int intel_guc_context_reset_process_msg(struct intel_guc *guc,
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index c095c4d39456..937d3706af9b 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -934,6 +934,11 @@ DEFINE_EVENT(intel_context, intel_context_reset,
>   	     TP_ARGS(ce)
>   );
>   
> +DEFINE_EVENT(intel_context, intel_context_ban,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
> +);
> +
>   DEFINE_EVENT(intel_context, intel_context_register,
>   	     TP_PROTO(struct intel_context *ce),
>   	     TP_ARGS(ce)
> @@ -1036,6 +1041,11 @@ trace_intel_context_reset(struct intel_context *ce)
>   {
>   }
>   
> +static inline void
> +trace_intel_context_ban(struct intel_context *ce)
> +{
> +}
> +
>   static inline void
>   trace_intel_context_register(struct intel_context *ce)
>   {


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-20 21:46     ` John Harrison
  -1 siblings, 0 replies; 221+ messages in thread
From: John Harrison @ 2021-07-20 21:46 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: daniele.ceraolospurio

On 7/16/2021 13:17, Matthew Brost wrote:
> Requests may take slightly longer with GuC submission, so let's increase
> the timeouts in live_requests.
>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>

> ---
>   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index bd5c96a77ba3..d67710d10615 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(rq);
>   		if (err)
> @@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
>   	}
>   	igt_spinner_end(&spin);
>   
> -	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)
> +	if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
>   		err = -EIO;
>   	i915_request_put(rq);
>   
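(For context: i915_request_wait() takes its timeout in jiffies, so this
bumps each wait from HZ / 5 — 200 ms — to HZ, a full second.)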


^ permalink raw reply	[flat|nested] 221+ messages in thread


* Re: [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
  2021-07-20  4:04       ` [Intel-gfx] " Matthew Brost
@ 2021-07-21 23:51         ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-21 23:51 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, john.c.harrison, dri-devel



On 7/19/2021 9:04 PM, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
>>
>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
>>> Implement GuC context operations which includes GuC specific operations
>>> alloc, pin, unpin, and destroy.
>>>
>>> v2:
>>>    (Daniel Vetter)
>>>     - Use msleep_interruptible rather than cond_resched in busy loop
>>>    (Michal)
>>>     - Remove C++ style comment
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
>>>    8 files changed, 685 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
>>> index bd63813c8a80..32fd6647154b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>>>    	mutex_init(&ce->pin_mutex);
>>> +	spin_lock_init(&ce->guc_state.lock);
>>> +
>>> +	ce->guc_id = GUC_INVALID_LRC_ID;
>>> +	INIT_LIST_HEAD(&ce->guc_id_link);
>>> +
>>>    	i915_active_init(&ce->active,
>>>    			 __intel_context_active, __intel_context_retire, 0);
>>>    }
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> index 6d99631d19b9..606c480aec26 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> @@ -96,6 +96,7 @@ struct intel_context {
>>>    #define CONTEXT_BANNED			6
>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>>>    #define CONTEXT_NOPREEMPT		8
>>> +#define CONTEXT_LRCA_DIRTY		9
>>>    	struct {
>>>    		u64 timeout_us;
>>> @@ -138,14 +139,29 @@ struct intel_context {
>>>    	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>>> +	struct {
>>> +		/** lock: protects everything in guc_state */
>>> +		spinlock_t lock;
>>> +		/**
>>> +		 * sched_state: scheduling state of this context using GuC
>>> +		 * submission
>>> +		 */
>>> +		u8 sched_state;
>>> +	} guc_state;
>>> +
>>>    	/* GuC scheduling state flags that do not require a lock. */
>>>    	atomic_t guc_sched_state_no_lock;
>>> +	/* GuC LRC descriptor ID */
>>> +	u16 guc_id;
>>> +
>>> +	/* GuC LRC descriptor reference count */
>>> +	atomic_t guc_id_ref;
>>> +
>>>    	/*
>>> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
>>> -	 * in the series will.
>>> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>>>    	 */
>>> -	u16 guc_id;
>>> +	struct list_head guc_id_link;
>>>    };
>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> index 41e5350a7a05..49d4857ad9b7 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> @@ -87,7 +87,6 @@
>>>    #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>>>    #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
>>> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>>>    #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
>>>    #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 8c7b92f699f1..30773cd699f5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -7,6 +7,7 @@
>>>    #define _INTEL_GUC_H_
>>>    #include <linux/xarray.h>
>>> +#include <linux/delay.h>
>>>    #include "intel_uncore.h"
>>>    #include "intel_guc_fw.h"
>>> @@ -44,6 +45,14 @@ struct intel_guc {
>>>    		void (*disable)(struct intel_guc *guc);
>>>    	} interrupts;
>>> +	/*
>>> +	 * contexts_lock protects the pool of free guc ids and a linked list of
>>> +	 * guc ids available to be stolen
>>> +	 */
>>> +	spinlock_t contexts_lock;
>>> +	struct ida guc_ids;
>>> +	struct list_head guc_id_list;
>>> +
>>>    	bool submission_selected;
>>>    	struct i915_vma *ads_vma;
>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>    				 response_buf, response_buf_size, 0);
>>>    }
>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
>>> +					   const u32 *action,
>>> +					   u32 len,
>>> +					   bool loop)
>>> +{
>>> +	int err;
>>> +	unsigned int sleep_period_ms = 1;
>>> +	bool not_atomic = !in_atomic() && !irqs_disabled();
>>> +
>>> +	/* No sleeping with spin locks, just busy loop */
>>> +	might_sleep_if(loop && not_atomic);
>>> +
>>> +retry:
>>> +	err = intel_guc_send_nb(guc, action, len);
>>> +	if (unlikely(err == -EBUSY && loop)) {
>>> +		if (likely(not_atomic)) {
>>> +			if (msleep_interruptible(sleep_period_ms))
>>> +				return -EINTR;
>>> +			sleep_period_ms = sleep_period_ms << 1;
>>> +		} else {
>>> +			cpu_relax();
>> Potentially something we can change later, but if we're in atomic context we
>> can't keep looping without a timeout while we get -EBUSY; we need to bail
>> out early.
>>
> Eventually intel_guc_send_nb will pop with a different return code after
> 1 second, I think, if the GuC is hung.

I know, the point is that 1 second is too long in atomic context. This
has to either complete quickly in atomic context or be forked off to a
worker. Can fix as a follow-up.
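A minimal sketch of the kind of early bail-out being suggested here —
purely illustrative, with a made-up ATOMIC_RETRY_MAX cap; the eventual
fix may instead defer the send to a worker:

#define ATOMIC_RETRY_MAX	64	/* hypothetical bound, for illustration */

static int intel_guc_send_busy_loop_bounded(struct intel_guc *guc,
					    const u32 *action, u32 len,
					    bool loop)
{
	unsigned int sleep_period_ms = 1;
	unsigned int atomic_tries = 0;
	bool not_atomic = !in_atomic() && !irqs_disabled();
	int err;

	/* No sleeping with spin locks, just busy loop */
	might_sleep_if(loop && not_atomic);

retry:
	err = intel_guc_send_nb(guc, action, len);
	if (unlikely(err == -EBUSY && loop)) {
		if (likely(not_atomic)) {
			if (msleep_interruptible(sleep_period_ms))
				return -EINTR;
			sleep_period_ms <<= 1;
		} else {
			/* Bail out early rather than spinning for ~1s */
			if (++atomic_tries > ATOMIC_RETRY_MAX)
				return -EBUSY;
			cpu_relax();
		}
		goto retry;
	}

	return err;
}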

Daniele

>   
>>> +		}
>>> +		goto retry;
>>> +	}
>>> +
>>> +	return err;
>>> +}
>>> +
>>>    static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>>    {
>>>    	intel_guc_ct_event_handler(&guc->ct);
>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>    			   struct intel_engine_cs *engine);
>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>> +					  const u32 *msg, u32 len);
>>> +
>>>    void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>>>    #endif
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 83ec60ea3f89..28ff82c5be45 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>>>    	case INTEL_GUC_ACTION_DEFAULT:
>>>    		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>>>    		break;
>>> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>>> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
>>> +							    len);
>>> +		break;
>>>    	default:
>>>    		ret = -EOPNOTSUPP;
>>>    		break;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 53b4a5eb4a85..a47b3813b4d0 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -13,7 +13,9 @@
>>>    #include "gt/intel_gt.h"
>>>    #include "gt/intel_gt_irq.h"
>>>    #include "gt/intel_gt_pm.h"
>>> +#include "gt/intel_gt_requests.h"
>>>    #include "gt/intel_lrc.h"
>>> +#include "gt/intel_lrc_reg.h"
>>>    #include "gt/intel_mocs.h"
>>>    #include "gt/intel_ring.h"
>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>>>    		   &ce->guc_sched_state_no_lock);
>>>    }
>>> +/*
>>> + * Below is a set of functions which control the GuC scheduling state which
>>> + * require a lock, aside from the special case where the functions are called
>>> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
>>> + * path to be executing on the context.
>>> + */
>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if it
>> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
> If this code is called from releasing a fence, a circular dependency is
> created. Once we move to the DRM scheduler the locking becomes much
> easier and we will clean this up then. We already have a task for the
> locking cleanup.
>
>> scheme if possible, as that might cause trouble if things change in the
>> future.
> Again this is temporary code, so I think we can live with this until the
> DRM scheduler lands.
>
>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
>>> +#define SCHED_STATE_DESTROYED				BIT(1)
>>> +static inline void init_sched_state(struct intel_context *ce)
>>> +{
>>> +	/* Only should be called from guc_lrc_desc_pin() */
>>> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
>>> +	ce->guc_state.sched_state = 0;
>>> +}
>>> +
>>> +static inline bool
>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_state.sched_state &
>>> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>> No need for (). Below as well.
>>
> Yep.
>
>>> +}
>>> +
>>> +static inline void
>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	/* Only should be called from guc_lrc_desc_pin() */
>>> +	ce->guc_state.sched_state |=
>>> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>>> +}
>>> +
>>> +static inline void
>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	lockdep_assert_held(&ce->guc_state.lock);
>>> +	ce->guc_state.sched_state =
>>> +		(ce->guc_state.sched_state &
>>> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>> nit: can also use
>>
>> ce->guc_state.sched_state &=
>> 	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>>
> Yep.
>   
>>> +}
>>> +
>>> +static inline bool
>>> +context_destroyed(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
>>> +}
>>> +
>>> +static inline void
>>> +set_context_destroyed(struct intel_context *ce)
>>> +{
>>> +	lockdep_assert_held(&ce->guc_state.lock);
>>> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
>>> +}
>>> +
>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
>>> +}
>>> +
>>> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
>>> +{
>>> +	ce->guc_id = GUC_INVALID_LRC_ID;
>>> +}
>>> +
>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
>>> +{
>>> +	return &ce->engine->gt->uc.guc;
>>> +}
>>> +
>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>    {
>>>    	return rb_entry(rb, struct i915_priolist, node);
>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    	int len = 0;
>>>    	bool enabled = context_enabled(ce);
>>> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>>> +	GEM_BUG_ON(context_guc_id_invalid(ce));
>>> +
>>>    	if (!enabled) {
>>>    		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>>>    		action[len++] = ce->guc_id;
>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>>>    	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>> +	spin_lock_init(&guc->contexts_lock);
>>> +	INIT_LIST_HEAD(&guc->guc_id_list);
>>> +	ida_init(&guc->guc_ids);
>>> +
>>>    	return 0;
>>>    }
>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>>>    	i915_sched_engine_put(guc->sched_engine);
>>>    }
>>> -static int guc_context_alloc(struct intel_context *ce)
>>> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>>> +				 struct i915_request *rq,
>>> +				 int prio)
>>>    {
>>> -	return lrc_alloc(ce, ce->engine);
>>> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
>>> +	list_add_tail(&rq->sched.link,
>>> +		      i915_sched_lookup_priolist(sched_engine, prio));
>>> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>> +}
>>> +
>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>> +				     struct i915_request *rq)
>>> +{
>>> +	int ret;
>>> +
>>> +	__i915_request_submit(rq);
>>> +
>>> +	trace_i915_request_in(rq, 0);
>>> +
>>> +	guc_set_lrc_tail(rq);
>>> +	ret = guc_add_request(guc, rq);
>>> +	if (ret == -EBUSY)
>>> +		guc->stalled_request = rq;
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void guc_submit_request(struct i915_request *rq)
>>> +{
>>> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>> +	unsigned long flags;
>>> +
>>> +	/* Will be called from irq-context when using foreign fences. */
>>> +	spin_lock_irqsave(&sched_engine->lock, flags);
>>> +
>>> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
>>> +		queue_request(sched_engine, rq, rq_prio(rq));
>>> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>> +		tasklet_hi_schedule(&sched_engine->tasklet);
>>> +
>>> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>>> +}
>>> +
>>> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
>>> +static int new_guc_id(struct intel_guc *guc)
>>> +{
>>> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
>>> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
>>> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>>> +}
>>> +
>>> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	if (!context_guc_id_invalid(ce)) {
>>> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
>>> +		reset_lrc_desc(guc, ce->guc_id);
>>> +		set_context_guc_id_invalid(ce);
>>> +	}
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +}
>>> +
>>> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	__release_guc_id(guc, ce);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +}
>>> +
>>> +static int steal_guc_id(struct intel_guc *guc)
>>> +{
>>> +	struct intel_context *ce;
>>> +	int guc_id;
>>> +
>>> +	if (!list_empty(&guc->guc_id_list)) {
>>> +		ce = list_first_entry(&guc->guc_id_list,
>>> +				      struct intel_context,
>>> +				      guc_id_link);
>>> +
>>> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>> +		GEM_BUG_ON(context_guc_id_invalid(ce));
>>> +
>>> +		list_del_init(&ce->guc_id_link);
>>> +		guc_id = ce->guc_id;
>>> +		set_context_guc_id_invalid(ce);
>>> +		return guc_id;
>>> +	} else {
>>> +		return -EAGAIN;
>>> +	}
>>> +}
>>> +
>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = new_guc_id(guc);
>>> +	if (unlikely(ret < 0)) {
>>> +		ret = steal_guc_id(guc);
>>> +		if (ret < 0)
>>> +			return ret;
>>> +	}
>>> +
>>> +	*out = ret;
>>> +	return 0;
>> Is it worth adding spinlock_held asserts for guc->contexts_lock in these ID
>> functions? Doubles up as documentation of what locking we expect.
>>
> Never a bad idea to have asserts in the code. Will add.
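For reference, a sketch of what such an assert might look like — only
the lockdep_assert_held() line is the addition, the rest is from the
diff above:

static int assign_guc_id(struct intel_guc *guc, u16 *out)
{
	int ret;

	/* Illustrative: new_guc_id()/steal_guc_id() touch the ida and list */
	lockdep_assert_held(&guc->contexts_lock);

	ret = new_guc_id(guc);
	if (unlikely(ret < 0)) {
		ret = steal_guc_id(guc);
		if (ret < 0)
			return ret;
	}

	*out = ret;
	return 0;
}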
>
>>> +}
>>> +
>>> +#define PIN_GUC_ID_TRIES	4
>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
>>> +
>>> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>> +
>>> +try_again:
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +
>>> +	if (context_guc_id_invalid(ce)) {
>>> +		ret = assign_guc_id(guc, &ce->guc_id);
>>> +		if (ret)
>>> +			goto out_unlock;
>>> +		ret = 1;	/* Indidcates newly assigned guc_id */
>>> +	}
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +	atomic_inc(&ce->guc_id_ref);
>>> +
>>> +out_unlock:
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +
>>> +	/*
>>> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
>>> +	 * outstanding requests to see if that frees up a guc_id. If the first
>>> +	 * retire didn't help, insert a sleep with the timeslice duration before
>>> +	 * attempting to retire more requests. Double the sleep period each
>>> +	 * subsequent pass before finally giving up. The sleep period has max of
>>> +	 * 100ms and minimum of 1ms.
>>> +	 */
>>> +	if (ret == -EAGAIN && --tries) {
>>> +		if (PIN_GUC_ID_TRIES - tries > 1) {
>>> +			unsigned int timeslice_shifted =
>>> +				ce->engine->props.timeslice_duration_ms <<
>>> +				(PIN_GUC_ID_TRIES - tries - 2);
>>> +			unsigned int max = min_t(unsigned int, 100,
>>> +						 timeslice_shifted);
>>> +
>>> +			msleep(max_t(unsigned int, max, 1));
>>> +		}
>>> +		intel_gt_retire_requests(guc_to_gt(guc));
>>> +		goto try_again;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
>>> +
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
>>> +	    !atomic_read(&ce->guc_id_ref))
>>> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +}
>>> +
>>> +static int __guc_action_register_context(struct intel_guc *guc,
>>> +					 u32 guc_id,
>>> +					 u32 offset)
>>> +{
>>> +	u32 action[] = {
>>> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
>>> +		guc_id,
>>> +		offset,
>>> +	};
>>> +
>>> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
>>> +}
>>> +
>>> +static int register_context(struct intel_context *ce)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>>> +		ce->guc_id * sizeof(struct guc_lrc_desc);
>>> +
>>> +	return __guc_action_register_context(guc, ce->guc_id, offset);
>>> +}
>>> +
>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
>>> +					   u32 guc_id)
>>> +{
>>> +	u32 action[] = {
>>> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
>>> +		guc_id,
>>> +	};
>>> +
>>> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
>>> +}
>>> +
>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +
>>> +	return __guc_action_deregister_context(guc, guc_id);
>>> +}
>>> +
>>> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
>>> +{
>>> +	switch (class) {
>>> +	case RENDER_CLASS:
>>> +		return mask >> RCS0;
>>> +	case VIDEO_ENHANCEMENT_CLASS:
>>> +		return mask >> VECS0;
>>> +	case VIDEO_DECODE_CLASS:
>>> +		return mask >> VCS0;
>>> +	case COPY_ENGINE_CLASS:
>>> +		return mask >> BCS0;
>>> +	default:
>>> +		GEM_BUG_ON("Invalid Class");
>> we usually use MISSING_CASE for this type of error.
> As soon as we start talking in logical space (the series immediately
> after this gets merged) this function gets deleted, but I will fix it.
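For reference, the suggested form would look roughly like this
(MISSING_CASE() is i915's standard macro for unexpected switch values):

static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
{
	switch (class) {
	case RENDER_CLASS:
		return mask >> RCS0;
	case VIDEO_ENHANCEMENT_CLASS:
		return mask >> VECS0;
	case VIDEO_DECODE_CLASS:
		return mask >> VCS0;
	case COPY_ENGINE_CLASS:
		return mask >> BCS0;
	default:
		/* Logs a one-line WARN with the offending value */
		MISSING_CASE(class);
		return 0;
	}
}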
>
>>> +		return 0;
>>> +	}
>>> +}
>>> +
>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
>>> +				    struct guc_lrc_desc *desc)
>>> +{
>>> +	desc->policy_flags = 0;
>>> +
>>> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
>>> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>>> +}
>>> +
>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
>>> +{
>>> +	struct intel_runtime_pm *runtime_pm =
>>> +		&ce->engine->gt->i915->runtime_pm;
>> If you move this after the engine var below you can skip the ce->engine
>> jump. Also, you can shorten the pointer chasing by using
>> engine->uncore->rpm.
>>
> Sure.
>
>>> +	struct intel_engine_cs *engine = ce->engine;
>>> +	struct intel_guc *guc = &engine->gt->uc.guc;
>>> +	u32 desc_idx = ce->guc_id;
>>> +	struct guc_lrc_desc *desc;
>>> +	bool context_registered;
>>> +	intel_wakeref_t wakeref;
>>> +	int ret = 0;
>>> +
>>> +	GEM_BUG_ON(!engine->mask);
>>> +
>>> +	/*
>>> +	 * Ensure LRC + CT vmas are is same region as write barrier is done
>>> +	 * based on CT vma region.
>>> +	 */
>>> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
>>> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
>>> +
>>> +	context_registered = lrc_desc_registered(guc, desc_idx);
>>> +
>>> +	reset_lrc_desc(guc, desc_idx);
>>> +	set_lrc_desc_registered(guc, desc_idx, ce);
>>> +
>>> +	desc = __get_lrc_desc(guc, desc_idx);
>>> +	desc->engine_class = engine_class_to_guc_class(engine->class);
>>> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
>>> +						      engine->mask);
>>> +	desc->hw_context_desc = ce->lrc.lrca;
>>> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>> +	guc_context_policy_init(engine, desc);
>>> +	init_sched_state(ce);
>>> +
>>> +	/*
>>> +	 * The context_lookup xarray is used to determine if the hardware
>>> +	 * context is currently registered. There are two cases in which it
>>> +	 * could be regisgered either the guc_id has been stole from from
>> typo regisgered
>>
> John got this one. Fixed.
>   
>>> +	 * another context or the lrc descriptor address of this context has
>>> +	 * changed. In either case the context needs to be deregistered with the
>>> +	 * GuC before registering this context.
>>> +	 */
>>> +	if (context_registered) {
>>> +		set_context_wait_for_deregister_to_register(ce);
>>> +		intel_context_get(ce);
>>> +
>>> +		/*
>>> +		 * If stealing the guc_id, this ce has the same guc_id as the
>>> +		 * context whos guc_id was stole.
>>> +		 */
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			ret = deregister_context(ce, ce->guc_id);
>>> +	} else {
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			ret = register_context(ce);
>>> +	}
>>> +
>>> +	return ret;
>>>    }
>>>    static int guc_context_pre_pin(struct intel_context *ce,
>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>    {
>>> +	if (i915_ggtt_offset(ce->state) !=
>>> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>>> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
>> re-assigned in there; it looks like it can only change if the context is
>> new, but for future-proofing IMO better to re-order here.
>>
> No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca
> is assigned in lrc_pin. To catch the change you have to check between
> those two functions.
>
>>> +
>>>    	return lrc_pin(ce, ce->engine, vaddr);
>> Could use a comment to say that the GuC context pinning call is delayed
>> until request alloc and to look at the comment in the latter function for
>> details (or move the explanation here and refer here from request alloc).
>>
> Yes.
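Folding both points together, the pin hook could read something like
this — the comment wording is illustrative, the code itself is from the
diff:

static int guc_context_pin(struct intel_context *ce, void *vaddr)
{
	/*
	 * ce->state is set up in pre_pin while ce->lrc.lrca is only
	 * (re)assigned in lrc_pin() below, so this is the one spot where
	 * a changed GGTT offset can be detected.
	 */
	if (i915_ggtt_offset(ce->state) !=
	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

	/*
	 * The GuC-side pinning (guc_id assignment + context registration)
	 * is deferred to guc_request_alloc(); see the comment there.
	 */
	return lrc_pin(ce, ce->engine, vaddr);
}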
>   
>>>    }
>>> +static void guc_context_unpin(struct intel_context *ce)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +
>>> +	unpin_guc_id(guc, ce);
>>> +	lrc_unpin(ce);
>>> +}
>>> +
>>> +static void guc_context_post_unpin(struct intel_context *ce)
>>> +{
>>> +	lrc_post_unpin(ce);
>> why do we need this function? we can just pass lrc_post_unpin to the ops
>> (like we already do).
>>
> We don't, but for style reasons I think having a GuC-specific wrapper
> is correct.
>
>>> +}
>>> +
>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>>> +{
>>> +	struct intel_engine_cs *engine = ce->engine;
>>> +	struct intel_guc *guc = &engine->gt->uc.guc;
>>> +	unsigned long flags;
>>> +
>>> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>>> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>>> +
>>> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
>>> +	set_context_destroyed(ce);
>>> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>> +
>>> +	deregister_context(ce, ce->guc_id);
>>> +}
>>> +
>>> +static void guc_context_destroy(struct kref *kref)
>>> +{
>>> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
>>> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>> same as above, going through engine->uncore->rpm is shorter
>>
> Sure.
>   
>>> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
>>> +	intel_wakeref_t wakeref;
>>> +	unsigned long flags;
>>> +
>>> +	/*
>>> +	 * If the guc_id is invalid this context has been stolen and we can free
>>> +	 * it immediately. Also can be freed immediately if the context is not
>>> +	 * registered with the GuC.
>>> +	 */
>>> +	if (context_guc_id_invalid(ce) ||
>>> +	    !lrc_desc_registered(guc, ce->guc_id)) {
>>> +		release_guc_id(guc, ce);
>> it feels a bit weird that we call release_guc_id in the case where the id is
>> invalid. The code handles it fine, but still the flow doesn't feel clean.
>> Not a blocker.
>>
> I understand. Let's see if this can get cleaned up at some point.
>
>>> +		lrc_destroy(kref);
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * We have to acquire the context spinlock and check guc_id again, if it
>>> +	 * is valid it hasn't been stolen and needs to be deregistered. We
>>> +	 * delete this context from the list of unpinned guc_ids available to
>>> +	 * stole to seal a race with guc_lrc_desc_pin(). When the G2H CTB
>>> +	 * returns indicating this context has been deregistered the guc_id is
>>> +	 * returned to the pool of available guc_ids.
>>> +	 */
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	if (context_guc_id_invalid(ce)) {
>>> +		__release_guc_id(guc, ce);
>> But here the call to __release_guc_id is unneeded, right? The ce doesn't own
>> the ID anymore and both the steal and the release function already clean
>> ce->guc_id_link, so there should be nothing left to clean.
>>
> This is dead code. Will delete.
>
>>> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +		lrc_destroy(kref);
>>> +		return;
>>> +	}
>>> +
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +
>>> +	/*
>>> +	 * We defer GuC context deregistration until the context is destroyed
>>> +	 * in order to save on CTBs. With this optimization ideally we only need
>>> +	 * 1 CTB to register the context during the first pin and 1 CTB to
>>> +	 * deregister the context when the context is destroyed. Without this
>>> +	 * optimization, a CTB would be needed every pin & unpin.
>>> +	 *
>>> +	 * XXX: Need to acqiure the runtime wakeref as this can be triggered
>>> +	 * from context_free_worker when not runtime wakeref is held.
>>> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
>>> +	 * in H2G CTB to deregister the context. A future patch may defer this
>>> +	 * H2G CTB if the runtime wakeref is zero.
>>> +	 */
>>> +	with_intel_runtime_pm(runtime_pm, wakeref)
>>> +		guc_lrc_desc_unpin(ce);
>>> +}
>>> +
>>> +static int guc_context_alloc(struct intel_context *ce)
>>> +{
>>> +	return lrc_alloc(ce, ce->engine);
>>> +}
>>> +
>>>    static const struct intel_context_ops guc_context_ops = {
>>>    	.alloc = guc_context_alloc,
>>>    	.pre_pin = guc_context_pre_pin,
>>>    	.pin = guc_context_pin,
>>> -	.unpin = lrc_unpin,
>>> -	.post_unpin = lrc_post_unpin,
>>> +	.unpin = guc_context_unpin,
>>> +	.post_unpin = guc_context_post_unpin,
>>>    	.enter = intel_context_enter_engine,
>>>    	.exit = intel_context_exit_engine,
>>>    	.reset = lrc_reset,
>>> -	.destroy = lrc_destroy,
>>> +	.destroy = guc_context_destroy,
>>>    };
>>> -static int guc_request_alloc(struct i915_request *request)
>>> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>>>    {
>>> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
>>> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
>>> +}
>>> +
>>> +static int guc_request_alloc(struct i915_request *rq)
>>> +{
>>> +	struct intel_context *ce = rq->context;
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>>    	int ret;
>>> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
>>> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>>>    	/*
>>>    	 * Flush enough space to reduce the likelihood of waiting after
>>>    	 * we start building the request - in which case we will just
>>>    	 * have to repeat work.
>>>    	 */
>>> -	request->reserved_space += GUC_REQUEST_SIZE;
>>> +	rq->reserved_space += GUC_REQUEST_SIZE;
>>>    	/*
>>>    	 * Note that after this point, we have committed to using
>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>>>    	 */
>>>    	/* Unconditionally invalidate GPU caches and TLBs. */
>>> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
>>> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>>>    	if (ret)
>>>    		return ret;
>>> -	request->reserved_space -= GUC_REQUEST_SIZE;
>>> -	return 0;
>>> -}
>>> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>>> -static inline void queue_request(struct i915_sched_engine *sched_engine,
>>> -				 struct i915_request *rq,
>>> -				 int prio)
>>> -{
>>> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
>>> -	list_add_tail(&rq->sched.link,
>>> -		      i915_sched_lookup_priolist(sched_engine, prio));
>>> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>> -}
>>> -
>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>> -				     struct i915_request *rq)
>>> -{
>>> -	int ret;
>>> -
>>> -	__i915_request_submit(rq);
>>> -
>>> -	trace_i915_request_in(rq, 0);
>>> -
>>> -	guc_set_lrc_tail(rq);
>>> -	ret = guc_add_request(guc, rq);
>>> -	if (ret == -EBUSY)
>>> -		guc->stalled_request = rq;
>>> -
>>> -	return ret;
>>> -}
>>> -
>>> -static void guc_submit_request(struct i915_request *rq)
>>> -{
>>> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>> -	unsigned long flags;
>>> +	/*
>>> +	 * Call pin_guc_id here rather than in the pinning step as with
>>> +	 * dma_resv, contexts can be repeatedly pinned / unpinned trashing the
>>> +	 * guc_ids and creating horrible race conditions. This is especially bad
>>> +	 * when guc_ids are being stolen due to over subscription. By the time
>>> +	 * this function is reached, it is guaranteed that the guc_id will be
>>> +	 * persistent until the generated request is retired. Thus, sealing these
>>> +	 * race conditions. It is still safe to fail here if guc_ids are
>>> +	 * exhausted and return -EAGAIN to the user indicating that they can try
>>> +	 * again in the future.
>>> +	 *
>>> +	 * There is no need for a lock here as the timeline mutex ensures at
>>> +	 * most one context can be executing this code path at once. The
>>> +	 * guc_id_ref is incremented once for every request in flight and
>>> +	 * decremented on each retire. When it is zero, a lock around the
>>> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
>>> +	 */
>>> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
>>> +		return 0;
>>> -	/* Will be called from irq-context when using foreign fences. */
>>> -	spin_lock_irqsave(&sched_engine->lock, flags);
>>> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
>>> +	if (unlikely(ret < 0))
>>> +		return ret;;
>> typo ";;"
>>
> Yep.
>
>>> +	if (context_needs_register(ce, !!ret)) {
>>> +		ret = guc_lrc_desc_pin(ce);
>>> +		if (unlikely(ret)) {	/* unwind */
>>> +			atomic_dec(&ce->guc_id_ref);
>>> +			unpin_guc_id(guc, ce);
>>> +			return ret;
>>> +		}
>>> +	}
>>> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
>>> -		queue_request(sched_engine, rq, rq_prio(rq));
>>> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>> -		tasklet_hi_schedule(&sched_engine->tasklet);
>>> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>
>> Might be worth moving the pinning to a separate function to keep the
>> request_alloc focused on the request. Can be done as a follow up.
>>
> Sure. Let me see how that looks in a follow up.
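One possible shape for that follow-up — the helper name is made up
here, the body is lifted from the diff above:

static int guc_request_pin_context(struct i915_request *rq)
{
	struct intel_context *ce = rq->context;
	struct intel_guc *guc = ce_to_guc(ce);
	int ret;

	/* Fast path: in-flight requests already hold a guc_id reference */
	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
		return 0;

	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
	if (unlikely(ret < 0))
		return ret;

	if (context_needs_register(ce, !!ret)) {
		ret = guc_lrc_desc_pin(ce);
		if (unlikely(ret)) {	/* unwind */
			atomic_dec(&ce->guc_id_ref);
			unpin_guc_id(guc, ce);
			return ret;
		}
	}

	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
	return 0;
}

guc_request_alloc() would then reduce to the flush/reserve bookkeeping
plus a single call to this helper.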
>   
>>> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
>>> +	return 0;
>>>    }
>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>>>    	engine->submit_request = guc_submit_request;
>>>    }
>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
>>> +					  struct intel_context *ce)
>>> +{
>>> +	if (context_guc_id_invalid(ce))
>>> +		pin_guc_id(guc, ce);
>>> +	guc_lrc_desc_pin(ce);
>>> +}
>>> +
>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>>> +{
>>> +	struct intel_gt *gt = guc_to_gt(guc);
>>> +	struct intel_engine_cs *engine;
>>> +	enum intel_engine_id id;
>>> +
>>> +	/* make sure all descriptors are clean... */
>>> +	xa_destroy(&guc->context_lookup);
>>> +
>>> +	/*
>>> +	 * Some contexts might have been pinned before we enabled GuC
>>> +	 * submission, so we need to add them to the GuC bookeeping.
>>> +	 * Also, after a reset the GuC we want to make sure that the information
>>> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
>>> +	 * to the gem_context, so they need to be added separately.
>>> +	 *
>>> +	 * Note: we purposely do not check the error return of
>>> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
>>> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
>>> +	 * enough here (we're either only pinning a handful of lrc on first boot
>>> +	 * or we're re-pinning lrcs that were already pinned before the reset).
>>> +	 * Two, if the GuC has died and CTBs can't make forward progress.
>>> +	 * Presumably, the GuC should be alive as this function is called on
>>> +	 * driver load or after a reset. Even if it is dead, another full GPU
>>> +	 * reset will be triggered and this function would be called again.
>>> +	 */
>>> +
>>> +	for_each_engine(engine, gt, id)
>>> +		if (engine->kernel_context)
>>> +			guc_kernel_context_pin(guc, engine->kernel_context);
>>> +}
>>> +
>>>    static void guc_release(struct intel_engine_cs *engine)
>>>    {
>>>    	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>>>    void intel_guc_submission_enable(struct intel_guc *guc)
>>>    {
>>> +	guc_init_lrc_mapping(guc);
>>>    }
>>>    void intel_guc_submission_disable(struct intel_guc *guc)
>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>>>    {
>>>    	guc->submission_selected = __guc_submission_selected(guc);
>>>    }
>>> +
>>> +static inline struct intel_context *
>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>> +{
>>> +	struct intel_context *ce;
>>> +
>>> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
>>> +			"Invalid desc_idx %u", desc_idx);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	ce = __get_context(guc, desc_idx);
>>> +	if (unlikely(!ce)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
>>> +			"Context is NULL, desc_idx %u", desc_idx);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	return ce;
>>> +}
>>> +
>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>> +					  const u32 *msg,
>>> +					  u32 len)
>>> +{
>>> +	struct intel_context *ce;
>>> +	u32 desc_idx = msg[0];
>>> +
>>> +	if (unlikely(len < 1)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
>>> +		return -EPROTO;
>>> +	}
>>> +
>>> +	ce = g2h_context_lookup(guc, desc_idx);
>>> +	if (unlikely(!ce))
>>> +		return -EPROTO;
>>> +
>>> +	if (context_wait_for_deregister_to_register(ce)) {
>>> +		struct intel_runtime_pm *runtime_pm =
>>> +			&ce->engine->gt->i915->runtime_pm;
>>> +		intel_wakeref_t wakeref;
>>> +
>>> +		/*
>>> +		 * Previous owner of this guc_id has been deregistered, now
>>> +		 * safe to register this context.
>>> +		 */
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			register_context(ce);
>>> +		clr_context_wait_for_deregister_to_register(ce);
>>> +		intel_context_put(ce);
>>> +	} else if (context_destroyed(ce)) {
>>> +		/* Context has been destroyed */
>>> +		release_guc_id(guc, ce);
>>> +		lrc_destroy(&ce->ref);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>>> index 943fe485c662..204c95c39353 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -4142,6 +4142,7 @@ enum {
>>>    	FAULT_AND_CONTINUE /* Unsupported */
>>>    };
>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>>>    #define GEN8_CTX_VALID (1 << 0)
>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 86b4c9f2613d..b48c4905d3fc 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>>>    	 */
>>>    	if (!list_empty(&rq->sched.link))
>>>    		remove_from_engine(rq);
>>> +	atomic_dec(&rq->context->guc_id_ref);
>> Does this work/make sense if GuC is disabled?
>>
> It doesn't matter if the GuC is disabled, as this var isn't used in
> that case. A following patch adds a vfunc here and pushes this into
> the GuC backend.
>
> Matt
>
>> Daniele
>>
>>>    	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>>>    	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-21 23:51         ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-21 23:51 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, dri-devel



On 7/19/2021 9:04 PM, Matthew Brost wrote:
> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
>>
>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
>>> Implement GuC context operations which includes GuC specific operations
>>> alloc, pin, unpin, and destroy.
>>>
>>> v2:
>>>    (Daniel Vetter)
>>>     - Use msleep_interruptible rather than cond_resched in busy loop
>>>    (Michal)
>>>     - Remove C++ style comment
>>>
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++--
>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
>>>    8 files changed, 685 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
>>> index bd63813c8a80..32fd6647154b 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>>>    	mutex_init(&ce->pin_mutex);
>>> +	spin_lock_init(&ce->guc_state.lock);
>>> +
>>> +	ce->guc_id = GUC_INVALID_LRC_ID;
>>> +	INIT_LIST_HEAD(&ce->guc_id_link);
>>> +
>>>    	i915_active_init(&ce->active,
>>>    			 __intel_context_active, __intel_context_retire, 0);
>>>    }
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> index 6d99631d19b9..606c480aec26 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>> @@ -96,6 +96,7 @@ struct intel_context {
>>>    #define CONTEXT_BANNED			6
>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION	7
>>>    #define CONTEXT_NOPREEMPT		8
>>> +#define CONTEXT_LRCA_DIRTY		9
>>>    	struct {
>>>    		u64 timeout_us;
>>> @@ -138,14 +139,29 @@ struct intel_context {
>>>    	u8 wa_bb_page; /* if set, page num reserved for context workarounds */
>>> +	struct {
>>> +		/** lock: protects everything in guc_state */
>>> +		spinlock_t lock;
>>> +		/**
>>> +		 * sched_state: scheduling state of this context using GuC
>>> +		 * submission
>>> +		 */
>>> +		u8 sched_state;
>>> +	} guc_state;
>>> +
>>>    	/* GuC scheduling state flags that do not require a lock. */
>>>    	atomic_t guc_sched_state_no_lock;
>>> +	/* GuC LRC descriptor ID */
>>> +	u16 guc_id;
>>> +
>>> +	/* GuC LRC descriptor reference count */
>>> +	atomic_t guc_id_ref;
>>> +
>>>    	/*
>>> -	 * GuC LRC descriptor ID - Not assigned in this patch but future patches
>>> -	 * in the series will.
>>> +	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>>>    	 */
>>> -	u16 guc_id;
>>> +	struct list_head guc_id_link;
>>>    };
>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> index 41e5350a7a05..49d4857ad9b7 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>> @@ -87,7 +87,6 @@
>>>    #define GEN11_CSB_WRITE_PTR_MASK	(GEN11_CSB_PTR_MASK << 0)
>>>    #define MAX_CONTEXT_HW_ID	(1 << 21) /* exclusive */
>>> -#define MAX_GUC_CONTEXT_HW_ID	(1 << 20) /* exclusive */
>>>    #define GEN11_MAX_CONTEXT_HW_ID	(1 << 11) /* exclusive */
>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
>>>    #define GEN12_MAX_CONTEXT_HW_ID	(GEN11_MAX_CONTEXT_HW_ID - 1)
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> index 8c7b92f699f1..30773cd699f5 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>> @@ -7,6 +7,7 @@
>>>    #define _INTEL_GUC_H_
>>>    #include <linux/xarray.h>
>>> +#include <linux/delay.h>
>>>    #include "intel_uncore.h"
>>>    #include "intel_guc_fw.h"
>>> @@ -44,6 +45,14 @@ struct intel_guc {
>>>    		void (*disable)(struct intel_guc *guc);
>>>    	} interrupts;
>>> +	/*
>>> +	 * contexts_lock protects the pool of free guc ids and a linked list of
>>> +	 * guc ids available to be stolen
>>> +	 */
>>> +	spinlock_t contexts_lock;
>>> +	struct ida guc_ids;
>>> +	struct list_head guc_id_list;
>>> +
>>>    	bool submission_selected;
>>>    	struct i915_vma *ads_vma;
>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc *guc, const u32 *action, u32 len,
>>>    				 response_buf, response_buf_size, 0);
>>>    }
>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
>>> +					   const u32 *action,
>>> +					   u32 len,
>>> +					   bool loop)
>>> +{
>>> +	int err;
>>> +	unsigned int sleep_period_ms = 1;
>>> +	bool not_atomic = !in_atomic() && !irqs_disabled();
>>> +
>>> +	/* No sleeping with spin locks, just busy loop */
>>> +	might_sleep_if(loop && not_atomic);
>>> +
>>> +retry:
>>> +	err = intel_guc_send_nb(guc, action, len);
>>> +	if (unlikely(err == -EBUSY && loop)) {
>>> +		if (likely(not_atomic)) {
>>> +			if (msleep_interruptible(sleep_period_ms))
>>> +				return -EINTR;
>>> +			sleep_period_ms = sleep_period_ms << 1;
>>> +		} else {
>>> +			cpu_relax();
>> Potentially something we can change later, but if we're in atomic context we
>> can't keep looping without a timeout while we get -EBUSY, we need to bail
>> out early.
>>
> Eventually intel_guc_send_nb will pop with a different return code
> after 1 second, I think, if the GuC is hung.

I know, the point is that 1 second is too long in atomic context. This
has to either complete quickly in atomic context or be forked off to a
worker. Can be fixed as a follow-up.
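
Something along these lines is what I'm thinking of (just a sketch,
with a made-up retry cap and helper name, nothing from the actual
series):

	/* hypothetical cap so the atomic path can't spin for the full CT timeout */
	#define GUC_SEND_ATOMIC_RETRIES	100

	static int guc_send_busy_loop_atomic(struct intel_guc *guc,
					     const u32 *action, u32 len)
	{
		unsigned int tries = GUC_SEND_ATOMIC_RETRIES;
		int err;

		do {
			err = intel_guc_send_nb(guc, action, len);
			if (err != -EBUSY)
				return err;
			cpu_relax();
		} while (--tries);

		/* give up and let the caller defer the H2G to a worker */
		return -EBUSY;
	}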

Daniele

>   
>>> +		}
>>> +		goto retry;
>>> +	}
>>> +
>>> +	return err;
>>> +}
>>> +
>>>    static inline void intel_guc_to_host_event_handler(struct intel_guc *guc)
>>>    {
>>>    	intel_guc_ct_event_handler(&guc->ct);
>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct intel_guc *guc, u32 mask)
>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>    			   struct intel_engine_cs *engine);
>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>> +					  const u32 *msg, u32 len);
>>> +
>>>    void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p);
>>>    #endif
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> index 83ec60ea3f89..28ff82c5be45 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct intel_guc_ct *ct, struct ct_incoming_msg *r
>>>    	case INTEL_GUC_ACTION_DEFAULT:
>>>    		ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>>>    		break;
>>> +	case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>>> +		ret = intel_guc_deregister_done_process_msg(guc, payload,
>>> +							    len);
>>> +		break;
>>>    	default:
>>>    		ret = -EOPNOTSUPP;
>>>    		break;
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> index 53b4a5eb4a85..a47b3813b4d0 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>> @@ -13,7 +13,9 @@
>>>    #include "gt/intel_gt.h"
>>>    #include "gt/intel_gt_irq.h"
>>>    #include "gt/intel_gt_pm.h"
>>> +#include "gt/intel_gt_requests.h"
>>>    #include "gt/intel_lrc.h"
>>> +#include "gt/intel_lrc_reg.h"
>>>    #include "gt/intel_mocs.h"
>>>    #include "gt/intel_ring.h"
>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct intel_context *ce)
>>>    		   &ce->guc_sched_state_no_lock);
>>>    }
>>> +/*
>>> + * Below is a set of functions which control the GuC scheduling state which
>>> + * require a lock, aside from the special case where the functions are called
>>> + * from guc_lrc_desc_pin(). In that case it isn't possible for any other code
>>> + * path to be executing on the context.
>>> + */
>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if it
>> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
> If this code is called from releasing a fence, a circular dependency
> is created. Once we move to the DRM scheduler the locking becomes way
> easier and we will clean this up then. We already have a task for the
> locking cleanup.
>
>> scheme if possible, as that might cause trouble if things change in
>> the future.
> Again this is temporary code, so I think we can live with this until the
> DRM scheduler lands.
>
>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER	BIT(0)
>>> +#define SCHED_STATE_DESTROYED				BIT(1)
>>> +static inline void init_sched_state(struct intel_context *ce)
>>> +{
>>> +	/* Only should be called from guc_lrc_desc_pin() */
>>> +	atomic_set(&ce->guc_sched_state_no_lock, 0);
>>> +	ce->guc_state.sched_state = 0;
>>> +}
>>> +
>>> +static inline bool
>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_state.sched_state &
>>> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>> No need for (). Below as well.
>>
> Yep.
>
>>> +}
>>> +
>>> +static inline void
>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	/* Only should be called from guc_lrc_desc_pin() */
>>> +	ce->guc_state.sched_state |=
>>> +		SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>>> +}
>>> +
>>> +static inline void
>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
>>> +{
>>> +	lockdep_assert_held(&ce->guc_state.lock);
>>> +	ce->guc_state.sched_state =
>>> +		(ce->guc_state.sched_state &
>>> +		 ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>> nit: can also use
>>
>> ce->guc_state.sched_state &=
>> 	~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>>
> Yep.
>   
>>> +}
>>> +
>>> +static inline bool
>>> +context_destroyed(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
>>> +}
>>> +
>>> +static inline void
>>> +set_context_destroyed(struct intel_context *ce)
>>> +{
>>> +	lockdep_assert_held(&ce->guc_state.lock);
>>> +	ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
>>> +}
>>> +
>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
>>> +{
>>> +	return (ce->guc_id == GUC_INVALID_LRC_ID);
>>> +}
>>> +
>>> +static inline void set_context_guc_id_invalid(struct intel_context *ce)
>>> +{
>>> +	ce->guc_id = GUC_INVALID_LRC_ID;
>>> +}
>>> +
>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
>>> +{
>>> +	return &ce->engine->gt->uc.guc;
>>> +}
>>> +
>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>    {
>>>    	return rb_entry(rb, struct i915_priolist, node);
>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>>>    	int len = 0;
>>>    	bool enabled = context_enabled(ce);
>>> +	GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>>> +	GEM_BUG_ON(context_guc_id_invalid(ce));
>>> +
>>>    	if (!enabled) {
>>>    		action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>>>    		action[len++] = ce->guc_id;
>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc *guc)
>>>    	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>> +	spin_lock_init(&guc->contexts_lock);
>>> +	INIT_LIST_HEAD(&guc->guc_id_list);
>>> +	ida_init(&guc->guc_ids);
>>> +
>>>    	return 0;
>>>    }
>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct intel_guc *guc)
>>>    	i915_sched_engine_put(guc->sched_engine);
>>>    }
>>> -static int guc_context_alloc(struct intel_context *ce)
>>> +static inline void queue_request(struct i915_sched_engine *sched_engine,
>>> +				 struct i915_request *rq,
>>> +				 int prio)
>>>    {
>>> -	return lrc_alloc(ce, ce->engine);
>>> +	GEM_BUG_ON(!list_empty(&rq->sched.link));
>>> +	list_add_tail(&rq->sched.link,
>>> +		      i915_sched_lookup_priolist(sched_engine, prio));
>>> +	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>> +}
>>> +
>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>> +				     struct i915_request *rq)
>>> +{
>>> +	int ret;
>>> +
>>> +	__i915_request_submit(rq);
>>> +
>>> +	trace_i915_request_in(rq, 0);
>>> +
>>> +	guc_set_lrc_tail(rq);
>>> +	ret = guc_add_request(guc, rq);
>>> +	if (ret == -EBUSY)
>>> +		guc->stalled_request = rq;
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void guc_submit_request(struct i915_request *rq)
>>> +{
>>> +	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>> +	struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>> +	unsigned long flags;
>>> +
>>> +	/* Will be called from irq-context when using foreign fences. */
>>> +	spin_lock_irqsave(&sched_engine->lock, flags);
>>> +
>>> +	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
>>> +		queue_request(sched_engine, rq, rq_prio(rq));
>>> +	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>> +		tasklet_hi_schedule(&sched_engine->tasklet);
>>> +
>>> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
>>> +}
>>> +
>>> +#define GUC_ID_START	64	/* First 64 guc_ids reserved */
>>> +static int new_guc_id(struct intel_guc *guc)
>>> +{
>>> +	return ida_simple_get(&guc->guc_ids, GUC_ID_START,
>>> +			      GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
>>> +			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>>> +}
>>> +
>>> +static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	if (!context_guc_id_invalid(ce)) {
>>> +		ida_simple_remove(&guc->guc_ids, ce->guc_id);
>>> +		reset_lrc_desc(guc, ce->guc_id);
>>> +		set_context_guc_id_invalid(ce);
>>> +	}
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +}
>>> +
>>> +static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	__release_guc_id(guc, ce);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +}
>>> +
>>> +static int steal_guc_id(struct intel_guc *guc)
>>> +{
>>> +	struct intel_context *ce;
>>> +	int guc_id;
>>> +
>>> +	if (!list_empty(&guc->guc_id_list)) {
>>> +		ce = list_first_entry(&guc->guc_id_list,
>>> +				      struct intel_context,
>>> +				      guc_id_link);
>>> +
>>> +		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>> +		GEM_BUG_ON(context_guc_id_invalid(ce));
>>> +
>>> +		list_del_init(&ce->guc_id_link);
>>> +		guc_id = ce->guc_id;
>>> +		set_context_guc_id_invalid(ce);
>>> +		return guc_id;
>>> +	} else {
>>> +		return -EAGAIN;
>>> +	}
>>> +}
>>> +
>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = new_guc_id(guc);
>>> +	if (unlikely(ret < 0)) {
>>> +		ret = steal_guc_id(guc);
>>> +		if (ret < 0)
>>> +			return ret;
>>> +	}
>>> +
>>> +	*out = ret;
>>> +	return 0;
>> Is it worth adding spinlock_held asserts for guc->contexts_lock in these ID
>> functions? Doubles up as documentation of what locking we expect.
>>
> Never a bad idea to have asserts in the code. Will add.
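
For reference, what I had in mind is roughly the below (sketch only,
shown on assign_guc_id(); new_guc_id() and steal_guc_id() would get
the same assert):

	static int assign_guc_id(struct intel_guc *guc, u16 *out)
	{
		int ret;

		lockdep_assert_held(&guc->contexts_lock);

		ret = new_guc_id(guc);
		if (unlikely(ret < 0)) {
			ret = steal_guc_id(guc);
			if (ret < 0)
				return ret;
		}

		*out = ret;
		return 0;
	}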
>
>>> +}
>>> +
>>> +#define PIN_GUC_ID_TRIES	4
>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	int ret = 0;
>>> +	unsigned long flags, tries = PIN_GUC_ID_TRIES;
>>> +
>>> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>> +
>>> +try_again:
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +
>>> +	if (context_guc_id_invalid(ce)) {
>>> +		ret = assign_guc_id(guc, &ce->guc_id);
>>> +		if (ret)
>>> +			goto out_unlock;
>>> +		ret = 1;	/* Indicates newly assigned guc_id */
>>> +	}
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +	atomic_inc(&ce->guc_id_ref);
>>> +
>>> +out_unlock:
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +
>>> +	/*
>>> +	 * -EAGAIN indicates no guc_ids are available, let's retire any
>>> +	 * outstanding requests to see if that frees up a guc_id. If the first
>>> +	 * retire didn't help, insert a sleep with the timeslice duration before
>>> +	 * attempting to retire more requests. Double the sleep period each
>>> +	 * subsequent pass before finally giving up. The sleep period has a
>>> +	 * max of 100ms and a minimum of 1ms.
>>> +	 */
>>> +	if (ret == -EAGAIN && --tries) {
>>> +		if (PIN_GUC_ID_TRIES - tries > 1) {
>>> +			unsigned int timeslice_shifted =
>>> +				ce->engine->props.timeslice_duration_ms <<
>>> +				(PIN_GUC_ID_TRIES - tries - 2);
>>> +			unsigned int max = min_t(unsigned int, 100,
>>> +						 timeslice_shifted);
>>> +
>>> +			msleep(max_t(unsigned int, max, 1));
>>> +		}
>>> +		intel_gt_retire_requests(guc_to_gt(guc));
>>> +		goto try_again;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>> +static void unpin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>> +{
>>> +	unsigned long flags;
>>> +
>>> +	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
>>> +
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
>>> +	    !atomic_read(&ce->guc_id_ref))
>>> +		list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +}
>>> +
>>> +static int __guc_action_register_context(struct intel_guc *guc,
>>> +					 u32 guc_id,
>>> +					 u32 offset)
>>> +{
>>> +	u32 action[] = {
>>> +		INTEL_GUC_ACTION_REGISTER_CONTEXT,
>>> +		guc_id,
>>> +		offset,
>>> +	};
>>> +
>>> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
>>> +}
>>> +
>>> +static int register_context(struct intel_context *ce)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>>> +		ce->guc_id * sizeof(struct guc_lrc_desc);
>>> +
>>> +	return __guc_action_register_context(guc, ce->guc_id, offset);
>>> +}
>>> +
>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
>>> +					   u32 guc_id)
>>> +{
>>> +	u32 action[] = {
>>> +		INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
>>> +		guc_id,
>>> +	};
>>> +
>>> +	return intel_guc_send_busy_loop(guc, action, ARRAY_SIZE(action), true);
>>> +}
>>> +
>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +
>>> +	return __guc_action_deregister_context(guc, guc_id);
>>> +}
>>> +
>>> +static intel_engine_mask_t adjust_engine_mask(u8 class, intel_engine_mask_t mask)
>>> +{
>>> +	switch (class) {
>>> +	case RENDER_CLASS:
>>> +		return mask >> RCS0;
>>> +	case VIDEO_ENHANCEMENT_CLASS:
>>> +		return mask >> VECS0;
>>> +	case VIDEO_DECODE_CLASS:
>>> +		return mask >> VCS0;
>>> +	case COPY_ENGINE_CLASS:
>>> +		return mask >> BCS0;
>>> +	default:
>>> +		GEM_BUG_ON("Invalid Class");
>> we usually use MISSING_CASE for this type of errors.
> As soon as we start talking in logical space (the series immediately
> after this one gets merged) this function gets deleted, but I will fix
> it here in the meantime.
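
(To spell out the suggestion, the default branch would simply become:

	default:
		MISSING_CASE(class);
		return 0;

instead of the GEM_BUG_ON.)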
>
>>> +		return 0;
>>> +	}
>>> +}
>>> +
>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
>>> +				    struct guc_lrc_desc *desc)
>>> +{
>>> +	desc->policy_flags = 0;
>>> +
>>> +	desc->execution_quantum = CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
>>> +	desc->preemption_timeout = CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>>> +}
>>> +
>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
>>> +{
>>> +	struct intel_runtime_pm *runtime_pm =
>>> +		&ce->engine->gt->i915->runtime_pm;
>> If you move this after the engine var below you can skip the ce->engine
>> jump. Also, you can shorten the pointer chasing by using
>> engine->uncore->rpm.
>>
> Sure.
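
(i.e. something like:

	struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;

once the engine local variable is moved above it.)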
>
>>> +	struct intel_engine_cs *engine = ce->engine;
>>> +	struct intel_guc *guc = &engine->gt->uc.guc;
>>> +	u32 desc_idx = ce->guc_id;
>>> +	struct guc_lrc_desc *desc;
>>> +	bool context_registered;
>>> +	intel_wakeref_t wakeref;
>>> +	int ret = 0;
>>> +
>>> +	GEM_BUG_ON(!engine->mask);
>>> +
>>> +	/*
>>> +	 * Ensure LRC + CT vmas are in the same region, as the write barrier
>>> +	 * is done based on the CT vma region.
>>> +	 */
>>> +	GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
>>> +		   i915_gem_object_is_lmem(ce->ring->vma->obj));
>>> +
>>> +	context_registered = lrc_desc_registered(guc, desc_idx);
>>> +
>>> +	reset_lrc_desc(guc, desc_idx);
>>> +	set_lrc_desc_registered(guc, desc_idx, ce);
>>> +
>>> +	desc = __get_lrc_desc(guc, desc_idx);
>>> +	desc->engine_class = engine_class_to_guc_class(engine->class);
>>> +	desc->engine_submit_mask = adjust_engine_mask(engine->class,
>>> +						      engine->mask);
>>> +	desc->hw_context_desc = ce->lrc.lrca;
>>> +	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>> +	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>> +	guc_context_policy_init(engine, desc);
>>> +	init_sched_state(ce);
>>> +
>>> +	/*
>>> +	 * The context_lookup xarray is used to determine if the hardware
>>> +	 * context is currently registered. There are two cases in which it
>>> +	 * could be regisgered either the guc_id has been stole from from
>> typo regisgered
>>
> John got this one. Fixed.
>   
>>> +	 * another context or the lrc descriptor address of this context has
>>> +	 * changed. In either case the context needs to be deregistered with the
>>> +	 * GuC before registering this context.
>>> +	 */
>>> +	if (context_registered) {
>>> +		set_context_wait_for_deregister_to_register(ce);
>>> +		intel_context_get(ce);
>>> +
>>> +		/*
>>> +		 * If stealing the guc_id, this ce has the same guc_id as the
>>> +		 * context whose guc_id was stolen.
>>> +		 */
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			ret = deregister_context(ce, ce->guc_id);
>>> +	} else {
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			ret = register_context(ce);
>>> +	}
>>> +
>>> +	return ret;
>>>    }
>>>    static int guc_context_pre_pin(struct intel_context *ce,
>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct intel_context *ce,
>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>    {
>>> +	if (i915_ggtt_offset(ce->state) !=
>>> +	    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>>> +		set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
>> re-assigned in there; it looks like it can only change if the context is
>> new, but for future-proofing IMO better to re-order here.
>>
> No, this is right. ce->state is assigned in lrc_pre_pin and
> ce->lrc.lrca is assigned in lrc_pin. To catch the change you have to
> check between those two functions.
>
>>> +
>>>    	return lrc_pin(ce, ce->engine, vaddr);
>> Could use a comment to say that the GuC context pinning call is delayed
>> until request alloc and to look at the comment in the latter function for
>> details (or move the explanation here and refer here from request alloc).
>>
> Yes.
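
Something along these lines would do, wording is only a suggestion:

	static int guc_context_pin(struct intel_context *ce, void *vaddr)
	{
		/*
		 * GuC context registration is deferred until the first
		 * request alloc on this context; see the guc_id handling in
		 * guc_request_alloc() for the details. Here we only record
		 * whether the LRCA has changed, so that request alloc knows
		 * the context needs to be re-registered with the GuC.
		 */
		if (i915_ggtt_offset(ce->state) !=
		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return lrc_pin(ce, ce->engine, vaddr);
	}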
>   
>>>    }
>>> +static void guc_context_unpin(struct intel_context *ce)
>>> +{
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>> +
>>> +	unpin_guc_id(guc, ce);
>>> +	lrc_unpin(ce);
>>> +}
>>> +
>>> +static void guc_context_post_unpin(struct intel_context *ce)
>>> +{
>>> +	lrc_post_unpin(ce);
>> why do we need this function? we can just pass lrc_post_unpin to the ops
>> (like we already do).
>>
> We don't, but for style reasons I think having a GuC-specific wrapper
> is correct.
>
>>> +}
>>> +
>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>>> +{
>>> +	struct intel_engine_cs *engine = ce->engine;
>>> +	struct intel_guc *guc = &engine->gt->uc.guc;
>>> +	unsigned long flags;
>>> +
>>> +	GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>>> +	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>>> +
>>> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
>>> +	set_context_destroyed(ce);
>>> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>> +
>>> +	deregister_context(ce, ce->guc_id);
>>> +}
>>> +
>>> +static void guc_context_destroy(struct kref *kref)
>>> +{
>>> +	struct intel_context *ce = container_of(kref, typeof(*ce), ref);
>>> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
>> same as above, going through engine->uncore->rpm is shorter
>>
> Sure.
>   
>>> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
>>> +	intel_wakeref_t wakeref;
>>> +	unsigned long flags;
>>> +
>>> +	/*
>>> +	 * If the guc_id is invalid this context has been stolen and we can free
>>> +	 * it immediately. Also can be freed immediately if the context is not
>>> +	 * registered with the GuC.
>>> +	 */
>>> +	if (context_guc_id_invalid(ce) ||
>>> +	    !lrc_desc_registered(guc, ce->guc_id)) {
>>> +		release_guc_id(guc, ce);
>> it feels a bit weird that we call release_guc_id in the case where the id is
>> invalid. The code handles it fine, but still the flow doesn't feel clean.
>> Not a blocker.
>>
> I understand. Let's see if this can get cleaned up at some point.
>
>>> +		lrc_destroy(kref);
>>> +		return;
>>> +	}
>>> +
>>> +	/*
>>> +	 * We have to acquire the context spinlock and check guc_id again, if it
>>> +	 * is valid it hasn't been stolen and needs to be deregistered. We
>>> +	 * delete this context from the list of unpinned guc_ids available to
>>> +	 * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
>>> +	 * returns indicating this context has been deregistered the guc_id is
>>> +	 * returned to the pool of available guc_ids.
>>> +	 */
>>> +	spin_lock_irqsave(&guc->contexts_lock, flags);
>>> +	if (context_guc_id_invalid(ce)) {
>>> +		__release_guc_id(guc, ce);
>> But here the call to __release_guc_id is unneeded, right? The ce doesn't own
>> the ID anymore and both the steal and the release function already clean
>> ce->guc_id_link, so there should be nothing left to clean.
>>
> This is dead code. Will delete.
>
>>> +		spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +		lrc_destroy(kref);
>>> +		return;
>>> +	}
>>> +
>>> +	if (!list_empty(&ce->guc_id_link))
>>> +		list_del_init(&ce->guc_id_link);
>>> +	spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>> +
>>> +	/*
>>> +	 * We defer GuC context deregistration until the context is destroyed
>>> +	 * in order to save on CTBs. With this optimization ideally we only need
>>> +	 * 1 CTB to register the context during the first pin and 1 CTB to
>>> +	 * deregister the context when the context is destroyed. Without this
>>> +	 * optimization, a CTB would be needed every pin & unpin.
>>> +	 *
>>> +	 * XXX: Need to acquire the runtime wakeref as this can be triggered
>>> +	 * from context_free_worker when no runtime wakeref is held.
>>> +	 * guc_lrc_desc_unpin requires the runtime as a GuC register is written
>>> +	 * in H2G CTB to deregister the context. A future patch may defer this
>>> +	 * H2G CTB if the runtime wakeref is zero.
>>> +	 */
>>> +	with_intel_runtime_pm(runtime_pm, wakeref)
>>> +		guc_lrc_desc_unpin(ce);
>>> +}
>>> +
>>> +static int guc_context_alloc(struct intel_context *ce)
>>> +{
>>> +	return lrc_alloc(ce, ce->engine);
>>> +}
>>> +
>>>    static const struct intel_context_ops guc_context_ops = {
>>>    	.alloc = guc_context_alloc,
>>>    	.pre_pin = guc_context_pre_pin,
>>>    	.pin = guc_context_pin,
>>> -	.unpin = lrc_unpin,
>>> -	.post_unpin = lrc_post_unpin,
>>> +	.unpin = guc_context_unpin,
>>> +	.post_unpin = guc_context_post_unpin,
>>>    	.enter = intel_context_enter_engine,
>>>    	.exit = intel_context_exit_engine,
>>>    	.reset = lrc_reset,
>>> -	.destroy = lrc_destroy,
>>> +	.destroy = guc_context_destroy,
>>>    };
>>> -static int guc_request_alloc(struct i915_request *request)
>>> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
>>>    {
>>> +	return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
>>> +		!lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
>>> +}
>>> +
>>> +static int guc_request_alloc(struct i915_request *rq)
>>> +{
>>> +	struct intel_context *ce = rq->context;
>>> +	struct intel_guc *guc = ce_to_guc(ce);
>>>    	int ret;
>>> -	GEM_BUG_ON(!intel_context_is_pinned(request->context));
>>> +	GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>>>    	/*
>>>    	 * Flush enough space to reduce the likelihood of waiting after
>>>    	 * we start building the request - in which case we will just
>>>    	 * have to repeat work.
>>>    	 */
>>> -	request->reserved_space += GUC_REQUEST_SIZE;
>>> +	rq->reserved_space += GUC_REQUEST_SIZE;
>>>    	/*
>>>    	 * Note that after this point, we have committed to using
>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
>>>    	 */
>>>    	/* Unconditionally invalidate GPU caches and TLBs. */
>>> -	ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
>>> +	ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>>>    	if (ret)
>>>    		return ret;
>>> -	request->reserved_space -= GUC_REQUEST_SIZE;
>>> -	return 0;
>>> -}
>>> +	rq->reserved_space -= GUC_REQUEST_SIZE;
>>> -static inline void queue_request(struct i915_sched_engine *sched_engine,
>>> -				 struct i915_request *rq,
>>> -				 int prio)
>>> -{
>>> -	GEM_BUG_ON(!list_empty(&rq->sched.link));
>>> -	list_add_tail(&rq->sched.link,
>>> -		      i915_sched_lookup_priolist(sched_engine, prio));
>>> -	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>> -}
>>> -
>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>> -				     struct i915_request *rq)
>>> -{
>>> -	int ret;
>>> -
>>> -	__i915_request_submit(rq);
>>> -
>>> -	trace_i915_request_in(rq, 0);
>>> -
>>> -	guc_set_lrc_tail(rq);
>>> -	ret = guc_add_request(guc, rq);
>>> -	if (ret == -EBUSY)
>>> -		guc->stalled_request = rq;
>>> -
>>> -	return ret;
>>> -}
>>> -
>>> -static void guc_submit_request(struct i915_request *rq)
>>> -{
>>> -	struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>> -	struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>> -	unsigned long flags;
>>> +	/*
>>> +	 * Call pin_guc_id here rather than in the pinning step as with
>>> +	 * dma_resv, contexts can be repeatedly pinned / unpinned thrashing the
>>> +	 * guc_ids and creating horrible race conditions. This is especially bad
>>> +	 * when guc_ids are being stolen due to over subscription. By the time
>>> +	 * this function is reached, it is guaranteed that the guc_id will be
>>> +	 * persistent until the generated request is retired. Thus, sealing these
>>> +	 * race conditions. It is still safe to fail here if guc_ids are
>>> +	 * exhausted and return -EAGAIN to the user indicating that they can try
>>> +	 * again in the future.
>>> +	 *
>>> +	 * There is no need for a lock here as the timeline mutex ensures at
>>> +	 * most one context can be executing this code path at once. The
>>> +	 * guc_id_ref is incremented once for every request in flight and
>>> +	 * decremented on each retire. When it is zero, a lock around the
>>> +	 * increment (in pin_guc_id) is needed to seal a race with unpin_guc_id.
>>> +	 */
>>> +	if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
>>> +		return 0;
>>> -	/* Will be called from irq-context when using foreign fences. */
>>> -	spin_lock_irqsave(&sched_engine->lock, flags);
>>> +	ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
>>> +	if (unlikely(ret < 0))
>>> +		return ret;;
>> typo ";;"
>>
> Yep.
>
>>> +	if (context_needs_register(ce, !!ret)) {
>>> +		ret = guc_lrc_desc_pin(ce);
>>> +		if (unlikely(ret)) {	/* unwind */
>>> +			atomic_dec(&ce->guc_id_ref);
>>> +			unpin_guc_id(guc, ce);
>>> +			return ret;
>>> +		}
>>> +	}
>>> -	if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
>>> -		queue_request(sched_engine, rq, rq_prio(rq));
>>> -	else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>> -		tasklet_hi_schedule(&sched_engine->tasklet);
>>> +	clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>
>> Might be worth moving the pinning to a separate function to keep the
>> request_alloc focused on the request. Can be done as a follow up.
>>
> Sure. Let me see how that looks in a follow-up.
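
Roughly what I was picturing (helper name is made up, the body is just
the guc_id handling lifted straight out of guc_request_alloc()):

	static int guc_request_pin_guc_id(struct intel_guc *guc,
					  struct intel_context *ce)
	{
		int ret;

		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
			return 0;

		ret = pin_guc_id(guc, ce);	/* returns 1 if new guc_id assigned */
		if (unlikely(ret < 0))
			return ret;

		if (context_needs_register(ce, !!ret)) {
			ret = guc_lrc_desc_pin(ce);
			if (unlikely(ret)) {	/* unwind */
				atomic_dec(&ce->guc_id_ref);
				unpin_guc_id(guc, ce);
				return ret;
			}
		}

		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
		return 0;
	}

with guc_request_alloc() then just ending in a call to it.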
>   
>>> -	spin_unlock_irqrestore(&sched_engine->lock, flags);
>>> +	return 0;
>>>    }
>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
>>>    	engine->submit_request = guc_submit_request;
>>>    }
>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
>>> +					  struct intel_context *ce)
>>> +{
>>> +	if (context_guc_id_invalid(ce))
>>> +		pin_guc_id(guc, ce);
>>> +	guc_lrc_desc_pin(ce);
>>> +}
>>> +
>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>>> +{
>>> +	struct intel_gt *gt = guc_to_gt(guc);
>>> +	struct intel_engine_cs *engine;
>>> +	enum intel_engine_id id;
>>> +
>>> +	/* make sure all descriptors are clean... */
>>> +	xa_destroy(&guc->context_lookup);
>>> +
>>> +	/*
>>> +	 * Some contexts might have been pinned before we enabled GuC
>>> +	 * submission, so we need to add them to the GuC bookkeeping.
>>> +	 * Also, after a GuC reset we want to make sure that the information
>>> +	 * shared with GuC is properly reset. The kernel lrcs are not attached
>>> +	 * to the gem_context, so they need to be added separately.
>>> +	 *
>>> +	 * Note: we purposely do not check the error return of
>>> +	 * guc_lrc_desc_pin, because that function can only fail in two cases.
>>> +	 * One, if there aren't enough free IDs, but we're guaranteed to have
>>> +	 * enough here (we're either only pinning a handful of lrc on first boot
>>> +	 * or we're re-pinning lrcs that were already pinned before the reset).
>>> +	 * Two, if the GuC has died and CTBs can't make forward progress.
>>> +	 * Presumably, the GuC should be alive as this function is called on
>>> +	 * driver load or after a reset. Even if it is dead, another full GPU
>>> +	 * reset will be triggered and this function would be called again.
>>> +	 */
>>> +
>>> +	for_each_engine(engine, gt, id)
>>> +		if (engine->kernel_context)
>>> +			guc_kernel_context_pin(guc, engine->kernel_context);
>>> +}
>>> +
>>>    static void guc_release(struct intel_engine_cs *engine)
>>>    {
>>>    	engine->sanitize = NULL; /* no longer in control, nothing to sanitize */
>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>>>    void intel_guc_submission_enable(struct intel_guc *guc)
>>>    {
>>> +	guc_init_lrc_mapping(guc);
>>>    }
>>>    void intel_guc_submission_disable(struct intel_guc *guc)
>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
>>>    {
>>>    	guc->submission_selected = __guc_submission_selected(guc);
>>>    }
>>> +
>>> +static inline struct intel_context *
>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>> +{
>>> +	struct intel_context *ce;
>>> +
>>> +	if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
>>> +			"Invalid desc_idx %u", desc_idx);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	ce = __get_context(guc, desc_idx);
>>> +	if (unlikely(!ce)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm,
>>> +			"Context is NULL, desc_idx %u", desc_idx);
>>> +		return NULL;
>>> +	}
>>> +
>>> +	return ce;
>>> +}
>>> +
>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>> +					  const u32 *msg,
>>> +					  u32 len)
>>> +{
>>> +	struct intel_context *ce;
>>> +	u32 desc_idx = msg[0];
>>> +
>>> +	if (unlikely(len < 1)) {
>>> +		drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
>>> +		return -EPROTO;
>>> +	}
>>> +
>>> +	ce = g2h_context_lookup(guc, desc_idx);
>>> +	if (unlikely(!ce))
>>> +		return -EPROTO;
>>> +
>>> +	if (context_wait_for_deregister_to_register(ce)) {
>>> +		struct intel_runtime_pm *runtime_pm =
>>> +			&ce->engine->gt->i915->runtime_pm;
>>> +		intel_wakeref_t wakeref;
>>> +
>>> +		/*
>>> +		 * Previous owner of this guc_id has been deregistered, now
>>> +		 * safe to register this context.
>>> +		 */
>>> +		with_intel_runtime_pm(runtime_pm, wakeref)
>>> +			register_context(ce);
>>> +		clr_context_wait_for_deregister_to_register(ce);
>>> +		intel_context_put(ce);
>>> +	} else if (context_destroyed(ce)) {
>>> +		/* Context has been destroyed */
>>> +		release_guc_id(guc, ce);
>>> +		lrc_destroy(&ce->ref);
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>>> index 943fe485c662..204c95c39353 100644
>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>> @@ -4142,6 +4142,7 @@ enum {
>>>    	FAULT_AND_CONTINUE /* Unsupported */
>>>    };
>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>>>    #define GEN8_CTX_VALID (1 << 0)
>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 86b4c9f2613d..b48c4905d3fc 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>>>    	 */
>>>    	if (!list_empty(&rq->sched.link))
>>>    		remove_from_engine(rq);
>>> +	atomic_dec(&rq->context->guc_id_ref);
>> Does this work/make sense if GuC is disabled?
>>
> It doesn't matter if the GuC is disabled, as this var isn't used in
> that case. A following patch adds a vfunc here and pushes this into
> the GuC backend.
>
> Matt
>
>> Daniele
>>
>>>    	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>>>    	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 33/51] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22  4:47     ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22  4:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel; +Cc: daniele.ceraolospurio, john.c.harrison

On Fri, Jul 16, 2021 at 01:17:06PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The driver must provide GuC with a list of mmio registers
> that should be saved/restored during a GuC-based engine reset.
> Unfortunately, the list must be dynamically allocated as its size is
> variable. That means the driver must generate the list twice - once to
> work out the size and a second time to actually save it.
> 
> v2:
>  (Alan / CI)
>   - GEN7_GT_MODE -> GEN6_GT_MODE to fix WA selftest failure
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Everything looks structurally correct. I feel confident in my RB below,
but W/A are not my area of expertise. If anyone else wanted to give it
a look, I wouldn't mind.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>


> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
>  .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h               |   1 +
>  5 files changed, 222 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 72562c233ad2..34738ccab8bd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
>  }
>  
>  static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
> -		   u32 clear, u32 set, u32 read_mask)
> +		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
>  {
>  	struct i915_wa wa = {
>  		.reg  = reg,
>  		.clr  = clear,
>  		.set  = set,
>  		.read = read_mask,
> +		.masked_reg = masked_reg,
>  	};
>  
>  	_wa_add(wal, &wa);
> @@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
>  static void
>  wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
>  {
> -	wa_add(wal, reg, clear, set, clear);
> +	wa_add(wal, reg, clear, set, clear, false);
>  }
>  
>  static void
> @@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
>  static void
>  wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
> +	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
>  }
>  
>  static void
>  wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
> +	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
>  }
>  
>  static void
>  wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
>  		    u32 mask, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
> +	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
>  }
>  
>  static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
> @@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
>  			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
>  
>  	/* WaEnableFloatBlendOptimization:icl */
> -	wa_write_clr_set(wal,
> -			 GEN10_CACHE_MODE_SS,
> -			 0, /* write-only, so skip validation */
> -			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
> +	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
> +	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
> +	       0 /* write-only, so skip validation */,
> +	       true);
>  
>  	/* WaDisableGPGPUMidThreadPreemption:icl */
>  	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
> @@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
>  	       FF_MODE2,
>  	       FF_MODE2_TDS_TIMER_MASK,
>  	       FF_MODE2_TDS_TIMER_128,
> -	       0);
> +	       0, false);
>  }
>  
>  static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
> @@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
>  	       FF_MODE2,
>  	       FF_MODE2_GS_TIMER_MASK,
>  	       FF_MODE2_GS_TIMER_224,
> -	       0);
> +	       0, false);
>  
>  	/*
>  	 * Wa_14012131227:dg1
> @@ -847,7 +848,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
>  	wa_add(wal,
>  	       HSW_ROW_CHICKEN3, 0,
>  	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
> -		0 /* XXX does this reg exist? */);
> +	       0 /* XXX does this reg exist? */, true);
>  
>  	/* WaVSRefCountFullforceMissDisable:hsw */
>  	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
> @@ -1937,10 +1938,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 * disable bit, which we don't touch here, but it's good
>  		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>  		 */
> -		wa_add(wal, GEN7_GT_MODE, 0,
> -		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
> -				     GEN6_WIZ_HASHING_16x4),
> -		       GEN6_WIZ_HASHING_16x4);
> +		wa_masked_field_set(wal,
> +				    GEN7_GT_MODE,
> +				    GEN6_WIZ_HASHING_MASK,
> +				    GEN6_WIZ_HASHING_16x4);
>  	}
>  
>  	if (IS_GRAPHICS_VER(i915, 6, 7))
> @@ -1990,10 +1991,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 * disable bit, which we don't touch here, but it's good
>  		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>  		 */
> -		wa_add(wal,
> -		       GEN6_GT_MODE, 0,
> -		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
> -		       GEN6_WIZ_HASHING_16x4);
> +		wa_masked_field_set(wal,
> +				    GEN6_GT_MODE,
> +				    GEN6_WIZ_HASHING_MASK,
> +				    GEN6_WIZ_HASHING_16x4);
>  
>  		/* WaDisable_RenderCache_OperationalFlush:snb */
>  		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
> @@ -2014,7 +2015,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		wa_add(wal, MI_MODE,
>  		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
>  		       /* XXX bit doesn't stick on Broadwater */
> -		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
> +		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
>  
>  	if (GRAPHICS_VER(i915) == 4)
>  		/*
> @@ -2029,7 +2030,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 */
>  		wa_add(wal, ECOSKPD,
>  		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
> -		       0 /* XXX bit doesn't stick on Broadwater */);
> +		       0 /* XXX bit doesn't stick on Broadwater */,
> +		       true);
>  }
>  
>  static void
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> index c214111ea367..1e873681795d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> @@ -15,6 +15,7 @@ struct i915_wa {
>  	u32		clr;
>  	u32		set;
>  	u32		read;
> +	bool		masked_reg;
>  };
>  
>  struct i915_wa_list {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 7f14e1873010..3897abb59dba 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -59,6 +59,7 @@ struct intel_guc {
>  
>  	struct i915_vma *ads_vma;
>  	struct __guc_ads_blob *ads_blob;
> +	u32 ads_regset_size;
>  
>  	struct i915_vma *lrc_desc_pool;
>  	void *lrc_desc_pool_vaddr;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index b82145652d57..9fd3c911f5fb 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2014-2019 Intel Corporation
>   */
>  
> +#include <linux/bsearch.h>
> +
>  #include "gt/intel_gt.h"
>  #include "gt/intel_lrc.h"
>  #include "intel_guc_ads.h"
> @@ -23,7 +25,12 @@
>   *      | guc_policies                          |
>   *      +---------------------------------------+
>   *      | guc_gt_system_info                    |
> - *      +---------------------------------------+
> + *      +---------------------------------------+ <== static
> + *      | guc_mmio_reg[countA] (engine 0.0)     |
> + *      | guc_mmio_reg[countB] (engine 0.1)     |
> + *      | guc_mmio_reg[countC] (engine 1.0)     |
> + *      |   ...                                 |
> + *      +---------------------------------------+ <== dynamic
>   *      | padding                               |
>   *      +---------------------------------------+ <== 4K aligned
>   *      | private data                          |
> @@ -35,16 +42,33 @@ struct __guc_ads_blob {
>  	struct guc_ads ads;
>  	struct guc_policies policies;
>  	struct guc_gt_system_info system_info;
> +	/* From here on, location is dynamic! Refer to above diagram. */
> +	struct guc_mmio_reg regset[0];
>  } __packed;
>  
> +static u32 guc_ads_regset_size(struct intel_guc *guc)
> +{
> +	GEM_BUG_ON(!guc->ads_regset_size);
> +	return guc->ads_regset_size;
> +}
> +
>  static u32 guc_ads_private_data_size(struct intel_guc *guc)
>  {
>  	return PAGE_ALIGN(guc->fw.private_data_size);
>  }
>  
> +static u32 guc_ads_regset_offset(struct intel_guc *guc)
> +{
> +	return offsetof(struct __guc_ads_blob, regset);
> +}
> +
>  static u32 guc_ads_private_data_offset(struct intel_guc *guc)
>  {
> -	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
> +	u32 offset;
> +
> +	offset = guc_ads_regset_offset(guc) +
> +		 guc_ads_regset_size(guc);
> +	return PAGE_ALIGN(offset);
>  }
>  
>  static u32 guc_ads_blob_size(struct intel_guc *guc)
> @@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
>  	}
>  }
>  
> +/*
> + * The save/restore register list must be pre-calculated to a temporary
> + * buffer of driver defined size before it can be generated in place
> + * inside the ADS.
> + */
> +#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
> +struct temp_regset {
> +	struct guc_mmio_reg *registers;
> +	u32 used;
> +	u32 size;
> +};
> +
> +static int guc_mmio_reg_cmp(const void *a, const void *b)
> +{
> +	const struct guc_mmio_reg *ra = a;
> +	const struct guc_mmio_reg *rb = b;
> +
> +	return (int)ra->offset - (int)rb->offset;
> +}
> +
> +static void guc_mmio_reg_add(struct temp_regset *regset,
> +			     u32 offset, u32 flags)
> +{
> +	u32 count = regset->used;
> +	struct guc_mmio_reg reg = {
> +		.offset = offset,
> +		.flags = flags,
> +	};
> +	struct guc_mmio_reg *slot;
> +
> +	GEM_BUG_ON(count >= regset->size);
> +
> +	/*
> +	 * The mmio list is built using separate lists within the driver.
> +	 * It's possible that at some point we may attempt to add the same
> +	 * register more than once. Do not consider this an error; silently
> +	 * move on if the register is already in the list.
> +	 */
> +	if (bsearch(&reg, regset->registers, count,
> +		    sizeof(reg), guc_mmio_reg_cmp))
> +		return;
> +
> +	slot = &regset->registers[count];
> +	regset->used++;
> +	*slot = reg;
> +
> +	while (slot-- > regset->registers) {
> +		GEM_BUG_ON(slot[0].offset == slot[1].offset);
> +		if (slot[1].offset > slot[0].offset)
> +			break;
> +
> +		swap(slot[1], slot[0]);
> +	}
> +}
> +
> +#define GUC_MMIO_REG_ADD(regset, reg, masked) \
> +	guc_mmio_reg_add(regset, \
> +			 i915_mmio_reg_offset((reg)), \
> +			 (masked) ? GUC_REGSET_MASKED : 0)
> +
> +static void guc_mmio_regset_init(struct temp_regset *regset,
> +				 struct intel_engine_cs *engine)
> +{
> +	const u32 base = engine->mmio_base;
> +	struct i915_wa_list *wal = &engine->wa_list;
> +	struct i915_wa *wa;
> +	unsigned int i;
> +
> +	regset->used = 0;
> +
> +	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
> +	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
> +	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
> +
> +	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
> +		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
> +
> +	/* Be extra paranoid and include all whitelist registers. */
> +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> +		GUC_MMIO_REG_ADD(regset,
> +				 RING_FORCE_TO_NONPRIV(base, i),
> +				 false);
> +
> +	/* add in local MOCS registers */
> +	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
> +		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
> +}
> +
> +static int guc_mmio_reg_state_query(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct temp_regset temp_set;
> +	u32 total;
> +
> +	/*
> +	 * Need to actually build the list in order to filter out
> +	 * duplicates and other such data dependent constructions.
> +	 */
> +	temp_set.size = MAX_MMIO_REGS;
> +	temp_set.registers = kmalloc_array(temp_set.size,
> +					  sizeof(*temp_set.registers),
> +					  GFP_KERNEL);
> +	if (!temp_set.registers)
> +		return -ENOMEM;
> +
> +	total = 0;
> +	for_each_engine(engine, gt, id) {
> +		guc_mmio_regset_init(&temp_set, engine);
> +		total += temp_set.used;
> +	}
> +
> +	kfree(temp_set.registers);
> +
> +	return total * sizeof(struct guc_mmio_reg);
> +}
> +
> +static void guc_mmio_reg_state_init(struct intel_guc *guc,
> +				    struct __guc_ads_blob *blob)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct temp_regset temp_set;
> +	struct guc_mmio_reg_set *ads_reg_set;
> +	u32 addr_ggtt, offset;
> +	u8 guc_class;
> +
> +	offset = guc_ads_regset_offset(guc);
> +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> +	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
> +	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
> +
> +	for_each_engine(engine, gt, id) {
> +		/* Class index is checked in class converter */
> +		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
> +
> +		guc_class = engine_class_to_guc_class(engine->class);
> +		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
> +
> +		guc_mmio_regset_init(&temp_set, engine);
> +		if (!temp_set.used) {
> +			ads_reg_set->address = 0;
> +			ads_reg_set->count = 0;
> +			continue;
> +		}
> +
> +		ads_reg_set->address = addr_ggtt;
> +		ads_reg_set->count = temp_set.used;
> +
> +		temp_set.size -= temp_set.used;
> +		temp_set.registers += temp_set.used;
> +		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
> +	}
> +
> +	GEM_BUG_ON(temp_set.size);
> +}
> +
>  /*
>   * The first 80 dwords of the register state context, containing the
>   * execlists and ppgtt registers.
> @@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>  		 */
>  		blob->ads.golden_context_lrca[guc_class] = 0;
>  		blob->ads.eng_state_size[guc_class] =
> -			intel_engine_context_size(guc_to_gt(guc),
> -						  engine_class) -
> +			intel_engine_context_size(gt, engine_class) -
>  			skipped_size;
>  	}
>  
> @@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
>  	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
>  
> +	/* MMIO save/restore list */
> +	guc_mmio_reg_state_init(guc, blob);
> +
>  	/* Private Data */
>  	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
>  
> @@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
>  
>  	GEM_BUG_ON(guc->ads_vma);
>  
> +	/* Need to calculate the reg state size dynamically: */
> +	ret = guc_mmio_reg_state_query(guc);
> +	if (ret < 0)
> +		return ret;
> +	guc->ads_regset_size = ret;
> +
>  	size = guc_ads_blob_size(guc);
>  
>  	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 204c95c39353..3584e4d03dc3 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -12309,6 +12309,7 @@ enum skl_power_gate {
>  
>  /* MOCS (Memory Object Control State) registers */
>  #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
> +#define GEN9_LNCFCMOCS_REG_COUNT	32
>  
>  #define __GEN9_RCS0_MOCS0	0xc800
>  #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 33/51] drm/i915/guc: Provide mmio list to be saved/restored on engine reset
@ 2021-07-22  4:47     ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22  4:47 UTC (permalink / raw)
  To: intel-gfx, dri-devel

On Fri, Jul 16, 2021 at 01:17:06PM -0700, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The driver must provide GuC with a list of mmio registers
> that should be saved/restored during a GuC-based engine reset.
> Unfortunately, the list must be dynamically allocated as its size is
> variable. That means the driver must generate the list twice - once to
> work out the size and a second time to actually save it.
> 
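> A condensed sketch of how the two passes fit together (simplified from
> the code in this patch, not an exact excerpt):
> 
> 	/* pass 1: build into a temporary buffer purely to learn the size */
> 	ret = guc_mmio_reg_state_query(guc);
> 	if (ret < 0)
> 		return ret;
> 	guc->ads_regset_size = ret;
> 
> 	/* pass 2: rebuild the same list in place inside the ADS */
> 	guc_mmio_reg_state_init(guc, blob);
> 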
> v2:
>  (Alan / CI)
>   - GEN7_GT_MODE -> GEN6_GT_MODE to fix WA selftest failure
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>

Everything looks structurally correct. I feel confident in my RB below, but
workarounds (W/A) are not my area of expertise. If anyone else wanted to
give it a look, I wouldn't mind.

With that:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>


> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_workarounds.c   |  46 ++--
>  .../gpu/drm/i915/gt/intel_workarounds_types.h |   1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   1 +
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c    | 199 +++++++++++++++++-
>  drivers/gpu/drm/i915/i915_reg.h               |   1 +
>  5 files changed, 222 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> index 72562c233ad2..34738ccab8bd 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
> @@ -150,13 +150,14 @@ static void _wa_add(struct i915_wa_list *wal, const struct i915_wa *wa)
>  }
>  
>  static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
> -		   u32 clear, u32 set, u32 read_mask)
> +		   u32 clear, u32 set, u32 read_mask, bool masked_reg)
>  {
>  	struct i915_wa wa = {
>  		.reg  = reg,
>  		.clr  = clear,
>  		.set  = set,
>  		.read = read_mask,
> +		.masked_reg = masked_reg,
>  	};
>  
>  	_wa_add(wal, &wa);
> @@ -165,7 +166,7 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,
>  static void
>  wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
>  {
> -	wa_add(wal, reg, clear, set, clear);
> +	wa_add(wal, reg, clear, set, clear, false);
>  }
>  
>  static void
> @@ -200,20 +201,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)
>  static void
>  wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
> +	wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val, true);
>  }
>  
>  static void
>  wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
> +	wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val, true);
>  }
>  
>  static void
>  wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
>  		    u32 mask, u32 val)
>  {
> -	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
> +	wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask, true);
>  }
>  
>  static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
> @@ -583,10 +584,10 @@ static void icl_ctx_workarounds_init(struct intel_engine_cs *engine,
>  			     GEN11_BLEND_EMB_FIX_DISABLE_IN_RCC);
>  
>  	/* WaEnableFloatBlendOptimization:icl */
> -	wa_write_clr_set(wal,
> -			 GEN10_CACHE_MODE_SS,
> -			 0, /* write-only, so skip validation */
> -			 _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE));
> +	wa_add(wal, GEN10_CACHE_MODE_SS, 0,
> +	       _MASKED_BIT_ENABLE(FLOAT_BLEND_OPTIMIZATION_ENABLE),
> +	       0 /* write-only, so skip validation */,
> +	       true);
>  
>  	/* WaDisableGPGPUMidThreadPreemption:icl */
>  	wa_masked_field_set(wal, GEN8_CS_CHICKEN1,
> @@ -631,7 +632,7 @@ static void gen12_ctx_gt_tuning_init(struct intel_engine_cs *engine,
>  	       FF_MODE2,
>  	       FF_MODE2_TDS_TIMER_MASK,
>  	       FF_MODE2_TDS_TIMER_128,
> -	       0);
> +	       0, false);
>  }
>  
>  static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
> @@ -669,7 +670,7 @@ static void gen12_ctx_workarounds_init(struct intel_engine_cs *engine,
>  	       FF_MODE2,
>  	       FF_MODE2_GS_TIMER_MASK,
>  	       FF_MODE2_GS_TIMER_224,
> -	       0);
> +	       0, false);
>  
>  	/*
>  	 * Wa_14012131227:dg1
> @@ -847,7 +848,7 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
>  	wa_add(wal,
>  	       HSW_ROW_CHICKEN3, 0,
>  	       _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
> -		0 /* XXX does this reg exist? */);
> +	       0 /* XXX does this reg exist? */, true);
>  
>  	/* WaVSRefCountFullforceMissDisable:hsw */
>  	wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
> @@ -1937,10 +1938,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 * disable bit, which we don't touch here, but it's good
>  		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>  		 */
> -		wa_add(wal, GEN7_GT_MODE, 0,
> -		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
> -				     GEN6_WIZ_HASHING_16x4),
> -		       GEN6_WIZ_HASHING_16x4);
> +		wa_masked_field_set(wal,
> +				    GEN7_GT_MODE,
> +				    GEN6_WIZ_HASHING_MASK,
> +				    GEN6_WIZ_HASHING_16x4);
>  	}
>  
>  	if (IS_GRAPHICS_VER(i915, 6, 7))
> @@ -1990,10 +1991,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 * disable bit, which we don't touch here, but it's good
>  		 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
>  		 */
> -		wa_add(wal,
> -		       GEN6_GT_MODE, 0,
> -		       _MASKED_FIELD(GEN6_WIZ_HASHING_MASK, GEN6_WIZ_HASHING_16x4),
> -		       GEN6_WIZ_HASHING_16x4);
> +		wa_masked_field_set(wal,
> +				    GEN6_GT_MODE,
> +				    GEN6_WIZ_HASHING_MASK,
> +				    GEN6_WIZ_HASHING_16x4);
>  
>  		/* WaDisable_RenderCache_OperationalFlush:snb */
>  		wa_masked_dis(wal, CACHE_MODE_0, RC_OP_FLUSH_ENABLE);
> @@ -2014,7 +2015,7 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		wa_add(wal, MI_MODE,
>  		       0, _MASKED_BIT_ENABLE(VS_TIMER_DISPATCH),
>  		       /* XXX bit doesn't stick on Broadwater */
> -		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH);
> +		       IS_I965G(i915) ? 0 : VS_TIMER_DISPATCH, true);
>  
>  	if (GRAPHICS_VER(i915) == 4)
>  		/*
> @@ -2029,7 +2030,8 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
>  		 */
>  		wa_add(wal, ECOSKPD,
>  		       0, _MASKED_BIT_ENABLE(ECO_CONSTANT_BUFFER_SR_DISABLE),
> -		       0 /* XXX bit doesn't stick on Broadwater */);
> +		       0 /* XXX bit doesn't stick on Broadwater */,
> +		       true);
>  }
>  
>  static void
> diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> index c214111ea367..1e873681795d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_workarounds_types.h
> @@ -15,6 +15,7 @@ struct i915_wa {
>  	u32		clr;
>  	u32		set;
>  	u32		read;
> +	bool		masked_reg;
>  };
>  
>  struct i915_wa_list {
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> index 7f14e1873010..3897abb59dba 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> @@ -59,6 +59,7 @@ struct intel_guc {
>  
>  	struct i915_vma *ads_vma;
>  	struct __guc_ads_blob *ads_blob;
> +	u32 ads_regset_size;
>  
>  	struct i915_vma *lrc_desc_pool;
>  	void *lrc_desc_pool_vaddr;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index b82145652d57..9fd3c911f5fb 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2014-2019 Intel Corporation
>   */
>  
> +#include <linux/bsearch.h>
> +
>  #include "gt/intel_gt.h"
>  #include "gt/intel_lrc.h"
>  #include "intel_guc_ads.h"
> @@ -23,7 +25,12 @@
>   *      | guc_policies                          |
>   *      +---------------------------------------+
>   *      | guc_gt_system_info                    |
> - *      +---------------------------------------+
> + *      +---------------------------------------+ <== static
> + *      | guc_mmio_reg[countA] (engine 0.0)     |
> + *      | guc_mmio_reg[countB] (engine 0.1)     |
> + *      | guc_mmio_reg[countC] (engine 1.0)     |
> + *      |   ...                                 |
> + *      +---------------------------------------+ <== dynamic
>   *      | padding                               |
>   *      +---------------------------------------+ <== 4K aligned
>   *      | private data                          |
> @@ -35,16 +42,33 @@ struct __guc_ads_blob {
>  	struct guc_ads ads;
>  	struct guc_policies policies;
>  	struct guc_gt_system_info system_info;
> +	/* From here on, location is dynamic! Refer to above diagram. */
> +	struct guc_mmio_reg regset[0];
>  } __packed;
>  
> +static u32 guc_ads_regset_size(struct intel_guc *guc)
> +{
> +	GEM_BUG_ON(!guc->ads_regset_size);
> +	return guc->ads_regset_size;
> +}
> +
>  static u32 guc_ads_private_data_size(struct intel_guc *guc)
>  {
>  	return PAGE_ALIGN(guc->fw.private_data_size);
>  }
>  
> +static u32 guc_ads_regset_offset(struct intel_guc *guc)
> +{
> +	return offsetof(struct __guc_ads_blob, regset);
> +}
> +
>  static u32 guc_ads_private_data_offset(struct intel_guc *guc)
>  {
> -	return PAGE_ALIGN(sizeof(struct __guc_ads_blob));
> +	u32 offset;
> +
> +	offset = guc_ads_regset_offset(guc) +
> +		 guc_ads_regset_size(guc);
> +	return PAGE_ALIGN(offset);
>  }
>  
>  static u32 guc_ads_blob_size(struct intel_guc *guc)
> @@ -83,6 +107,165 @@ static void guc_mapping_table_init(struct intel_gt *gt,
>  	}
>  }
>  
> +/*
> + * The save/restore register list must be pre-calculated to a temporary
> + * buffer of driver defined size before it can be generated in place
> + * inside the ADS.
> + */
> +#define MAX_MMIO_REGS	128	/* Arbitrary size, increase as needed */
> +struct temp_regset {
> +	struct guc_mmio_reg *registers;
> +	u32 used;
> +	u32 size;
> +};
> +
> +static int guc_mmio_reg_cmp(const void *a, const void *b)
> +{
> +	const struct guc_mmio_reg *ra = a;
> +	const struct guc_mmio_reg *rb = b;
> +
> +	return (int)ra->offset - (int)rb->offset;
> +}
> +
> +static void guc_mmio_reg_add(struct temp_regset *regset,
> +			     u32 offset, u32 flags)
> +{
> +	u32 count = regset->used;
> +	struct guc_mmio_reg reg = {
> +		.offset = offset,
> +		.flags = flags,
> +	};
> +	struct guc_mmio_reg *slot;
> +
> +	GEM_BUG_ON(count >= regset->size);
> +
> +	/*
> +	 * The mmio list is built using separate lists within the driver.
> +	 * It's possible that at some point we may attempt to add the same
> +	 * register more than once. Do not consider this an error; silently
> +	 * move on if the register is already in the list.
> +	 */
> +	if (bsearch(&reg, regset->registers, count,
> +		    sizeof(reg), guc_mmio_reg_cmp))
> +		return;
> +
> +	slot = &regset->registers[count];
> +	regset->used++;
> +	*slot = reg;
> +
> +	while (slot-- > regset->registers) {
> +		GEM_BUG_ON(slot[0].offset == slot[1].offset);
> +		if (slot[1].offset > slot[0].offset)
> +			break;
> +
> +		swap(slot[1], slot[0]);
> +	}
> +}
> +
> +#define GUC_MMIO_REG_ADD(regset, reg, masked) \
> +	guc_mmio_reg_add(regset, \
> +			 i915_mmio_reg_offset((reg)), \
> +			 (masked) ? GUC_REGSET_MASKED : 0)
> +
> +static void guc_mmio_regset_init(struct temp_regset *regset,
> +				 struct intel_engine_cs *engine)
> +{
> +	const u32 base = engine->mmio_base;
> +	struct i915_wa_list *wal = &engine->wa_list;
> +	struct i915_wa *wa;
> +	unsigned int i;
> +
> +	regset->used = 0;
> +
> +	GUC_MMIO_REG_ADD(regset, RING_MODE_GEN7(base), true);
> +	GUC_MMIO_REG_ADD(regset, RING_HWS_PGA(base), false);
> +	GUC_MMIO_REG_ADD(regset, RING_IMR(base), false);
> +
> +	for (i = 0, wa = wal->list; i < wal->count; i++, wa++)
> +		GUC_MMIO_REG_ADD(regset, wa->reg, wa->masked_reg);
> +
> +	/* Be extra paranoid and include all whitelist registers. */
> +	for (i = 0; i < RING_MAX_NONPRIV_SLOTS; i++)
> +		GUC_MMIO_REG_ADD(regset,
> +				 RING_FORCE_TO_NONPRIV(base, i),
> +				 false);
> +
> +	/* add in local MOCS registers */
> +	for (i = 0; i < GEN9_LNCFCMOCS_REG_COUNT; i++)
> +		GUC_MMIO_REG_ADD(regset, GEN9_LNCFCMOCS(i), false);
> +}
> +
> +static int guc_mmio_reg_state_query(struct intel_guc *guc)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct temp_regset temp_set;
> +	u32 total;
> +
> +	/*
> +	 * Need to actually build the list in order to filter out
> +	 * duplicates and other such data dependent constructions.
> +	 */
> +	temp_set.size = MAX_MMIO_REGS;
> +	temp_set.registers = kmalloc_array(temp_set.size,
> +					  sizeof(*temp_set.registers),
> +					  GFP_KERNEL);
> +	if (!temp_set.registers)
> +		return -ENOMEM;
> +
> +	total = 0;
> +	for_each_engine(engine, gt, id) {
> +		guc_mmio_regset_init(&temp_set, engine);
> +		total += temp_set.used;
> +	}
> +
> +	kfree(temp_set.registers);
> +
> +	return total * sizeof(struct guc_mmio_reg);
> +}
> +
> +static void guc_mmio_reg_state_init(struct intel_guc *guc,
> +				    struct __guc_ads_blob *blob)
> +{
> +	struct intel_gt *gt = guc_to_gt(guc);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct temp_regset temp_set;
> +	struct guc_mmio_reg_set *ads_reg_set;
> +	u32 addr_ggtt, offset;
> +	u8 guc_class;
> +
> +	offset = guc_ads_regset_offset(guc);
> +	addr_ggtt = intel_guc_ggtt_offset(guc, guc->ads_vma) + offset;
> +	temp_set.registers = (struct guc_mmio_reg *) (((u8 *) blob) + offset);
> +	temp_set.size = guc->ads_regset_size / sizeof(temp_set.registers[0]);
> +
> +	for_each_engine(engine, gt, id) {
> +		/* Class index is checked in class converter */
> +		GEM_BUG_ON(engine->instance >= GUC_MAX_INSTANCES_PER_CLASS);
> +
> +		guc_class = engine_class_to_guc_class(engine->class);
> +		ads_reg_set = &blob->ads.reg_state_list[guc_class][engine->instance];
> +
> +		guc_mmio_regset_init(&temp_set, engine);
> +		if (!temp_set.used) {
> +			ads_reg_set->address = 0;
> +			ads_reg_set->count = 0;
> +			continue;
> +		}
> +
> +		ads_reg_set->address = addr_ggtt;
> +		ads_reg_set->count = temp_set.used;
> +
> +		temp_set.size -= temp_set.used;
> +		temp_set.registers += temp_set.used;
> +		addr_ggtt += temp_set.used * sizeof(struct guc_mmio_reg);
> +	}
> +
> +	GEM_BUG_ON(temp_set.size);
> +}
> +
>  /*
>   * The first 80 dwords of the register state context, containing the
>   * execlists and ppgtt registers.
> @@ -121,8 +304,7 @@ static void __guc_ads_init(struct intel_guc *guc)
>  		 */
>  		blob->ads.golden_context_lrca[guc_class] = 0;
>  		blob->ads.eng_state_size[guc_class] =
> -			intel_engine_context_size(guc_to_gt(guc),
> -						  engine_class) -
> +			intel_engine_context_size(gt, engine_class) -
>  			skipped_size;
>  	}
>  
> @@ -153,6 +335,9 @@ static void __guc_ads_init(struct intel_guc *guc)
>  	blob->ads.scheduler_policies = base + ptr_offset(blob, policies);
>  	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
>  
> +	/* MMIO save/restore list */
> +	guc_mmio_reg_state_init(guc, blob);
> +
>  	/* Private Data */
>  	blob->ads.private_data = base + guc_ads_private_data_offset(guc);
>  
> @@ -173,6 +358,12 @@ int intel_guc_ads_create(struct intel_guc *guc)
>  
>  	GEM_BUG_ON(guc->ads_vma);
>  
> +	/* Need to calculate the reg state size dynamically: */
> +	ret = guc_mmio_reg_state_query(guc);
> +	if (ret < 0)
> +		return ret;
> +	guc->ads_regset_size = ret;
> +
>  	size = guc_ads_blob_size(guc);
>  
>  	ret = intel_guc_allocate_and_map_vma(guc, size, &guc->ads_vma,
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 204c95c39353..3584e4d03dc3 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -12309,6 +12309,7 @@ enum skl_power_gate {
>  
>  /* MOCS (Memory Object Control State) registers */
>  #define GEN9_LNCFCMOCS(i)	_MMIO(0xb020 + (i) * 4)	/* L3 Cache Control */
> +#define GEN9_LNCFCMOCS_REG_COUNT	32
>  
>  #define __GEN9_RCS0_MOCS0	0xc800
>  #define GEN9_GFX_MOCS(i)	_MMIO(__GEN9_RCS0_MOCS0 + (i) * 4)
> -- 
> 2.28.0
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-21 23:51         ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-22  7:57           ` Michal Wajdeczko
  -1 siblings, 0 replies; 221+ messages in thread
From: Michal Wajdeczko @ 2021-07-22  7:57 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Matthew Brost; +Cc: intel-gfx, dri-devel



On 22.07.2021 01:51, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/19/2021 9:04 PM, Matthew Brost wrote:
>> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
>>>
>>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
>>>> Implement GuC context operations, which include the GuC-specific
>>>> operations alloc, pin, unpin, and destroy.
>>>>
>>>> v2:
>>>>    (Daniel Vetter)
>>>>     - Use msleep_interruptible rather than cond_resched in busy loop
>>>>    (Michal)
>>>>     - Remove C++ style comment
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666 ++++++++++++++++++++++++++++--
>>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
>>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
>>>>    8 files changed, 685 insertions(+), 55 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
>>>> b/drivers/gpu/drm/i915/gt/intel_context.c
>>>> index bd63813c8a80..32fd6647154b 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce,
>>>> struct intel_engine_cs *engine)
>>>>        mutex_init(&ce->pin_mutex);
>>>> +    spin_lock_init(&ce->guc_state.lock);
>>>> +
>>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
>>>> +    INIT_LIST_HEAD(&ce->guc_id_link);
>>>> +
>>>>        i915_active_init(&ce->active,
>>>>                 __intel_context_active, __intel_context_retire, 0);
>>>>    }
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> index 6d99631d19b9..606c480aec26 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> @@ -96,6 +96,7 @@ struct intel_context {
>>>>    #define CONTEXT_BANNED            6
>>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION    7
>>>>    #define CONTEXT_NOPREEMPT        8
>>>> +#define CONTEXT_LRCA_DIRTY        9
>>>>        struct {
>>>>            u64 timeout_us;
>>>> @@ -138,14 +139,29 @@ struct intel_context {
>>>>        u8 wa_bb_page; /* if set, page num reserved for context
>>>> workarounds */
>>>> +    struct {
>>>> +        /** lock: protects everything in guc_state */
>>>> +        spinlock_t lock;
>>>> +        /**
>>>> +         * sched_state: scheduling state of this context using GuC
>>>> +         * submission
>>>> +         */
>>>> +        u8 sched_state;
>>>> +    } guc_state;
>>>> +
>>>>        /* GuC scheduling state flags that do not require a lock. */
>>>>        atomic_t guc_sched_state_no_lock;
>>>> +    /* GuC LRC descriptor ID */
>>>> +    u16 guc_id;
>>>> +
>>>> +    /* GuC LRC descriptor reference count */
>>>> +    atomic_t guc_id_ref;
>>>> +
>>>>        /*
>>>> -     * GuC LRC descriptor ID - Not assigned in this patch but future patches
>>>> -     * in the series will.
>>>> +     * GuC ID link - in list when unpinned but guc_id still valid in GuC
>>>>         */
>>>> -    u16 guc_id;
>>>> +    struct list_head guc_id_link;
>>>>    };
>>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> index 41e5350a7a05..49d4857ad9b7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> @@ -87,7 +87,6 @@
>>>>    #define GEN11_CSB_WRITE_PTR_MASK    (GEN11_CSB_PTR_MASK << 0)
>>>>    #define MAX_CONTEXT_HW_ID    (1 << 21) /* exclusive */
>>>> -#define MAX_GUC_CONTEXT_HW_ID    (1 << 20) /* exclusive */
>>>>    #define GEN11_MAX_CONTEXT_HW_ID    (1 << 11) /* exclusive */
>>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
>>>>    #define GEN12_MAX_CONTEXT_HW_ID    (GEN11_MAX_CONTEXT_HW_ID - 1)
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> index 8c7b92f699f1..30773cd699f5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> @@ -7,6 +7,7 @@
>>>>    #define _INTEL_GUC_H_
>>>>    #include <linux/xarray.h>
>>>> +#include <linux/delay.h>
>>>>    #include "intel_uncore.h"
>>>>    #include "intel_guc_fw.h"
>>>> @@ -44,6 +45,14 @@ struct intel_guc {
>>>>            void (*disable)(struct intel_guc *guc);
>>>>        } interrupts;
>>>> +    /*
>>>> +     * contexts_lock protects the pool of free guc ids and a linked
>>>> +     * list of guc ids available to be stolen
>>>> +     */
>>>> +    spinlock_t contexts_lock;
>>>> +    struct ida guc_ids;
>>>> +    struct list_head guc_id_list;
>>>> +
>>>>        bool submission_selected;
>>>>        struct i915_vma *ads_vma;
>>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc
>>>> *guc, const u32 *action, u32 len,
>>>>                     response_buf, response_buf_size, 0);
>>>>    }
>>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
>>>> +                       const u32 *action,
>>>> +                       u32 len,
>>>> +                       bool loop)
>>>> +{
>>>> +    int err;
>>>> +    unsigned int sleep_period_ms = 1;
>>>> +    bool not_atomic = !in_atomic() && !irqs_disabled();
>>>> +
>>>> +    /* No sleeping with spin locks, just busy loop */
>>>> +    might_sleep_if(loop && not_atomic);
>>>> +
>>>> +retry:
>>>> +    err = intel_guc_send_nb(guc, action, len);
>>>> +    if (unlikely(err == -EBUSY && loop)) {
>>>> +        if (likely(not_atomic)) {
>>>> +            if (msleep_interruptible(sleep_period_ms))
>>>> +                return -EINTR;
>>>> +            sleep_period_ms = sleep_period_ms << 1;
>>>> +        } else {
>>>> +            cpu_relax();
>>> Potentially something we can change later, but if we're in atomic context
>>> we can't keep looping without a timeout while we get -EBUSY; we need to
>>> bail out early.
>>>
>> Eventually intel_guc_send_nb will return a different error code after
>> 1 second, I think, if the GuC is hung.
> 
> I know, the point is that 1 second is too long in atomic context. This
> has to either complete quickly in atomic context or be forked off to a
> worker. Can fix as a follow-up.

If we implement a fallback to a worker, then maybe we can get rid of this
busy loop completely, as it was only supposed to be needed for rare cases
when CTBs were full due to unlikely congestion, either caused by us (host
driver didn't read G2H on time, or filled up H2G faster than the GuC could
process it) or by an actual GuC failure (no H2G processing at all).
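
A rough sketch of that worker idea (hypothetical names, only to illustrate
the shape, not a concrete proposal):

	struct guc_deferred_send {
		struct work_struct work;
		struct intel_guc *guc;
		u32 len;
		u32 action[8];	/* copy of the H2G message */
	};

	static void guc_deferred_send_worker(struct work_struct *w)
	{
		struct guc_deferred_send *ds =
			container_of(w, typeof(*ds), work);

		/* process context: sleeping/backing off is fine here */
		intel_guc_send(ds->guc, ds->action, ds->len);
		kfree(ds);
	}

On -EBUSY in atomic context we would then kmalloc(GFP_ATOMIC) such a
struct, copy the message, INIT_WORK() it and queue_work() it instead of
spinning on the CTB.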

Michal

> 
> Daniele
> 
>>  
>>>> +        }
>>>> +        goto retry;
>>>> +    }
>>>> +
>>>> +    return err;
>>>> +}
>>>> +
>>>>    static inline void intel_guc_to_host_event_handler(struct
>>>> intel_guc *guc)
>>>>    {
>>>>        intel_guc_ct_event_handler(&guc->ct);
>>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct
>>>> intel_guc *guc, u32 mask)
>>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>>                   struct intel_engine_cs *engine);
>>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>> +                      const u32 *msg, u32 len);
>>>> +
>>>>    void intel_guc_load_status(struct intel_guc *guc, struct
>>>> drm_printer *p);
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index 83ec60ea3f89..28ff82c5be45 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct
>>>> intel_guc_ct *ct, struct ct_incoming_msg *r
>>>>        case INTEL_GUC_ACTION_DEFAULT:
>>>>            ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>>>>            break;
>>>> +    case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>>>> +        ret = intel_guc_deregister_done_process_msg(guc, payload,
>>>> +                                len);
>>>> +        break;
>>>>        default:
>>>>            ret = -EOPNOTSUPP;
>>>>            break;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index 53b4a5eb4a85..a47b3813b4d0 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -13,7 +13,9 @@
>>>>    #include "gt/intel_gt.h"
>>>>    #include "gt/intel_gt_irq.h"
>>>>    #include "gt/intel_gt_pm.h"
>>>> +#include "gt/intel_gt_requests.h"
>>>>    #include "gt/intel_lrc.h"
>>>> +#include "gt/intel_lrc_reg.h"
>>>>    #include "gt/intel_mocs.h"
>>>>    #include "gt/intel_ring.h"
>>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct
>>>> intel_context *ce)
>>>>               &ce->guc_sched_state_no_lock);
>>>>    }
>>>> +/*
>>>> + * Below is a set of functions which control the GuC scheduling state
>>>> + * which require a lock, aside from the special case where the functions
>>>> + * are called from guc_lrc_desc_pin(). In that case it isn't possible
>>>> + * for any other code path to be executing on the context.
>>>> + */
>>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even if
>>> it isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
>> If this code is called from releasing a fence, a circular dependency is
>> created. Once we move to the DRM scheduler the locking becomes way easier
>> and we will clean this up then. We already have a task for the locking
>> cleanup.
>>
>>> scheme if possible, as that might cause trouble if things change in the
>>> future.
>> Again this is temporary code, so I think we can live with this until the
>> DRM scheduler lands.
>>
>>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER    BIT(0)
>>>> +#define SCHED_STATE_DESTROYED                BIT(1)
>>>> +static inline void init_sched_state(struct intel_context *ce)
>>>> +{
>>>> +    /* Only should be called from guc_lrc_desc_pin() */
>>>> +    atomic_set(&ce->guc_sched_state_no_lock, 0);
>>>> +    ce->guc_state.sched_state = 0;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_state.sched_state &
>>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>>> No need for (). Below as well.
>>>
>> Yep.
>>
>>>> +}
>>>> +
>>>> +static inline void
>>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    /* Only should be called from guc_lrc_desc_pin() */
>>>> +    ce->guc_state.sched_state |=
>>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>>>> +}
>>>> +
>>>> +static inline void
>>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    lockdep_assert_held(&ce->guc_state.lock);
>>>> +    ce->guc_state.sched_state =
>>>> +        (ce->guc_state.sched_state &
>>>> +         ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>>> nit: can also use
>>>
>>> ce->guc_state.sched_state &=
>>>     ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>>>
>> Yep.
>>  
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +context_destroyed(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
>>>> +}
>>>> +
>>>> +static inline void
>>>> +set_context_destroyed(struct intel_context *ce)
>>>> +{
>>>> +    lockdep_assert_held(&ce->guc_state.lock);
>>>> +    ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
>>>> +}
>>>> +
>>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_id == GUC_INVALID_LRC_ID);
>>>> +}
>>>> +
>>>> +static inline void set_context_guc_id_invalid(struct intel_context
>>>> *ce)
>>>> +{
>>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
>>>> +}
>>>> +
>>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
>>>> +{
>>>> +    return &ce->engine->gt->uc.guc;
>>>> +}
>>>> +
>>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>>    {
>>>>        return rb_entry(rb, struct i915_priolist, node);
>>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc
>>>> *guc, struct i915_request *rq)
>>>>        int len = 0;
>>>>        bool enabled = context_enabled(ce);
>>>> +    GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>>>> +    GEM_BUG_ON(context_guc_id_invalid(ce));
>>>> +
>>>>        if (!enabled) {
>>>>            action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>>>>            action[len++] = ce->guc_id;
>>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc
>>>> *guc)
>>>>        xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>>> +    spin_lock_init(&guc->contexts_lock);
>>>> +    INIT_LIST_HEAD(&guc->guc_id_list);
>>>> +    ida_init(&guc->guc_ids);
>>>> +
>>>>        return 0;
>>>>    }
>>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct
>>>> intel_guc *guc)
>>>>        i915_sched_engine_put(guc->sched_engine);
>>>>    }
>>>> -static int guc_context_alloc(struct intel_context *ce)
>>>> +static inline void queue_request(struct i915_sched_engine
>>>> *sched_engine,
>>>> +                 struct i915_request *rq,
>>>> +                 int prio)
>>>>    {
>>>> -    return lrc_alloc(ce, ce->engine);
>>>> +    GEM_BUG_ON(!list_empty(&rq->sched.link));
>>>> +    list_add_tail(&rq->sched.link,
>>>> +              i915_sched_lookup_priolist(sched_engine, prio));
>>>> +    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>>> +}
>>>> +
>>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>>> +                     struct i915_request *rq)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    __i915_request_submit(rq);
>>>> +
>>>> +    trace_i915_request_in(rq, 0);
>>>> +
>>>> +    guc_set_lrc_tail(rq);
>>>> +    ret = guc_add_request(guc, rq);
>>>> +    if (ret == -EBUSY)
>>>> +        guc->stalled_request = rq;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void guc_submit_request(struct i915_request *rq)
>>>> +{
>>>> +    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>>> +    struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>>> +    unsigned long flags;
>>>> +
>>>> +    /* Will be called from irq-context when using foreign fences. */
>>>> +    spin_lock_irqsave(&sched_engine->lock, flags);
>>>> +
>>>> +    if (guc->stalled_request ||
>>>> !i915_sched_engine_is_empty(sched_engine))
>>>> +        queue_request(sched_engine, rq, rq_prio(rq));
>>>> +    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>>> +        tasklet_hi_schedule(&sched_engine->tasklet);
>>>> +
>>>> +    spin_unlock_irqrestore(&sched_engine->lock, flags);
>>>> +}
>>>> +
>>>> +#define GUC_ID_START    64    /* First 64 guc_ids reserved */
>>>> +static int new_guc_id(struct intel_guc *guc)
>>>> +{
>>>> +    return ida_simple_get(&guc->guc_ids, GUC_ID_START,
>>>> +                  GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
>>>> +                  __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>>>> +}
>>>> +
>>>> +static void __release_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    if (!context_guc_id_invalid(ce)) {
>>>> +        ida_simple_remove(&guc->guc_ids, ce->guc_id);
>>>> +        reset_lrc_desc(guc, ce->guc_id);
>>>> +        set_context_guc_id_invalid(ce);
>>>> +    }
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +}
>>>> +
>>>> +static void release_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    unsigned long flags;
>>>> +
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    __release_guc_id(guc, ce);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +}
>>>> +
>>>> +static int steal_guc_id(struct intel_guc *guc)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +    int guc_id;
>>>> +
>>>> +    if (!list_empty(&guc->guc_id_list)) {
>>>> +        ce = list_first_entry(&guc->guc_id_list,
>>>> +                      struct intel_context,
>>>> +                      guc_id_link);
>>>> +
>>>> +        GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>>> +        GEM_BUG_ON(context_guc_id_invalid(ce));
>>>> +
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +        guc_id = ce->guc_id;
>>>> +        set_context_guc_id_invalid(ce);
>>>> +        return guc_id;
>>>> +    } else {
>>>> +        return -EAGAIN;
>>>> +    }
>>>> +}
>>>> +
>>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    ret = new_guc_id(guc);
>>>> +    if (unlikely(ret < 0)) {
>>>> +        ret = steal_guc_id(guc);
>>>> +        if (ret < 0)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    *out = ret;
>>>> +    return 0;
>>> Is it worth adding spinlock_held asserts for guc->contexts_lock in these
>>> ID functions? It doubles up as documentation of what locking we expect.
>>>
>> Never a bad idea to have asserts in the code. Will add.
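>>
>> E.g. (sketch of what I have in mind):
>>
>> 	static int assign_guc_id(struct intel_guc *guc, u16 *out)
>> 	{
>> 		lockdep_assert_held(&guc->contexts_lock);
>> 		...
>> 	}
>>
>> and the same in new_guc_id() / steal_guc_id() / __release_guc_id().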
>>
>>>> +}
>>>> +
>>>> +#define PIN_GUC_ID_TRIES    4
>>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>>> +{
>>>> +    int ret = 0;
>>>> +    unsigned long flags, tries = PIN_GUC_ID_TRIES;
>>>> +
>>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>>> +
>>>> +try_again:
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +
>>>> +    if (context_guc_id_invalid(ce)) {
>>>> +        ret = assign_guc_id(guc, &ce->guc_id);
>>>> +        if (ret)
>>>> +            goto out_unlock;
>>>> +        ret = 1;    /* Indicates newly assigned guc_id */
>>>> +    }
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +    atomic_inc(&ce->guc_id_ref);
>>>> +
>>>> +out_unlock:
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +
>>>> +    /*
>>>> +     * -EAGAIN indicates no guc_ids are available, let's retire any
>>>> +     * outstanding requests to see if that frees up a guc_id. If the
>>>> +     * first retire didn't help, insert a sleep with the timeslice
>>>> +     * duration before attempting to retire more requests. Double the
>>>> +     * sleep period each subsequent pass before finally giving up. The
>>>> +     * sleep period has a max of 100ms and a minimum of 1ms.
>>>> +     */
>>>> +    if (ret == -EAGAIN && --tries) {
>>>> +        if (PIN_GUC_ID_TRIES - tries > 1) {
>>>> +            unsigned int timeslice_shifted =
>>>> +                ce->engine->props.timeslice_duration_ms <<
>>>> +                (PIN_GUC_ID_TRIES - tries - 2);
>>>> +            unsigned int max = min_t(unsigned int, 100,
>>>> +                         timeslice_shifted);
>>>> +
>>>> +            msleep(max_t(unsigned int, max, 1));
>>>> +        }
>>>> +        intel_gt_retire_requests(guc_to_gt(guc));
>>>> +        goto try_again;
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void unpin_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    unsigned long flags;
>>>> +
>>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
>>>> +
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
>>>> +        !atomic_read(&ce->guc_id_ref))
>>>> +        list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +}
>>>> +
>>>> +static int __guc_action_register_context(struct intel_guc *guc,
>>>> +                     u32 guc_id,
>>>> +                     u32 offset)
>>>> +{
>>>> +    u32 action[] = {
>>>> +        INTEL_GUC_ACTION_REGISTER_CONTEXT,
>>>> +        guc_id,
>>>> +        offset,
>>>> +    };
>>>> +
>>>> +    return intel_guc_send_busy_loop(guc, action,
>>>> ARRAY_SIZE(action), true);
>>>> +}
>>>> +
>>>> +static int register_context(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +    u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>>>> +        ce->guc_id * sizeof(struct guc_lrc_desc);
>>>> +
>>>> +    return __guc_action_register_context(guc, ce->guc_id, offset);
>>>> +}
>>>> +
>>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
>>>> +                       u32 guc_id)
>>>> +{
>>>> +    u32 action[] = {
>>>> +        INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
>>>> +        guc_id,
>>>> +    };
>>>> +
>>>> +    return intel_guc_send_busy_loop(guc, action,
>>>> ARRAY_SIZE(action), true);
>>>> +}
>>>> +
>>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +
>>>> +    return __guc_action_deregister_context(guc, guc_id);
>>>> +}
>>>> +
>>>> +static intel_engine_mask_t adjust_engine_mask(u8 class,
>>>> intel_engine_mask_t mask)
>>>> +{
>>>> +    switch (class) {
>>>> +    case RENDER_CLASS:
>>>> +        return mask >> RCS0;
>>>> +    case VIDEO_ENHANCEMENT_CLASS:
>>>> +        return mask >> VECS0;
>>>> +    case VIDEO_DECODE_CLASS:
>>>> +        return mask >> VCS0;
>>>> +    case COPY_ENGINE_CLASS:
>>>> +        return mask >> BCS0;
>>>> +    default:
>>>> +        GEM_BUG_ON("Invalid Class");
>>> We usually use MISSING_CASE for this type of error.
>> As soon as we start talking in logical space (in the series merged
>> immediately after this one) this function gets deleted, but I will fix it.
>>
>>>> +        return 0;
>>>> +    }
>>>> +}
>>>> +
>>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
>>>> +                    struct guc_lrc_desc *desc)
>>>> +{
>>>> +    desc->policy_flags = 0;
>>>> +
>>>> +    desc->execution_quantum =
>>>> CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
>>>> +    desc->preemption_timeout =
>>>> CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>>>> +}
>>>> +
>>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_runtime_pm *runtime_pm =
>>>> +        &ce->engine->gt->i915->runtime_pm;
>>> If you move this after the engine var below you can skip the ce->engine
>>> jump. Also, you can shorten the pointer chasing by using
>>> engine->uncore->rpm.
>>>
>> Sure.
>>
>>>> +    struct intel_engine_cs *engine = ce->engine;
>>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
>>>> +    u32 desc_idx = ce->guc_id;
>>>> +    struct guc_lrc_desc *desc;
>>>> +    bool context_registered;
>>>> +    intel_wakeref_t wakeref;
>>>> +    int ret = 0;
>>>> +
>>>> +    GEM_BUG_ON(!engine->mask);
>>>> +
>>>> +    /*
>>>> +     * Ensure LRC + CT vmas are in the same region, as the write barrier
>>>> +     * is done based on the CT vma region.
>>>> +     */
>>>> +    GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
>>>> +           i915_gem_object_is_lmem(ce->ring->vma->obj));
>>>> +
>>>> +    context_registered = lrc_desc_registered(guc, desc_idx);
>>>> +
>>>> +    reset_lrc_desc(guc, desc_idx);
>>>> +    set_lrc_desc_registered(guc, desc_idx, ce);
>>>> +
>>>> +    desc = __get_lrc_desc(guc, desc_idx);
>>>> +    desc->engine_class = engine_class_to_guc_class(engine->class);
>>>> +    desc->engine_submit_mask = adjust_engine_mask(engine->class,
>>>> +                              engine->mask);
>>>> +    desc->hw_context_desc = ce->lrc.lrca;
>>>> +    desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>>> +    desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>>> +    guc_context_policy_init(engine, desc);
>>>> +    init_sched_state(ce);
>>>> +
>>>> +    /*
>>>> +     * The context_lookup xarray is used to determine if the hardware
>>>> +     * context is currently registered. There are two cases in which it
>>>> +     * could be regisgered either the guc_id has been stole from from
>>> typo regisgered
>>>
>> John got this one. Fixed.
>>  
>>>> +     * another context or the lrc descriptor address of this context has
>>>> +     * changed. In either case the context needs to be deregistered with
>>>> +     * the GuC before registering this context.
>>>> +     */
>>>> +    if (context_registered) {
>>>> +        set_context_wait_for_deregister_to_register(ce);
>>>> +        intel_context_get(ce);
>>>> +
>>>> +        /*
>>>> +         * If stealing the guc_id, this ce has the same guc_id as the
>>>> +         * context whose guc_id was stolen.
>>>> +         */
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            ret = deregister_context(ce, ce->guc_id);
>>>> +    } else {
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            ret = register_context(ce);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>>    }
>>>>    static int guc_context_pre_pin(struct intel_context *ce,
>>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct
>>>> intel_context *ce,
>>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>>    {
>>>> +    if (i915_ggtt_offset(ce->state) !=
>>>> +        (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>>>> +        set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
>>> re-assigned in there; it looks like it can only change if the context is
>>> new, but for future-proofing IMO better to re-order here.
>>>
>> No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca
>> is assigned in lrc_pin. To catch the change you have to check between
>> those two functions.
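>>
>> I.e. the ordering is (sketch, not literal code):
>>
>> 	guc_context_pre_pin()
>> 	  -> lrc_pre_pin()	/* assigns ce->state */
>> 	guc_context_pin()
>> 	  -> compare i915_ggtt_offset(ce->state) vs ce->lrc.lrca
>> 	     (the old lrca is still visible at this point)
>> 	  -> lrc_pin()		/* (re)assigns ce->lrc.lrca */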
>>
>>>> +
>>>>        return lrc_pin(ce, ce->engine, vaddr);
>>> Could use a comment to say that the GuC context pinning call is delayed
>>> until request alloc, and to look at the comment in the latter function
>>> for details (or move the explanation here and refer here from request
>>> alloc).
>>>
>> Yes.
>>  
>>>>    }
>>>> +static void guc_context_unpin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +
>>>> +    unpin_guc_id(guc, ce);
>>>> +    lrc_unpin(ce);
>>>> +}
>>>> +
>>>> +static void guc_context_post_unpin(struct intel_context *ce)
>>>> +{
>>>> +    lrc_post_unpin(ce);
>>> Why do we need this function? We can just pass lrc_post_unpin to the ops
>>> (like we already do).
>>>
>> We don't, but for style reasons I think having a GuC-specific wrapper is
>> correct.
>>
>>>> +}
>>>> +
>>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_engine_cs *engine = ce->engine;
>>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
>>>> +    unsigned long flags;
>>>> +
>>>> +    GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>>>> +    GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>>>> +
>>>> +    spin_lock_irqsave(&ce->guc_state.lock, flags);
>>>> +    set_context_destroyed(ce);
>>>> +    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>> +
>>>> +    deregister_context(ce, ce->guc_id);
>>>> +}
>>>> +
>>>> +static void guc_context_destroy(struct kref *kref)
>>>> +{
>>>> +    struct intel_context *ce = container_of(kref, typeof(*ce), ref);
>>>> +    struct intel_runtime_pm *runtime_pm =
>>>> &ce->engine->gt->i915->runtime_pm;
>>> same as above, going through engine->uncore->rpm is shorter
>>>
>> Sure.
>>  
>>>> +    struct intel_guc *guc = &ce->engine->gt->uc.guc;
>>>> +    intel_wakeref_t wakeref;
>>>> +    unsigned long flags;
>>>> +
>>>> +    /*
>>>> +     * If the guc_id is invalid this context has been stolen and we can
>>>> +     * free it immediately. Also can be freed immediately if the context
>>>> +     * is not registered with the GuC.
>>>> +     */
>>>> +    if (context_guc_id_invalid(ce) ||
>>>> +        !lrc_desc_registered(guc, ce->guc_id)) {
>>>> +        release_guc_id(guc, ce);
>>> It feels a bit weird that we call release_guc_id in the case where the
>>> id is invalid. The code handles it fine, but the flow still doesn't feel
>>> clean. Not a blocker.
>>>
>> I understand. Let's see if this can get cleaned up at some point.
>>
>>>> +        lrc_destroy(kref);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * We have to acquire the context spinlock and check guc_id again;
>>>> +     * if it is valid it hasn't been stolen and needs to be deregistered.
>>>> +     * We delete this context from the list of unpinned guc_ids available
>>>> +     * to steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
>>>> +     * returns indicating this context has been deregistered the guc_id
>>>> +     * is returned to the pool of available guc_ids.
>>>> +     */
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    if (context_guc_id_invalid(ce)) {
>>>> +        __release_guc_id(guc, ce);
>>> But here the call to __release_guc_id is unneeded, right? The ce doesn't
>>> own the ID anymore and both the steal and the release functions already
>>> clean ce->guc_id_link, so there should be nothing left to clean.
>>>
>> This is dead code. Will delete.
>>
>>>> +        spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +        lrc_destroy(kref);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +
>>>> +    /*
>>>> +     * We defer GuC context deregistration until the context is destroyed
>>>> +     * in order to save on CTBs. With this optimization ideally we only
>>>> +     * need 1 CTB to register the context during the first pin and 1 CTB
>>>> +     * to deregister the context when the context is destroyed. Without
>>>> +     * this optimization, a CTB would be needed every pin & unpin.
>>>> +     *
>>>> +     * XXX: Need to acquire the runtime wakeref as this can be triggered
>>>> +     * from context_free_worker when no runtime wakeref is held.
>>>> +     * guc_lrc_desc_unpin requires the runtime pm as a GuC register is
>>>> +     * written in an H2G CTB to deregister the context. A future patch
>>>> +     * may defer this H2G CTB if the runtime wakeref is zero.
>>>> +     */
>>>> +    with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +        guc_lrc_desc_unpin(ce);
>>>> +}
>>>> +
>>>> +static int guc_context_alloc(struct intel_context *ce)
>>>> +{
>>>> +    return lrc_alloc(ce, ce->engine);
>>>> +}
>>>> +
>>>>    static const struct intel_context_ops guc_context_ops = {
>>>>        .alloc = guc_context_alloc,
>>>>        .pre_pin = guc_context_pre_pin,
>>>>        .pin = guc_context_pin,
>>>> -    .unpin = lrc_unpin,
>>>> -    .post_unpin = lrc_post_unpin,
>>>> +    .unpin = guc_context_unpin,
>>>> +    .post_unpin = guc_context_post_unpin,
>>>>        .enter = intel_context_enter_engine,
>>>>        .exit = intel_context_exit_engine,
>>>>        .reset = lrc_reset,
>>>> -    .destroy = lrc_destroy,
>>>> +    .destroy = guc_context_destroy,
>>>>    };
>>>> -static int guc_request_alloc(struct i915_request *request)
>>>> +static bool context_needs_register(struct intel_context *ce, bool
>>>> new_guc_id)
>>>>    {
>>>> +    return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
>>>> +        !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
>>>> +}
>>>> +
>>>> +static int guc_request_alloc(struct i915_request *rq)
>>>> +{
>>>> +    struct intel_context *ce = rq->context;
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>>        int ret;
>>>> -    GEM_BUG_ON(!intel_context_is_pinned(request->context));
>>>> +    GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>>>>        /*
>>>>         * Flush enough space to reduce the likelihood of waiting after
>>>>         * we start building the request - in which case we will just
>>>>         * have to repeat work.
>>>>         */
>>>> -    request->reserved_space += GUC_REQUEST_SIZE;
>>>> +    rq->reserved_space += GUC_REQUEST_SIZE;
>>>>        /*
>>>>         * Note that after this point, we have committed to using
>>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct
>>>> i915_request *request)
>>>>         */
>>>>        /* Unconditionally invalidate GPU caches and TLBs. */
>>>> -    ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
>>>> +    ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>>>>        if (ret)
>>>>            return ret;
>>>> -    request->reserved_space -= GUC_REQUEST_SIZE;
>>>> -    return 0;
>>>> -}
>>>> +    rq->reserved_space -= GUC_REQUEST_SIZE;
>>>> -static inline void queue_request(struct i915_sched_engine
>>>> *sched_engine,
>>>> -                 struct i915_request *rq,
>>>> -                 int prio)
>>>> -{
>>>> -    GEM_BUG_ON(!list_empty(&rq->sched.link));
>>>> -    list_add_tail(&rq->sched.link,
>>>> -              i915_sched_lookup_priolist(sched_engine, prio));
>>>> -    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>>> -}
>>>> -
>>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>>> -                     struct i915_request *rq)
>>>> -{
>>>> -    int ret;
>>>> -
>>>> -    __i915_request_submit(rq);
>>>> -
>>>> -    trace_i915_request_in(rq, 0);
>>>> -
>>>> -    guc_set_lrc_tail(rq);
>>>> -    ret = guc_add_request(guc, rq);
>>>> -    if (ret == -EBUSY)
>>>> -        guc->stalled_request = rq;
>>>> -
>>>> -    return ret;
>>>> -}
>>>> -
>>>> -static void guc_submit_request(struct i915_request *rq)
>>>> -{
>>>> -    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>>> -    struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>>> -    unsigned long flags;
>>>> +    /*
>>>> +     * Call pin_guc_id here rather than in the pinning step as with
>>>> +     * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing
>>>> +     * the guc_ids and creating horrible race conditions. This is
>>>> +     * especially bad when guc_ids are being stolen due to over
>>>> +     * subscription. By the time this function is reached, it is
>>>> +     * guaranteed that the guc_id will be persistent until the generated
>>>> +     * request is retired, thus sealing these race conditions. It is
>>>> +     * still safe to fail here if guc_ids are exhausted and return
>>>> +     * -EAGAIN to the user indicating that they can try again in the
>>>> +     * future.
>>>> +     *
>>>> +     * There is no need for a lock here as the timeline mutex ensures at
>>>> +     * most one context can be executing this code path at once. The
>>>> +     * guc_id_ref is incremented once for every request in flight and
>>>> +     * decremented on each retire. When it is zero, a lock around the
>>>> +     * increment (in pin_guc_id) is needed to seal a race with
>>>> +     * unpin_guc_id.
>>>> +     */
>>>> +    if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
>>>> +        return 0;
>>>> -    /* Will be called from irq-context when using foreign fences. */
>>>> -    spin_lock_irqsave(&sched_engine->lock, flags);
>>>> +    ret = pin_guc_id(guc, ce);    /* returns 1 if new guc_id
>>>> assigned */
>>>> +    if (unlikely(ret < 0))
>>>> +        return ret;;
>>> typo ";;"
>>>
>> Yep.
>>
>>>> +    if (context_needs_register(ce, !!ret)) {
>>>> +        ret = guc_lrc_desc_pin(ce);
>>>> +        if (unlikely(ret)) {    /* unwind */
>>>> +            atomic_dec(&ce->guc_id_ref);
>>>> +            unpin_guc_id(guc, ce);
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>> -    if (guc->stalled_request ||
>>>> !i915_sched_engine_is_empty(sched_engine))
>>>> -        queue_request(sched_engine, rq, rq_prio(rq));
>>>> -    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>>> -        tasklet_hi_schedule(&sched_engine->tasklet);
>>>> +    clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>>
>>> Might be worth moving the pinning to a separate function to keep the
>>> request_alloc focused on the request. Can be done as a follow up.
>>>
>> Sure. Let me see how that looks in a follow up.
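>>
>> Something like (rough sketch, hypothetical name, just the existing code
>> moved out of guc_request_alloc()):
>>
>> 	static int guc_request_pin_context(struct i915_request *rq)
>> 	{
>> 		struct intel_context *ce = rq->context;
>> 		struct intel_guc *guc = ce_to_guc(ce);
>> 		int ret;
>>
>> 		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
>> 			return 0;
>>
>> 		ret = pin_guc_id(guc, ce);	/* 1 == new guc_id assigned */
>> 		if (unlikely(ret < 0))
>> 			return ret;
>>
>> 		if (context_needs_register(ce, !!ret)) {
>> 			ret = guc_lrc_desc_pin(ce);
>> 			if (unlikely(ret)) {	/* unwind */
>> 				atomic_dec(&ce->guc_id_ref);
>> 				unpin_guc_id(guc, ce);
>> 				return ret;
>> 			}
>> 		}
>>
>> 		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>> 		return 0;
>> 	}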
>>  
>>>> -    spin_unlock_irqrestore(&sched_engine->lock, flags);
>>>> +    return 0;
>>>>    }
>>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
>>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct
>>>> intel_engine_cs *engine)
>>>>        engine->submit_request = guc_submit_request;
>>>>    }
>>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
>>>> +                      struct intel_context *ce)
>>>> +{
>>>> +    if (context_guc_id_invalid(ce))
>>>> +        pin_guc_id(guc, ce);
>>>> +    guc_lrc_desc_pin(ce);
>>>> +}
>>>> +
>>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>>>> +{
>>>> +    struct intel_gt *gt = guc_to_gt(guc);
>>>> +    struct intel_engine_cs *engine;
>>>> +    enum intel_engine_id id;
>>>> +
>>>> +    /* make sure all descriptors are clean... */
>>>> +    xa_destroy(&guc->context_lookup);
>>>> +
>>>> +    /*
>>>> +     * Some contexts might have been pinned before we enabled GuC
>>>> +     * submission, so we need to add them to the GuC bookkeeping.
>>>> +     * Also, after a reset of the GuC we want to make sure that the
>>>> +     * information shared with the GuC is properly reset. The kernel lrcs
>>>> +     * are not attached to the gem_context, so they need to be added
>>>> +     * separately.
>>>> +     *
>>>> +     * Note: we purposely do not check the error return of
>>>> +     * guc_lrc_desc_pin, because that function can only fail in two
>>>> +     * cases. One, if there aren't enough free IDs, but we're guaranteed
>>>> +     * to have enough here (we're either only pinning a handful of lrc
>>>> +     * on first boot or we're re-pinning lrcs that were already pinned
>>>> +     * before the reset). Two, if the GuC has died and CTBs can't make
>>>> +     * forward progress. Presumably, the GuC should be alive as this
>>>> +     * function is called on driver load or after a reset. Even if it is
>>>> +     * dead, another full GPU reset will be triggered and this function
>>>> +     * would be called again.
>>>> +     */
>>>> +
>>>> +    for_each_engine(engine, gt, id)
>>>> +        if (engine->kernel_context)
>>>> +            guc_kernel_context_pin(guc, engine->kernel_context);
>>>> +}
>>>> +
>>>>    static void guc_release(struct intel_engine_cs *engine)
>>>>    {
>>>>        engine->sanitize = NULL; /* no longer in control, nothing to
>>>> sanitize */
>>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct
>>>> intel_engine_cs *engine)
>>>>    void intel_guc_submission_enable(struct intel_guc *guc)
>>>>    {
>>>> +    guc_init_lrc_mapping(guc);
>>>>    }
>>>>    void intel_guc_submission_disable(struct intel_guc *guc)
>>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct
>>>> intel_guc *guc)
>>>>    {
>>>>        guc->submission_selected = __guc_submission_selected(guc);
>>>>    }
>>>> +
>>>> +static inline struct intel_context *
>>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +
>>>> +    if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
>>>> +            "Invalid desc_idx %u", desc_idx);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    ce = __get_context(guc, desc_idx);
>>>> +    if (unlikely(!ce)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
>>>> +            "Context is NULL, desc_idx %u", desc_idx);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    return ce;
>>>> +}
>>>> +
>>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>> +                      const u32 *msg,
>>>> +                      u32 len)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +    u32 desc_idx = msg[0];
>>>> +
>>>> +    if (unlikely(len < 1)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
>>>> +        return -EPROTO;
>>>> +    }
>>>> +
>>>> +    ce = g2h_context_lookup(guc, desc_idx);
>>>> +    if (unlikely(!ce))
>>>> +        return -EPROTO;
>>>> +
>>>> +    if (context_wait_for_deregister_to_register(ce)) {
>>>> +        struct intel_runtime_pm *runtime_pm =
>>>> +            &ce->engine->gt->i915->runtime_pm;
>>>> +        intel_wakeref_t wakeref;
>>>> +
>>>> +        /*
>>>> +         * Previous owner of this guc_id has been deregistered, so it is
>>>> +         * now safe to register this context.
>>>> +         */
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            register_context(ce);
>>>> +        clr_context_wait_for_deregister_to_register(ce);
>>>> +        intel_context_put(ce);
>>>> +    } else if (context_destroyed(ce)) {
>>>> +        /* Context has been destroyed */
>>>> +        release_guc_id(guc, ce);
>>>> +        lrc_destroy(&ce->ref);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 943fe485c662..204c95c39353 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>>> @@ -4142,6 +4142,7 @@ enum {
>>>>        FAULT_AND_CONTINUE /* Unsupported */
>>>>    };
>>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>>>>    #define GEN8_CTX_VALID (1 << 0)
>>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c
>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>> index 86b4c9f2613d..b48c4905d3fc 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>>>>         */
>>>>        if (!list_empty(&rq->sched.link))
>>>>            remove_from_engine(rq);
>>>> +    atomic_dec(&rq->context->guc_id_ref);
>>> Does this work/make sense if GuC is disabled?
>>>
>> It doesn't matter if the GuC is disabled as this var isn't used if it
>> is. A following patch adds a vfunc here and pushes this into the GuC
>> backend.
>>
>> Matt
>>
>>> Daniele
>>>
>>>>        GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>>>>        __list_del_entry(&rq->link); /* poison neither prev/next (RCU
>>>> walks) */
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-22  7:57           ` Michal Wajdeczko
  0 siblings, 0 replies; 221+ messages in thread
From: Michal Wajdeczko @ 2021-07-22  7:57 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, Matthew Brost; +Cc: intel-gfx, dri-devel



On 22.07.2021 01:51, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/19/2021 9:04 PM, Matthew Brost wrote:
>> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
>>>
>>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
>>>> Implement GuC context operations which includes GuC specific operations
>>>> alloc, pin, unpin, and destroy.
>>>>
>>>> v2:
>>>>    (Daniel Vetter)
>>>>     - Use msleep_interruptible rather than cond_resched in busy loop
>>>>    (Michal)
>>>>     - Remove C++ style comment
>>>>
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
>>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
>>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
>>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
>>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666
>>>> ++++++++++++++++--
>>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
>>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
>>>>    8 files changed, 685 insertions(+), 55 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
>>>> b/drivers/gpu/drm/i915/gt/intel_context.c
>>>> index bd63813c8a80..32fd6647154b 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
>>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce,
>>>> struct intel_engine_cs *engine)
>>>>        mutex_init(&ce->pin_mutex);
>>>> +    spin_lock_init(&ce->guc_state.lock);
>>>> +
>>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
>>>> +    INIT_LIST_HEAD(&ce->guc_id_link);
>>>> +
>>>>        i915_active_init(&ce->active,
>>>>                 __intel_context_active, __intel_context_retire, 0);
>>>>    }
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> index 6d99631d19b9..606c480aec26 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
>>>> @@ -96,6 +96,7 @@ struct intel_context {
>>>>    #define CONTEXT_BANNED            6
>>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION    7
>>>>    #define CONTEXT_NOPREEMPT        8
>>>> +#define CONTEXT_LRCA_DIRTY        9
>>>>        struct {
>>>>            u64 timeout_us;
>>>> @@ -138,14 +139,29 @@ struct intel_context {
>>>>        u8 wa_bb_page; /* if set, page num reserved for context
>>>> workarounds */
>>>> +    struct {
>>>> +        /** lock: protects everything in guc_state */
>>>> +        spinlock_t lock;
>>>> +        /**
>>>> +         * sched_state: scheduling state of this context using GuC
>>>> +         * submission
>>>> +         */
>>>> +        u8 sched_state;
>>>> +    } guc_state;
>>>> +
>>>>        /* GuC scheduling state flags that do not require a lock. */
>>>>        atomic_t guc_sched_state_no_lock;
>>>> +    /* GuC LRC descriptor ID */
>>>> +    u16 guc_id;
>>>> +
>>>> +    /* GuC LRC descriptor reference count */
>>>> +    atomic_t guc_id_ref;
>>>> +
>>>>        /*
>>>> -     * GuC LRC descriptor ID - Not assigned in this patch but
>>>> future patches
>>>> -     * in the series will.
>>>> +     * GuC ID link - in list when unpinned but guc_id still valid
>>>> in GuC
>>>>         */
>>>> -    u16 guc_id;
>>>> +    struct list_head guc_id_link;
>>>>    };
>>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> index 41e5350a7a05..49d4857ad9b7 100644
>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
>>>> @@ -87,7 +87,6 @@
>>>>    #define GEN11_CSB_WRITE_PTR_MASK    (GEN11_CSB_PTR_MASK << 0)
>>>>    #define MAX_CONTEXT_HW_ID    (1 << 21) /* exclusive */
>>>> -#define MAX_GUC_CONTEXT_HW_ID    (1 << 20) /* exclusive */
>>>>    #define GEN11_MAX_CONTEXT_HW_ID    (1 << 11) /* exclusive */
>>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
>>>>    #define GEN12_MAX_CONTEXT_HW_ID    (GEN11_MAX_CONTEXT_HW_ID - 1)
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> index 8c7b92f699f1..30773cd699f5 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
>>>> @@ -7,6 +7,7 @@
>>>>    #define _INTEL_GUC_H_
>>>>    #include <linux/xarray.h>
>>>> +#include <linux/delay.h>
>>>>    #include "intel_uncore.h"
>>>>    #include "intel_guc_fw.h"
>>>> @@ -44,6 +45,14 @@ struct intel_guc {
>>>>            void (*disable)(struct intel_guc *guc);
>>>>        } interrupts;
>>>> +    /*
>>>> +     * contexts_lock protects the pool of free guc ids and a linked
>>>> list of
>>>> +     * guc ids available to be stolen
>>>> +     */
>>>> +    spinlock_t contexts_lock;
>>>> +    struct ida guc_ids;
>>>> +    struct list_head guc_id_list;
>>>> +
>>>>        bool submission_selected;
>>>>        struct i915_vma *ads_vma;
>>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc
>>>> *guc, const u32 *action, u32 len,
>>>>                     response_buf, response_buf_size, 0);
>>>>    }
>>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
>>>> +                       const u32 *action,
>>>> +                       u32 len,
>>>> +                       bool loop)
>>>> +{
>>>> +    int err;
>>>> +    unsigned int sleep_period_ms = 1;
>>>> +    bool not_atomic = !in_atomic() && !irqs_disabled();
>>>> +
>>>> +    /* No sleeping with spin locks, just busy loop */
>>>> +    might_sleep_if(loop && not_atomic);
>>>> +
>>>> +retry:
>>>> +    err = intel_guc_send_nb(guc, action, len);
>>>> +    if (unlikely(err == -EBUSY && loop)) {
>>>> +        if (likely(not_atomic)) {
>>>> +            if (msleep_interruptible(sleep_period_ms))
>>>> +                return -EINTR;
>>>> +            sleep_period_ms = sleep_period_ms << 1;
>>>> +        } else {
>>>> +            cpu_relax();
>>> Potentially something we can change later, but if we're in atomic
>>> context we
>>> can't keep looping without a timeout while we get -EBUSY, we need to
>>> bail
>>> out early.
>>>
>> Eventually intel_guc_send_nb will pop with a different return code after
>> 1 second, I think, if the GuC is hung.
> 
> I know, the point is that 1 second is too long in atomic context. This
> has to either complete quickly in atomic context or be forked off to a
> worker. Can fix as a follow up.

if we implement fallback to a worker, then maybe we can get rid of this
busy loop completely, as it was supposed to be only needed for rare
cases when CTBs were full due to unlikely congestion either caused by us
(host driver didn't read G2H on time or filled up H2G faster than GuC
could process it) or an actual GuC failure (no H2G processing at all).

Michal
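
For illustration only, a minimal sketch of such a worker-based fallback
(hypothetical names; assumes the 3-argument intel_guc_send_nb() from this
series and actions no longer than 8 dwords):

	struct guc_deferred_send {
		struct work_struct work;
		struct intel_guc *guc;
		u32 len;
		u32 action[8];	/* assumed upper bound on action length */
	};

	static void guc_deferred_send_worker(struct work_struct *w)
	{
		struct guc_deferred_send *ds = container_of(w, typeof(*ds), work);

		/* Process context: sleeping between retries is fine here */
		while (intel_guc_send_nb(ds->guc, ds->action, ds->len) == -EBUSY)
			msleep(1);

		kfree(ds);
	}

	/* Called instead of busy-looping when the caller is atomic */
	static int guc_defer_send(struct intel_guc *guc, const u32 *action, u32 len)
	{
		struct guc_deferred_send *ds = kzalloc(sizeof(*ds), GFP_ATOMIC);

		if (!ds)
			return -ENOMEM;

		INIT_WORK(&ds->work, guc_deferred_send_worker);
		ds->guc = guc;
		ds->len = len;
		memcpy(ds->action, action, sizeof(*action) * len);
		queue_work(system_unbound_wq, &ds->work);

		return 0;
	}

One catch with deferring is ordering: H2Gs queued via a worker could race
with later H2Gs sent directly, so a real implementation would need to
funnel everything through the same queue once a send has been deferred.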

> 
> Daniele
> 
>>  
>>>> +        }
>>>> +        goto retry;
>>>> +    }
>>>> +
>>>> +    return err;
>>>> +}
>>>> +
>>>>    static inline void intel_guc_to_host_event_handler(struct
>>>> intel_guc *guc)
>>>>    {
>>>>        intel_guc_ct_event_handler(&guc->ct);
>>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct
>>>> intel_guc *guc, u32 mask)
>>>>    int intel_guc_reset_engine(struct intel_guc *guc,
>>>>                   struct intel_engine_cs *engine);
>>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>> +                      const u32 *msg, u32 len);
>>>> +
>>>>    void intel_guc_load_status(struct intel_guc *guc, struct
>>>> drm_printer *p);
>>>>    #endif
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> index 83ec60ea3f89..28ff82c5be45 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct
>>>> intel_guc_ct *ct, struct ct_incoming_msg *r
>>>>        case INTEL_GUC_ACTION_DEFAULT:
>>>>            ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
>>>>            break;
>>>> +    case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
>>>> +        ret = intel_guc_deregister_done_process_msg(guc, payload,
>>>> +                                len);
>>>> +        break;
>>>>        default:
>>>>            ret = -EOPNOTSUPP;
>>>>            break;
>>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> index 53b4a5eb4a85..a47b3813b4d0 100644
>>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
>>>> @@ -13,7 +13,9 @@
>>>>    #include "gt/intel_gt.h"
>>>>    #include "gt/intel_gt_irq.h"
>>>>    #include "gt/intel_gt_pm.h"
>>>> +#include "gt/intel_gt_requests.h"
>>>>    #include "gt/intel_lrc.h"
>>>> +#include "gt/intel_lrc_reg.h"
>>>>    #include "gt/intel_mocs.h"
>>>>    #include "gt/intel_ring.h"
>>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct
>>>> intel_context *ce)
>>>>               &ce->guc_sched_state_no_lock);
>>>>    }
>>>> +/*
>>>> + * Below is a set of functions which control the GuC scheduling
>>>> state which
>>>> + * require a lock, aside from the special case where the functions
>>>> are called
>>>> + * from guc_lrc_desc_pin(). In that case it isn't possible for any
>>>> other code
>>>> + * path to be executing on the context.
>>>> + */
>>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even
>>> if it
>>> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
>> If this code is called from releasing a fence, a circular dependency is
>> created. Once we move to the DRM scheduler the locking becomes way easier
>> and we can clean this up then. We already have a task for the locking clean up.
>>
>>> scheme if possible, as that might cause trouble if things change in the
>>> future.
>> Again this is temporary code, so I think we can live with this until the
>> DRM scheduler lands.
>>
>>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER    BIT(0)
>>>> +#define SCHED_STATE_DESTROYED                BIT(1)
>>>> +static inline void init_sched_state(struct intel_context *ce)
>>>> +{
>>>> +    /* Only should be called from guc_lrc_desc_pin() */
>>>> +    atomic_set(&ce->guc_sched_state_no_lock, 0);
>>>> +    ce->guc_state.sched_state = 0;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_state.sched_state &
>>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>>> No need for (). Below as well.
>>>
>> Yep.
>>
>>>> +}
>>>> +
>>>> +static inline void
>>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    /* Only should be called from guc_lrc_desc_pin() */
>>>> +    ce->guc_state.sched_state |=
>>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
>>>> +}
>>>> +
>>>> +static inline void
>>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
>>>> +{
>>>> +    lockdep_assert_held(&ce->guc_state.lock);
>>>> +    ce->guc_state.sched_state =
>>>> +        (ce->guc_state.sched_state &
>>>> +         ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
>>> nit: can also use
>>>
>>> ce->guc_state.sched_state &=
>>>     ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
>>>
>> Yep.
>>  
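I.e., with the nit applied the helper becomes (sketch):

	static inline void
	clr_context_wait_for_deregister_to_register(struct intel_context *ce)
	{
		lockdep_assert_held(&ce->guc_state.lock);
		ce->guc_state.sched_state &=
			~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
	}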
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +context_destroyed(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
>>>> +}
>>>> +
>>>> +static inline void
>>>> +set_context_destroyed(struct intel_context *ce)
>>>> +{
>>>> +    lockdep_assert_held(&ce->guc_state.lock);
>>>> +    ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
>>>> +}
>>>> +
>>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
>>>> +{
>>>> +    return (ce->guc_id == GUC_INVALID_LRC_ID);
>>>> +}
>>>> +
>>>> +static inline void set_context_guc_id_invalid(struct intel_context
>>>> *ce)
>>>> +{
>>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
>>>> +}
>>>> +
>>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
>>>> +{
>>>> +    return &ce->engine->gt->uc.guc;
>>>> +}
>>>> +
>>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>>>>    {
>>>>        return rb_entry(rb, struct i915_priolist, node);
>>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc
>>>> *guc, struct i915_request *rq)
>>>>        int len = 0;
>>>>        bool enabled = context_enabled(ce);
>>>> +    GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
>>>> +    GEM_BUG_ON(context_guc_id_invalid(ce));
>>>> +
>>>>        if (!enabled) {
>>>>            action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
>>>>            action[len++] = ce->guc_id;
>>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc
>>>> *guc)
>>>>        xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
>>>> +    spin_lock_init(&guc->contexts_lock);
>>>> +    INIT_LIST_HEAD(&guc->guc_id_list);
>>>> +    ida_init(&guc->guc_ids);
>>>> +
>>>>        return 0;
>>>>    }
>>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct
>>>> intel_guc *guc)
>>>>        i915_sched_engine_put(guc->sched_engine);
>>>>    }
>>>> -static int guc_context_alloc(struct intel_context *ce)
>>>> +static inline void queue_request(struct i915_sched_engine
>>>> *sched_engine,
>>>> +                 struct i915_request *rq,
>>>> +                 int prio)
>>>>    {
>>>> -    return lrc_alloc(ce, ce->engine);
>>>> +    GEM_BUG_ON(!list_empty(&rq->sched.link));
>>>> +    list_add_tail(&rq->sched.link,
>>>> +              i915_sched_lookup_priolist(sched_engine, prio));
>>>> +    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>>> +}
>>>> +
>>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>>> +                     struct i915_request *rq)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    __i915_request_submit(rq);
>>>> +
>>>> +    trace_i915_request_in(rq, 0);
>>>> +
>>>> +    guc_set_lrc_tail(rq);
>>>> +    ret = guc_add_request(guc, rq);
>>>> +    if (ret == -EBUSY)
>>>> +        guc->stalled_request = rq;
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void guc_submit_request(struct i915_request *rq)
>>>> +{
>>>> +    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>>> +    struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>>> +    unsigned long flags;
>>>> +
>>>> +    /* Will be called from irq-context when using foreign fences. */
>>>> +    spin_lock_irqsave(&sched_engine->lock, flags);
>>>> +
>>>> +    if (guc->stalled_request ||
>>>> !i915_sched_engine_is_empty(sched_engine))
>>>> +        queue_request(sched_engine, rq, rq_prio(rq));
>>>> +    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>>> +        tasklet_hi_schedule(&sched_engine->tasklet);
>>>> +
>>>> +    spin_unlock_irqrestore(&sched_engine->lock, flags);
>>>> +}
>>>> +
>>>> +#define GUC_ID_START    64    /* First 64 guc_ids reserved */
>>>> +static int new_guc_id(struct intel_guc *guc)
>>>> +{
>>>> +    return ida_simple_get(&guc->guc_ids, GUC_ID_START,
>>>> +                  GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
>>>> +                  __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>>>> +}
>>>> +
>>>> +static void __release_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    if (!context_guc_id_invalid(ce)) {
>>>> +        ida_simple_remove(&guc->guc_ids, ce->guc_id);
>>>> +        reset_lrc_desc(guc, ce->guc_id);
>>>> +        set_context_guc_id_invalid(ce);
>>>> +    }
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +}
>>>> +
>>>> +static void release_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    unsigned long flags;
>>>> +
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    __release_guc_id(guc, ce);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +}
>>>> +
>>>> +static int steal_guc_id(struct intel_guc *guc)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +    int guc_id;
>>>> +
>>>> +    if (!list_empty(&guc->guc_id_list)) {
>>>> +        ce = list_first_entry(&guc->guc_id_list,
>>>> +                      struct intel_context,
>>>> +                      guc_id_link);
>>>> +
>>>> +        GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>>> +        GEM_BUG_ON(context_guc_id_invalid(ce));
>>>> +
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +        guc_id = ce->guc_id;
>>>> +        set_context_guc_id_invalid(ce);
>>>> +        return guc_id;
>>>> +    } else {
>>>> +        return -EAGAIN;
>>>> +    }
>>>> +}
>>>> +
>>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
>>>> +{
>>>> +    int ret;
>>>> +
>>>> +    ret = new_guc_id(guc);
>>>> +    if (unlikely(ret < 0)) {
>>>> +        ret = steal_guc_id(guc);
>>>> +        if (ret < 0)
>>>> +            return ret;
>>>> +    }
>>>> +
>>>> +    *out = ret;
>>>> +    return 0;
>>> Is it worth adding spinlock_held asserts for guc->contexts_lock in
>>> these ID
>>> functions? Doubles up as a documentation of what locking we expect.
>>>
>> Never a bad idea to have asserts in the code. Will add.
>>
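For example, the assert then doubles as documentation of the expected
locking (quoted patch code plus the suggested annotation):

	static int assign_guc_id(struct intel_guc *guc, u16 *out)
	{
		int ret;

		/* Caller must hold guc->contexts_lock */
		lockdep_assert_held(&guc->contexts_lock);

		ret = new_guc_id(guc);
		if (unlikely(ret < 0)) {
			ret = steal_guc_id(guc);
			if (ret < 0)
				return ret;
		}

		*out = ret;
		return 0;
	}

new_guc_id(), steal_guc_id() and __release_guc_id() would carry the same
assert.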
>>>> +}
>>>> +
>>>> +#define PIN_GUC_ID_TRIES    4
>>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
>>>> +{
>>>> +    int ret = 0;
>>>> +    unsigned long flags, tries = PIN_GUC_ID_TRIES;
>>>> +
>>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
>>>> +
>>>> +try_again:
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +
>>>> +    if (context_guc_id_invalid(ce)) {
>>>> +        ret = assign_guc_id(guc, &ce->guc_id);
>>>> +        if (ret)
>>>> +            goto out_unlock;
>>>> +        ret = 1;    /* Indicates newly assigned guc_id */
>>>> +    }
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +    atomic_inc(&ce->guc_id_ref);
>>>> +
>>>> +out_unlock:
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +
>>>> +    /*
>>>> +     * -EAGAIN indicates no guc_ids are available, let's retire any
>>>> +     * outstanding requests to see if that frees up a guc_id. If
>>>> the first
>>>> +     * retire didn't help, insert a sleep with the timeslice
>>>> duration before
>>>> +     * attempting to retire more requests. Double the sleep period
>>>> each
>>>> +     * subsequent pass before finally giving up. The sleep period
>>>> has max of
>>>> +     * 100ms and minimum of 1ms.
>>>> +     */
>>>> +    if (ret == -EAGAIN && --tries) {
>>>> +        if (PIN_GUC_ID_TRIES - tries > 1) {
>>>> +            unsigned int timeslice_shifted =
>>>> +                ce->engine->props.timeslice_duration_ms <<
>>>> +                (PIN_GUC_ID_TRIES - tries - 2);
>>>> +            unsigned int max = min_t(unsigned int, 100,
>>>> +                         timeslice_shifted);
>>>> +
>>>> +            msleep(max_t(unsigned int, max, 1));
>>>> +        }
>>>> +        intel_gt_retire_requests(guc_to_gt(guc));
>>>> +        goto try_again;
>>>> +    }
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +static void unpin_guc_id(struct intel_guc *guc, struct
>>>> intel_context *ce)
>>>> +{
>>>> +    unsigned long flags;
>>>> +
>>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
>>>> +
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
>>>> +        !atomic_read(&ce->guc_id_ref))
>>>> +        list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +}
>>>> +
>>>> +static int __guc_action_register_context(struct intel_guc *guc,
>>>> +                     u32 guc_id,
>>>> +                     u32 offset)
>>>> +{
>>>> +    u32 action[] = {
>>>> +        INTEL_GUC_ACTION_REGISTER_CONTEXT,
>>>> +        guc_id,
>>>> +        offset,
>>>> +    };
>>>> +
>>>> +    return intel_guc_send_busy_loop(guc, action,
>>>> ARRAY_SIZE(action), true);
>>>> +}
>>>> +
>>>> +static int register_context(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +    u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>>>> +        ce->guc_id * sizeof(struct guc_lrc_desc);
>>>> +
>>>> +    return __guc_action_register_context(guc, ce->guc_id, offset);
>>>> +}
>>>> +
>>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
>>>> +                       u32 guc_id)
>>>> +{
>>>> +    u32 action[] = {
>>>> +        INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
>>>> +        guc_id,
>>>> +    };
>>>> +
>>>> +    return intel_guc_send_busy_loop(guc, action,
>>>> ARRAY_SIZE(action), true);
>>>> +}
>>>> +
>>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +
>>>> +    return __guc_action_deregister_context(guc, guc_id);
>>>> +}
>>>> +
>>>> +static intel_engine_mask_t adjust_engine_mask(u8 class,
>>>> intel_engine_mask_t mask)
>>>> +{
>>>> +    switch (class) {
>>>> +    case RENDER_CLASS:
>>>> +        return mask >> RCS0;
>>>> +    case VIDEO_ENHANCEMENT_CLASS:
>>>> +        return mask >> VECS0;
>>>> +    case VIDEO_DECODE_CLASS:
>>>> +        return mask >> VCS0;
>>>> +    case COPY_ENGINE_CLASS:
>>>> +        return mask >> BCS0;
>>>> +    default:
>>>> +        GEM_BUG_ON("Invalid Class");
>>> we usually use MISSING_CASE for this type of error.
>> As soon as we start talking in logical space (the series immediately after
>> this gets merged) this function gets deleted, but I will fix it (sketch
>> below the quoted function).
>>
>>>> +        return 0;
>>>> +    }
>>>> +}
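For reference, a sketch of the function with MISSING_CASE() applied
(everything else as in the quoted patch):

	static intel_engine_mask_t adjust_engine_mask(u8 class,
						      intel_engine_mask_t mask)
	{
		switch (class) {
		case RENDER_CLASS:
			return mask >> RCS0;
		case VIDEO_ENHANCEMENT_CLASS:
			return mask >> VECS0;
		case VIDEO_DECODE_CLASS:
			return mask >> VCS0;
		case COPY_ENGINE_CLASS:
			return mask >> BCS0;
		default:
			MISSING_CASE(class);
			return 0;
		}
	}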
>>>> +
>>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
>>>> +                    struct guc_lrc_desc *desc)
>>>> +{
>>>> +    desc->policy_flags = 0;
>>>> +
>>>> +    desc->execution_quantum =
>>>> CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
>>>> +    desc->preemption_timeout =
>>>> CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
>>>> +}
>>>> +
>>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_runtime_pm *runtime_pm =
>>>> +        &ce->engine->gt->i915->runtime_pm;
>>> If you move this after the engine var below you can skip the ce->engine
>>> jump. Also, you can shorten the pointer chasing by using
>>> engine->uncore->rpm.
>>>
>> Sure.
>>
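I.e., something like (sketch, as a delta on the quoted patch):

	-    struct intel_runtime_pm *runtime_pm =
	-        &ce->engine->gt->i915->runtime_pm;
	     struct intel_engine_cs *engine = ce->engine;
	+    struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;
	     struct intel_guc *guc = &engine->gt->uc.guc;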
>>>> +    struct intel_engine_cs *engine = ce->engine;
>>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
>>>> +    u32 desc_idx = ce->guc_id;
>>>> +    struct guc_lrc_desc *desc;
>>>> +    bool context_registered;
>>>> +    intel_wakeref_t wakeref;
>>>> +    int ret = 0;
>>>> +
>>>> +    GEM_BUG_ON(!engine->mask);
>>>> +
>>>> +    /*
>>>> +     * Ensure LRC + CT vmas are in the same region, as the write barrier
>>>> +     * is done based on the CT vma region.
>>>> +     */
>>>> +    GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
>>>> +           i915_gem_object_is_lmem(ce->ring->vma->obj));
>>>> +
>>>> +    context_registered = lrc_desc_registered(guc, desc_idx);
>>>> +
>>>> +    reset_lrc_desc(guc, desc_idx);
>>>> +    set_lrc_desc_registered(guc, desc_idx, ce);
>>>> +
>>>> +    desc = __get_lrc_desc(guc, desc_idx);
>>>> +    desc->engine_class = engine_class_to_guc_class(engine->class);
>>>> +    desc->engine_submit_mask = adjust_engine_mask(engine->class,
>>>> +                              engine->mask);
>>>> +    desc->hw_context_desc = ce->lrc.lrca;
>>>> +    desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>>> +    desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>>>> +    guc_context_policy_init(engine, desc);
>>>> +    init_sched_state(ce);
>>>> +
>>>> +    /*
>>>> +     * The context_lookup xarray is used to determine if the hardware
>>>> +     * context is currently registered. There are two cases in
>>>> which it
>>>> +     * could be regisgered either the guc_id has been stole from from
>>> typo regisgered
>>>
>> John got this one. Fixed.
>>  
>>>> +     * another context or the lrc descriptor address of this
>>>> context has
>>>> +     * changed. In either case the context needs to be deregistered
>>>> with the
>>>> +     * GuC before registering this context.
>>>> +     */
>>>> +    if (context_registered) {
>>>> +        set_context_wait_for_deregister_to_register(ce);
>>>> +        intel_context_get(ce);
>>>> +
>>>> +        /*
>>>> +         * If stealing the guc_id, this ce has the same guc_id as the
>>>> +         * context whose guc_id was stolen.
>>>> +         */
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            ret = deregister_context(ce, ce->guc_id);
>>>> +    } else {
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            ret = register_context(ce);
>>>> +    }
>>>> +
>>>> +    return ret;
>>>>    }
>>>>    static int guc_context_pre_pin(struct intel_context *ce,
>>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct
>>>> intel_context *ce,
>>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
>>>>    {
>>>> +    if (i915_ggtt_offset(ce->state) !=
>>>> +        (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
>>>> +        set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
>>> re-assigned in there; it looks like it can only change if the context is
>>> new, but for future-proofing IMO better to re-order here.
>>>
>> No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca is
>> assigned in lrc_pin. To catch the change you have to check between
>> those two functions.
>>
>>>> +
>>>>        return lrc_pin(ce, ce->engine, vaddr);
>>> Could use a comment to say that the GuC context pinning call is delayed
>>> until request alloc and to look at the comment in the latter function
>>> for
>>> details (or move the explanation here and refer here from request
>>> alloc).
>>>
>> Yes.
>>  
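A sketch of the commented version, also capturing the ordering point made
above (comment wording assumed):

	static int guc_context_pin(struct intel_context *ce, void *vaddr)
	{
		/*
		 * GuC context registration is deferred to guc_request_alloc();
		 * see the comment there for details. Note that ce->state was
		 * (re)assigned in lrc_pre_pin() while ce->lrc.lrca is not
		 * updated until lrc_pin() below, which is why a moved context
		 * image can only be detected at this exact point.
		 */
		if (i915_ggtt_offset(ce->state) !=
		    (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
			set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

		return lrc_pin(ce, ce->engine, vaddr);
	}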
>>>>    }
>>>> +static void guc_context_unpin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>> +
>>>> +    unpin_guc_id(guc, ce);
>>>> +    lrc_unpin(ce);
>>>> +}
>>>> +
>>>> +static void guc_context_post_unpin(struct intel_context *ce)
>>>> +{
>>>> +    lrc_post_unpin(ce);
>>> why do we need this function? we can just pass lrc_post_unpin to the ops
>>> (like we already do).
>>>
>> We don't, but for style reasons I think having a GuC specific wrapper is
>> correct.
>>
>>>> +}
>>>> +
>>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>>>> +{
>>>> +    struct intel_engine_cs *engine = ce->engine;
>>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
>>>> +    unsigned long flags;
>>>> +
>>>> +    GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
>>>> +    GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>>>> +
>>>> +    spin_lock_irqsave(&ce->guc_state.lock, flags);
>>>> +    set_context_destroyed(ce);
>>>> +    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>>>> +
>>>> +    deregister_context(ce, ce->guc_id);
>>>> +}
>>>> +
>>>> +static void guc_context_destroy(struct kref *kref)
>>>> +{
>>>> +    struct intel_context *ce = container_of(kref, typeof(*ce), ref);
>>>> +    struct intel_runtime_pm *runtime_pm =
>>>> &ce->engine->gt->i915->runtime_pm;
>>> same as above, going through engine->uncore->rpm is shorter
>>>
>> Sure.
>>  
>>>> +    struct intel_guc *guc = &ce->engine->gt->uc.guc;
>>>> +    intel_wakeref_t wakeref;
>>>> +    unsigned long flags;
>>>> +
>>>> +    /*
>>>> +     * If the guc_id is invalid this context has been stolen and we
>>>> can free
>>>> +     * it immediately. Also can be freed immediately if the context
>>>> is not
>>>> +     * registered with the GuC.
>>>> +     */
>>>> +    if (context_guc_id_invalid(ce) ||
>>>> +        !lrc_desc_registered(guc, ce->guc_id)) {
>>>> +        release_guc_id(guc, ce);
>>> it feels a bit weird that we call release_guc_id in the case where
>>> the id is
>>> invalid. The code handles it fine, but still the flow doesn't feel
>>> clean.
>>> Not a blocker.
>>>
>> I understand. Let's see if this can get cleaned up at some point.
>>
>>>> +        lrc_destroy(kref);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * We have to acquire the context spinlock and check guc_id
>>>> again, if it
>>>> +     * is valid it hasn't been stolen and needs to be deregistered. We
>>>> +     * delete this context from the list of unpinned guc_ids available
>>>> +     * to steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
>>>> +     * returns indicating this context has been deregistered the
>>>> guc_id is
>>>> +     * returned to the pool of available guc_ids.
>>>> +     */
>>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
>>>> +    if (context_guc_id_invalid(ce)) {
>>>> +        __release_guc_id(guc, ce);
>>> But here the call to __release_guc_id is unneeded, right? The ce
>>> doesn't own
>>> the ID anymore and both the steal and the release function already clean
>>> ce->guc_id_link, so there should be nothing left to clean.
>>>
>> This is dead code. Will delete.
>>
>>>> +        spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +        lrc_destroy(kref);
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    if (!list_empty(&ce->guc_id_link))
>>>> +        list_del_init(&ce->guc_id_link);
>>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
>>>> +
>>>> +    /*
>>>> +     * We defer GuC context deregistration until the context is
>>>> destroyed
>>>> +     * in order to save on CTBs. With this optimization ideally we
>>>> only need
>>>> +     * 1 CTB to register the context during the first pin and 1 CTB to
>>>> +     * deregister the context when the context is destroyed.
>>>> Without this
>>>> +     * optimization, a CTB would be needed every pin & unpin.
>>>> +     *
>>>> +     * XXX: Need to acquire the runtime wakeref as this can be triggered
>>>> +     * from context_free_worker when no runtime wakeref is held.
>>>> +     * guc_lrc_desc_unpin requires the runtime as a GuC register is
>>>> written
>>>> +     * in H2G CTB to deregister the context. A future patch may
>>>> defer this
>>>> +     * H2G CTB if the runtime wakeref is zero.
>>>> +     */
>>>> +    with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +        guc_lrc_desc_unpin(ce);
>>>> +}
>>>> +
>>>> +static int guc_context_alloc(struct intel_context *ce)
>>>> +{
>>>> +    return lrc_alloc(ce, ce->engine);
>>>> +}
>>>> +
>>>>    static const struct intel_context_ops guc_context_ops = {
>>>>        .alloc = guc_context_alloc,
>>>>        .pre_pin = guc_context_pre_pin,
>>>>        .pin = guc_context_pin,
>>>> -    .unpin = lrc_unpin,
>>>> -    .post_unpin = lrc_post_unpin,
>>>> +    .unpin = guc_context_unpin,
>>>> +    .post_unpin = guc_context_post_unpin,
>>>>        .enter = intel_context_enter_engine,
>>>>        .exit = intel_context_exit_engine,
>>>>        .reset = lrc_reset,
>>>> -    .destroy = lrc_destroy,
>>>> +    .destroy = guc_context_destroy,
>>>>    };
>>>> -static int guc_request_alloc(struct i915_request *request)
>>>> +static bool context_needs_register(struct intel_context *ce, bool
>>>> new_guc_id)
>>>>    {
>>>> +    return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
>>>> +        !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
>>>> +}
>>>> +
>>>> +static int guc_request_alloc(struct i915_request *rq)
>>>> +{
>>>> +    struct intel_context *ce = rq->context;
>>>> +    struct intel_guc *guc = ce_to_guc(ce);
>>>>        int ret;
>>>> -    GEM_BUG_ON(!intel_context_is_pinned(request->context));
>>>> +    GEM_BUG_ON(!intel_context_is_pinned(rq->context));
>>>>        /*
>>>>         * Flush enough space to reduce the likelihood of waiting after
>>>>         * we start building the request - in which case we will just
>>>>         * have to repeat work.
>>>>         */
>>>> -    request->reserved_space += GUC_REQUEST_SIZE;
>>>> +    rq->reserved_space += GUC_REQUEST_SIZE;
>>>>        /*
>>>>         * Note that after this point, we have committed to using
>>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct
>>>> i915_request *request)
>>>>         */
>>>>        /* Unconditionally invalidate GPU caches and TLBs. */
>>>> -    ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
>>>> +    ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
>>>>        if (ret)
>>>>            return ret;
>>>> -    request->reserved_space -= GUC_REQUEST_SIZE;
>>>> -    return 0;
>>>> -}
>>>> +    rq->reserved_space -= GUC_REQUEST_SIZE;
>>>> -static inline void queue_request(struct i915_sched_engine
>>>> *sched_engine,
>>>> -                 struct i915_request *rq,
>>>> -                 int prio)
>>>> -{
>>>> -    GEM_BUG_ON(!list_empty(&rq->sched.link));
>>>> -    list_add_tail(&rq->sched.link,
>>>> -              i915_sched_lookup_priolist(sched_engine, prio));
>>>> -    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>>> -}
>>>> -
>>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
>>>> -                     struct i915_request *rq)
>>>> -{
>>>> -    int ret;
>>>> -
>>>> -    __i915_request_submit(rq);
>>>> -
>>>> -    trace_i915_request_in(rq, 0);
>>>> -
>>>> -    guc_set_lrc_tail(rq);
>>>> -    ret = guc_add_request(guc, rq);
>>>> -    if (ret == -EBUSY)
>>>> -        guc->stalled_request = rq;
>>>> -
>>>> -    return ret;
>>>> -}
>>>> -
>>>> -static void guc_submit_request(struct i915_request *rq)
>>>> -{
>>>> -    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
>>>> -    struct intel_guc *guc = &rq->engine->gt->uc.guc;
>>>> -    unsigned long flags;
>>>> +    /*
>>>> +     * Call pin_guc_id here rather than in the pinning step as with
>>>> +     * dma_resv, contexts can be repeatedly pinned / unpinned, thrashing
>>>> +     * the guc_ids and creating horrible race conditions. This is
>>>> especially bad
>>>> +     * when guc_ids are being stolen due to over subscription. By
>>>> the time
>>>> +     * this function is reached, it is guaranteed that the guc_id
>>>> will be
>>>> +     * persistent until the generated request is retired. Thus,
>>>> sealing these
>>>> +     * race conditions. It is still safe to fail here if guc_ids are
>>>> +     * exhausted and return -EAGAIN to the user indicating that
>>>> they can try
>>>> +     * again in the future.
>>>> +     *
>>>> +     * There is no need for a lock here as the timeline mutex
>>>> ensures at
>>>> +     * most one context can be executing this code path at once. The
>>>> +     * guc_id_ref is incremented once for every request in flight and
>>>> +     * decremented on each retire. When it is zero, a lock around the
>>>> +     * increment (in pin_guc_id) is needed to seal a race with
>>>> unpin_guc_id.
>>>> +     */
>>>> +    if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
>>>> +        return 0;
>>>> -    /* Will be called from irq-context when using foreign fences. */
>>>> -    spin_lock_irqsave(&sched_engine->lock, flags);
>>>> +    ret = pin_guc_id(guc, ce);    /* returns 1 if new guc_id
>>>> assigned */
>>>> +    if (unlikely(ret < 0))
>>>> +        return ret;;
>>> typo ";;"
>>>
>> Yep.
>>
>>>> +    if (context_needs_register(ce, !!ret)) {
>>>> +        ret = guc_lrc_desc_pin(ce);
>>>> +        if (unlikely(ret)) {    /* unwind */
>>>> +            atomic_dec(&ce->guc_id_ref);
>>>> +            unpin_guc_id(guc, ce);
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>> -    if (guc->stalled_request ||
>>>> !i915_sched_engine_is_empty(sched_engine))
>>>> -        queue_request(sched_engine, rq, rq_prio(rq));
>>>> -    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
>>>> -        tasklet_hi_schedule(&sched_engine->tasklet);
>>>> +    clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
>>>
>>> Might be worth moving the pinning to a separate function to keep the
>>> request_alloc focused on the request. Can be done as a follow up.
>>>
>> Sure. Let me see how that looks in a follow up.
>>  
>>>> -    spin_unlock_irqrestore(&sched_engine->lock, flags);
>>>> +    return 0;
>>>>    }
>>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
>>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct
>>>> intel_engine_cs *engine)
>>>>        engine->submit_request = guc_submit_request;
>>>>    }
>>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
>>>> +                      struct intel_context *ce)
>>>> +{
>>>> +    if (context_guc_id_invalid(ce))
>>>> +        pin_guc_id(guc, ce);
>>>> +    guc_lrc_desc_pin(ce);
>>>> +}
>>>> +
>>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
>>>> +{
>>>> +    struct intel_gt *gt = guc_to_gt(guc);
>>>> +    struct intel_engine_cs *engine;
>>>> +    enum intel_engine_id id;
>>>> +
>>>> +    /* make sure all descriptors are clean... */
>>>> +    xa_destroy(&guc->context_lookup);
>>>> +
>>>> +    /*
>>>> +     * Some contexts might have been pinned before we enabled GuC
>>>> +     * submission, so we need to add them to the GuC bookkeeping.
>>>> +     * Also, after a GuC reset we want to make sure that the information
>>>> +     * shared with GuC is properly reset. The kernel lrcs are not attached
>>>> +     * to the gem_context, so they need to be added separately.
>>>> +     *
>>>> +     * Note: we purposely do not check the error return of
>>>> +     * guc_lrc_desc_pin, because that function can only fail in two
>>>> cases.
>>>> +     * One, if there aren't enough free IDs, but we're guaranteed
>>>> to have
>>>> +     * enough here (we're either only pinning a handful of lrc on
>>>> first boot
>>>> +     * or we're re-pinning lrcs that were already pinned before the
>>>> reset).
>>>> +     * Two, if the GuC has died and CTBs can't make forward progress.
>>>> +     * Presumably, the GuC should be alive as this function is
>>>> called on
>>>> +     * driver load or after a reset. Even if it is dead, another
>>>> full GPU
>>>> +     * reset will be triggered and this function would be called
>>>> again.
>>>> +     */
>>>> +
>>>> +    for_each_engine(engine, gt, id)
>>>> +        if (engine->kernel_context)
>>>> +            guc_kernel_context_pin(guc, engine->kernel_context);
>>>> +}
>>>> +
>>>>    static void guc_release(struct intel_engine_cs *engine)
>>>>    {
>>>>        engine->sanitize = NULL; /* no longer in control, nothing to
>>>> sanitize */
>>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct
>>>> intel_engine_cs *engine)
>>>>    void intel_guc_submission_enable(struct intel_guc *guc)
>>>>    {
>>>> +    guc_init_lrc_mapping(guc);
>>>>    }
>>>>    void intel_guc_submission_disable(struct intel_guc *guc)
>>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct
>>>> intel_guc *guc)
>>>>    {
>>>>        guc->submission_selected = __guc_submission_selected(guc);
>>>>    }
>>>> +
>>>> +static inline struct intel_context *
>>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +
>>>> +    if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
>>>> +            "Invalid desc_idx %u", desc_idx);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    ce = __get_context(guc, desc_idx);
>>>> +    if (unlikely(!ce)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
>>>> +            "Context is NULL, desc_idx %u", desc_idx);
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    return ce;
>>>> +}
>>>> +
>>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
>>>> +                      const u32 *msg,
>>>> +                      u32 len)
>>>> +{
>>>> +    struct intel_context *ce;
>>>> +    u32 desc_idx = msg[0];
>>>> +
>>>> +    if (unlikely(len < 1)) {
>>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
>>>> +        return -EPROTO;
>>>> +    }
>>>> +
>>>> +    ce = g2h_context_lookup(guc, desc_idx);
>>>> +    if (unlikely(!ce))
>>>> +        return -EPROTO;
>>>> +
>>>> +    if (context_wait_for_deregister_to_register(ce)) {
>>>> +        struct intel_runtime_pm *runtime_pm =
>>>> +            &ce->engine->gt->i915->runtime_pm;
>>>> +        intel_wakeref_t wakeref;
>>>> +
>>>> +        /*
>>>> +         * Previous owner of this guc_id has been deregistered, so it is
>>>> +         * now safe to register this context.
>>>> +         */
>>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
>>>> +            register_context(ce);
>>>> +        clr_context_wait_for_deregister_to_register(ce);
>>>> +        intel_context_put(ce);
>>>> +    } else if (context_destroyed(ce)) {
>>>> +        /* Context has been destroyed */
>>>> +        release_guc_id(guc, ce);
>>>> +        lrc_destroy(&ce->ref);
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
>>>> b/drivers/gpu/drm/i915/i915_reg.h
>>>> index 943fe485c662..204c95c39353 100644
>>>> --- a/drivers/gpu/drm/i915/i915_reg.h
>>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>>>> @@ -4142,6 +4142,7 @@ enum {
>>>>        FAULT_AND_CONTINUE /* Unsupported */
>>>>    };
>>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
>>>>    #define GEN8_CTX_VALID (1 << 0)
>>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
>>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c
>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>> index 86b4c9f2613d..b48c4905d3fc 100644
>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
>>>>         */
>>>>        if (!list_empty(&rq->sched.link))
>>>>            remove_from_engine(rq);
>>>> +    atomic_dec(&rq->context->guc_id_ref);
>>> Does this work/make sense if GuC is disabled?
>>>
>> It doesn't matter if the GuC is disabled as this var isn't used if it
>> is. A following patch adds a vfunc here and pushes this into the GuC
>> backend.
>>
>> Matt
>>
>>> Daniele
>>>
>>>>        GEM_BUG_ON(!llist_empty(&rq->execute_cb));
>>>>        __list_del_entry(&rq->link); /* poison neither prev/next (RCU
>>>> walks) */
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22  8:13     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 221+ messages in thread
From: Tvrtko Ursulin @ 2021-07-22  8:13 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 16/07/2021 21:17, Matthew Brost wrote:
> Requests may take slightly longer with GuC submission, let's increase
> the timeouts in live_requests.

Hm "slightly" ends up 5x longer here and one second feels a lot.

Out of curiosity, given this is about a simple submit of no-op batches 
in a tight loop, where does the huge amount of extra latency come from? 
Is it the single H2G channel shared for all engines, the firmware, or GuC 
speed itself?

Regards,

Tvrtko

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> ---
>   drivers/gpu/drm/i915/selftests/i915_request.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index bd5c96a77ba3..d67710d10615 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1313,7 +1313,7 @@ static int __live_parallel_engine1(void *arg)
>   		i915_request_add(rq);
>   
>   		err = 0;
> -		if (i915_request_wait(rq, 0, HZ / 5) < 0)
> +		if (i915_request_wait(rq, 0, HZ) < 0)
>   			err = -ETIME;
>   		i915_request_put(rq);
>   		if (err)
> @@ -1419,7 +1419,7 @@ static int __live_parallel_spin(void *arg)
>   	}
>   	igt_spinner_end(&spin);
>   
> -	if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0)
> +	if (err == 0 && i915_request_wait(rq, 0, HZ) < 0)
>   		err = -EIO;
>   	i915_request_put(rq);
>   
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread


* Re: [Intel-gfx] [PATCH 49/51] drm/i915/selftest: Bump selftest timeouts for hangcheck
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22  8:17     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 221+ messages in thread
From: Tvrtko Ursulin @ 2021-07-22  8:17 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 16/07/2021 21:17, Matthew Brost wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> Some testing environments and some heavier tests are slower than

What testing environments are they? I am wondering whether it's not a 
simulation patch which "escaped" by accident. If not, then is it just GuC 
which is so slow, like that other patch two steps earlier in the series?

Regards,

Tvrtko

> previous limits allowed for. For example, it can take multiple seconds
> for the 'context has been reset' notification handler to reach the
> 'kill the requests' code in the 'active' version of the 'reset
> engines' test. During which time the selftest gets bored, gives up
> waiting and fails the test.
> 
> There is also an async thread that the selftest uses to pump work
> through the hardware in parallel to the context that is marked for
> reset. That also could get bored waiting for completions and kill the
> test off.
> 
> Lastly, the flush at the end of various test sections can also see
> timeouts due to the large amount of work backed up. This is also true
> of the live_hwsp_read test.
> 
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c             | 2 +-
>   drivers/gpu/drm/i915/selftests/igt_flush_test.c          | 2 +-
>   drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c | 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 971c0c249eb0..a93a9b0d258e 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -876,7 +876,7 @@ static int active_request_put(struct i915_request *rq)
>   	if (!rq)
>   		return 0;
>   
> -	if (i915_request_wait(rq, 0, 5 * HZ) < 0) {
> +	if (i915_request_wait(rq, 0, 10 * HZ) < 0) {
>   		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n",
>   			  rq->engine->name,
>   			  rq->fence.context,
> diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> index 7b0939e3f007..a6c71fca61aa 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
> @@ -19,7 +19,7 @@ int igt_flush_test(struct drm_i915_private *i915)
>   
>   	cond_resched();
>   
> -	if (intel_gt_wait_for_idle(gt, HZ / 5) == -ETIME) {
> +	if (intel_gt_wait_for_idle(gt, HZ) == -ETIME) {
>   		pr_err("%pS timed out, cancelling all further testing.\n",
>   		       __builtin_return_address(0));
>   
> diff --git a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> index 69db139f9e0d..ebd6d69b3315 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_scheduler_helpers.c
> @@ -13,7 +13,7 @@
>   
>   #define REDUCED_TIMESLICE	5
>   #define REDUCED_PREEMPT		10
> -#define WAIT_FOR_RESET_TIME	1000
> +#define WAIT_FOR_RESET_TIME	10000
>   
>   int intel_selftest_modify_policy(struct intel_engine_cs *engine,
>   				 struct intel_selftest_saved_policy *saved,
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread


* Re: [Intel-gfx] [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22 12:46     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 221+ messages in thread
From: Tvrtko Ursulin @ 2021-07-22 12:46 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel


On 16/07/2021 21:16, Matthew Brost wrote:
> With GuC virtual engines the physical engine which a request executes
> and completes on isn't known to the i915. Therefore we can't attach a
> request to a physical engine's breadcrumbs. To work around this we create
> a single breadcrumbs object per engine class when using GuC submission and
> direct all physical engine interrupts to this breadcrumbs object.
> 
> v2:
>   (John H)
>    - Rework header file structure so intel_engine_mask_t can be in
>      intel_engine_types.h
> 
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> CC: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 ++++-
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
>   drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
>   .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
>   drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
>   9 files changed, 133 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 38cc42783dfb..2007dc6f6b99 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -15,28 +15,14 @@
>   #include "intel_gt_pm.h"
>   #include "intel_gt_requests.h"
>   
> -static bool irq_enable(struct intel_engine_cs *engine)
> +static bool irq_enable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_enable)
> -		return false;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_enable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> -
> -	return true;
> +	return intel_engine_irq_enable(b->irq_engine);
>   }
>   
> -static void irq_disable(struct intel_engine_cs *engine)
> +static void irq_disable(struct intel_breadcrumbs *b)
>   {
> -	if (!engine->irq_disable)
> -		return;
> -
> -	/* Caller disables interrupts */
> -	spin_lock(&engine->gt->irq_lock);
> -	engine->irq_disable(engine);
> -	spin_unlock(&engine->gt->irq_lock);
> +	intel_engine_irq_disable(b->irq_engine);
>   }
>   
>   static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
>   	WRITE_ONCE(b->irq_armed, true);
>   
>   	/* Requests may have completed before we could enable the interrupt. */
> -	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
> +	if (!b->irq_enabled++ && b->irq_enable(b))
>   		irq_work_queue(&b->irq_work);
>   }
>   
> @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
>   {
>   	GEM_BUG_ON(!b->irq_enabled);
>   	if (!--b->irq_enabled)
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	WRITE_ONCE(b->irq_armed, false);
>   	intel_gt_pm_put_async(b->irq_engine->gt);
> @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	if (!b)
>   		return NULL;
>   
> -	b->irq_engine = irq_engine;
> +	kref_init(&b->ref);
>   
>   	spin_lock_init(&b->signalers_lock);
>   	INIT_LIST_HEAD(&b->signalers);
> @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	spin_lock_init(&b->irq_lock);
>   	init_irq_work(&b->irq_work, signal_irq_work);
>   
> +	b->irq_engine = irq_engine;
> +	b->irq_enable = irq_enable;
> +	b->irq_disable = irq_disable;
> +
>   	return b;
>   }
>   
> @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
>   	spin_lock_irqsave(&b->irq_lock, flags);
>   
>   	if (b->irq_enabled)
> -		irq_enable(b->irq_engine);
> +		b->irq_enable(b);
>   	else
> -		irq_disable(b->irq_engine);
> +		b->irq_disable(b);
>   
>   	spin_unlock_irqrestore(&b->irq_lock, flags);
>   }
> @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
>   	}
>   }
>   
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
> +void intel_breadcrumbs_free(struct kref *kref)
>   {
> +	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
> +
>   	irq_work_sync(&b->irq_work);
>   	GEM_BUG_ON(!list_empty(&b->signalers));
>   	GEM_BUG_ON(b->irq_armed);
> +
>   	kfree(b);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> index 3ce5ce270b04..be0d4f379a85 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> @@ -9,7 +9,7 @@
>   #include <linux/atomic.h>
>   #include <linux/irq_work.h>
>   
> -#include "intel_engine_types.h"
> +#include "intel_breadcrumbs_types.h"
>   
>   struct drm_printer;
>   struct i915_request;
> @@ -17,7 +17,7 @@ struct intel_breadcrumbs;
>   
>   struct intel_breadcrumbs *
>   intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
> -void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
> +void intel_breadcrumbs_free(struct kref *kref);
>   
>   void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
>   void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
> @@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
>   void intel_context_remove_breadcrumbs(struct intel_context *ce,
>   				      struct intel_breadcrumbs *b);
>   
> +static inline struct intel_breadcrumbs *
> +intel_breadcrumbs_get(struct intel_breadcrumbs *b)
> +{
> +	kref_get(&b->ref);
> +	return b;
> +}
> +
> +static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
> +{
> +	kref_put(&b->ref, intel_breadcrumbs_free);
> +}
> +
>   #endif /* __INTEL_BREADCRUMBS__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index 3a084ce8ff5e..72dfd3748c4c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -7,10 +7,13 @@
>   #define __INTEL_BREADCRUMBS_TYPES__
>   
>   #include <linux/irq_work.h>
> +#include <linux/kref.h>
>   #include <linux/list.h>
>   #include <linux/spinlock.h>
>   #include <linux/types.h>
>   
> +#include "intel_engine_types.h"
> +
>   /*
>    * Rather than have every client wait upon all user interrupts,
>    * with the herd waking after every interrupt and each doing the
> @@ -29,6 +32,7 @@
>    * the overhead of waking that client is much preferred.
>    */
>   struct intel_breadcrumbs {
> +	struct kref ref;
>   	atomic_t active;
>   
>   	spinlock_t signalers_lock; /* protects the list of signalers */
> @@ -42,7 +46,10 @@ struct intel_breadcrumbs {
>   	bool irq_armed;
>   
>   	/* Not all breadcrumbs are attached to physical HW */
> +	intel_engine_mask_t	engine_mask;
>   	struct intel_engine_cs *irq_engine;
> +	bool	(*irq_enable)(struct intel_breadcrumbs *b);
> +	void	(*irq_disable)(struct intel_breadcrumbs *b);
>   };
>   
>   #endif /* __INTEL_BREADCRUMBS_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> index 9fec0aca5f4b..edbde6171bca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> @@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
>   
>   void intel_engine_init_execlists(struct intel_engine_cs *engine);
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine);
> +void intel_engine_irq_disable(struct intel_engine_cs *engine);
> +
>   static inline void __intel_engine_reset(struct intel_engine_cs *engine,
>   					bool stalled)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index b7292d5cb7da..d95d666407f5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
>   err_cmd_parser:
>   	i915_sched_engine_put(engine->sched_engine);
>   err_sched_engine:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_status:
>   	cleanup_status_page(engine);
>   	return err;
> @@ -948,7 +948,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
>   
>   	i915_sched_engine_put(engine->sched_engine);
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   
>   	intel_engine_fini_retire(engine);
>   	intel_engine_cleanup_cmd_parser(engine);
> @@ -1265,6 +1265,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
>   	return true;
>   }
>   
> +bool intel_engine_irq_enable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_enable)
> +		return false;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_enable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +
> +	return true;
> +}
> +
> +void intel_engine_irq_disable(struct intel_engine_cs *engine)
> +{
> +	if (!engine->irq_disable)
> +		return;
> +
> +	/* Caller disables interrupts */
> +	spin_lock(&engine->gt->irq_lock);
> +	engine->irq_disable(engine);
> +	spin_unlock(&engine->gt->irq_lock);
> +}
> +
>   void intel_engines_reset_default_submission(struct intel_gt *gt)
>   {
>   	struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 8ad304b2f2e4..03a81e8d87f4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -21,7 +21,6 @@
>   #include "i915_pmu.h"
>   #include "i915_priolist_types.h"
>   #include "i915_selftest.h"
> -#include "intel_breadcrumbs_types.h"
>   #include "intel_sseu.h"
>   #include "intel_timeline_types.h"
>   #include "intel_uncore.h"
> @@ -63,6 +62,7 @@ struct i915_sched_engine;
>   struct intel_gt;
>   struct intel_ring;
>   struct intel_uncore;
> +struct intel_breadcrumbs;
>   
>   typedef u8 intel_engine_mask_t;
>   #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 920707e22eb0..abe48421fd7a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3407,7 +3407,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
>   	intel_context_fini(&ve->context);
>   
>   	if (ve->base.breadcrumbs)
> -		intel_breadcrumbs_free(ve->base.breadcrumbs);
> +		intel_breadcrumbs_put(ve->base.breadcrumbs);
>   	if (ve->base.sched_engine)
>   		i915_sched_engine_put(ve->base.sched_engine);
>   	intel_engine_free_request_pool(&ve->base);
> diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> index 9203c766db80..fc5a65ab1937 100644
> --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> @@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
>   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
>   
>   	i915_sched_engine_put(engine->sched_engine);
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   
>   	intel_context_unpin(engine->kernel_context);
>   	intel_context_put(engine->kernel_context);
> @@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
>   	return 0;
>   
>   err_breadcrumbs:
> -	intel_breadcrumbs_free(engine->breadcrumbs);
> +	intel_breadcrumbs_put(engine->breadcrumbs);
>   err_schedule:
>   	i915_sched_engine_put(engine->sched_engine);
>   	return -ENOMEM;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 372e0dc7617a..9f28899ff17f 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -1076,6 +1076,9 @@ static void __guc_context_destroy(struct intel_context *ce)
>   		struct guc_virtual_engine *ve =
>   			container_of(ce, typeof(*ve), context);
>   
> +		if (ve->base.breadcrumbs)
> +			intel_breadcrumbs_put(ve->base.breadcrumbs);
> +
>   		kfree(ve);
>   	} else {
>   		intel_context_free(ce);
> @@ -1366,6 +1369,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   	.get_sibling = guc_virtual_get_sibling,
>   };
>   
> +static bool
> +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +	bool result = false;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		result |= intel_engine_irq_enable(sibling);
> +
> +	return result;
> +}
> +
> +static void
> +guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
> +{
> +	struct intel_engine_cs *sibling;
> +	intel_engine_mask_t tmp, mask = b->engine_mask;
> +
> +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> +		intel_engine_irq_disable(sibling);
> +}
> +
> +static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> +{
> +	int i;
> +
> +	/*
> +	 * In GuC submission mode we do not know which physical engine a request
> +	 * will be scheduled on. This creates a problem because the breadcrumb
> +	 * interrupt is per physical engine. To work around this we attach
> +	 * requests and direct all breadcrumb interrupts to the first instance
> +	 * of an engine per class. In addition all breadcrumb interrupts are
> +	 * enabled / disabled across an engine class in unison.
> +	 */
> +	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
> +		struct intel_engine_cs *sibling =
> +			engine->gt->engine_class[engine->class][i];
> +
> +		if (sibling) {
> +			if (engine->breadcrumbs != sibling->breadcrumbs) {
> +				intel_breadcrumbs_put(engine->breadcrumbs);

So it's still only me that is bothered by the laziness of the proposed 
init path?

intel_engines_init
   for_each_engine
     engine_setup_common
       engine->breadcrumbs = intel_breadcrumbs_create(engine);
     intel_guc_submission_setup
       intel_breadcrumbs_put(engine->breadcrumbs);
       ...

In other words the code allocates N objects in common setup and then 
frees all but one of them, overriding vfuncs in the remaining instance.

Why not pull the setup out of common, when it is obviously not 
common any more?
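
Something along these lines is what I have in mind - rough sketch only,
untested, and first_engine_of_class() is a made-up helper:

static int guc_setup_breadcrumbs(struct intel_engine_cs *engine)
{
	/* First engine of a class owns the shared breadcrumbs object */
	struct intel_engine_cs *first = first_engine_of_class(engine);

	if (first && first->breadcrumbs) {
		engine->breadcrumbs =
			intel_breadcrumbs_get(first->breadcrumbs);
	} else {
		engine->breadcrumbs = intel_breadcrumbs_create(engine);
		if (!engine->breadcrumbs)
			return -ENOMEM;
		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
	}

	engine->breadcrumbs->engine_mask |= engine->mask;

	return 0;
}

That way engine_setup_common() would not touch breadcrumbs at all for
the GuC backend, instead of allocating N objects and freeing N - 1.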

Regards,

Tvrtko

> +				engine->breadcrumbs =
> +					intel_breadcrumbs_get(sibling->breadcrumbs);
> +			}
> +			break;
> +		}
> +	}
> +
> +	if (engine->breadcrumbs) {
> +		engine->breadcrumbs->engine_mask |= engine->mask;
> +		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
> +		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
> +	}
> +}
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -1589,6 +1648,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   
>   	guc_default_vfuncs(engine);
>   	guc_default_irqs(engine);
> +	guc_init_breadcrumbs(engine);
>   
>   	if (engine->class == RENDER_CLASS)
>   		rcs_submission_override(engine);
> @@ -1831,11 +1891,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.saturated = ALL_ENGINES;
> -	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> -	if (!ve->base.breadcrumbs) {
> -		kfree(ve);
> -		return ERR_PTR(-ENOMEM);
> -	}
>   
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
> @@ -1884,6 +1939,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
>   				sibling->emit_fini_breadcrumb;
>   			ve->base.emit_fini_breadcrumb_dw =
>   				sibling->emit_fini_breadcrumb_dw;
> +			ve->base.breadcrumbs =
> +				intel_breadcrumbs_get(sibling->breadcrumbs);
>   
>   			ve->base.flags |= sibling->flags;
>   
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new interface
  2021-07-22  7:57           ` Michal Wajdeczko
@ 2021-07-22 15:48             ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 15:48 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, Daniele Ceraolo Spurio, dri-devel

On Thu, Jul 22, 2021 at 09:57:00AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 22.07.2021 01:51, Daniele Ceraolo Spurio wrote:
> > 
> > 
> > On 7/19/2021 9:04 PM, Matthew Brost wrote:
> >> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
> >>>
> >>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
> >>>> Implement GuC context operations which include GuC specific operations
> >>>> alloc, pin, unpin, and destroy.
> >>>>
> >>>> v2:
> >>>>    (Daniel Vetter)
> >>>>     - Use msleep_interruptible rather than cond_resched in busy loop
> >>>>    (Michal)
> >>>>     - Remove C++ style comment
> >>>>
> >>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666
> >>>> ++++++++++++++++--
> >>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
> >>>>    8 files changed, 685 insertions(+), 55 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> index bd63813c8a80..32fd6647154b 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce,
> >>>> struct intel_engine_cs *engine)
> >>>>        mutex_init(&ce->pin_mutex);
> >>>> +    spin_lock_init(&ce->guc_state.lock);
> >>>> +
> >>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
> >>>> +    INIT_LIST_HEAD(&ce->guc_id_link);
> >>>> +
> >>>>        i915_active_init(&ce->active,
> >>>>                 __intel_context_active, __intel_context_retire, 0);
> >>>>    }
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> index 6d99631d19b9..606c480aec26 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> @@ -96,6 +96,7 @@ struct intel_context {
> >>>>    #define CONTEXT_BANNED            6
> >>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION    7
> >>>>    #define CONTEXT_NOPREEMPT        8
> >>>> +#define CONTEXT_LRCA_DIRTY        9
> >>>>        struct {
> >>>>            u64 timeout_us;
> >>>> @@ -138,14 +139,29 @@ struct intel_context {
> >>>>        u8 wa_bb_page; /* if set, page num reserved for context
> >>>> workarounds */
> >>>> +    struct {
> >>>> +        /** lock: protects everything in guc_state */
> >>>> +        spinlock_t lock;
> >>>> +        /**
> >>>> +         * sched_state: scheduling state of this context using GuC
> >>>> +         * submission
> >>>> +         */
> >>>> +        u8 sched_state;
> >>>> +    } guc_state;
> >>>> +
> >>>>        /* GuC scheduling state flags that do not require a lock. */
> >>>>        atomic_t guc_sched_state_no_lock;
> >>>> +    /* GuC LRC descriptor ID */
> >>>> +    u16 guc_id;
> >>>> +
> >>>> +    /* GuC LRC descriptor reference count */
> >>>> +    atomic_t guc_id_ref;
> >>>> +
> >>>>        /*
> >>>> -     * GuC LRC descriptor ID - Not assigned in this patch but
> >>>> future patches
> >>>> -     * in the series will.
> >>>> +     * GuC ID link - in list when unpinned but guc_id still valid
> >>>> in GuC
> >>>>         */
> >>>> -    u16 guc_id;
> >>>> +    struct list_head guc_id_link;
> >>>>    };
> >>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> index 41e5350a7a05..49d4857ad9b7 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> @@ -87,7 +87,6 @@
> >>>>    #define GEN11_CSB_WRITE_PTR_MASK    (GEN11_CSB_PTR_MASK << 0)
> >>>>    #define MAX_CONTEXT_HW_ID    (1 << 21) /* exclusive */
> >>>> -#define MAX_GUC_CONTEXT_HW_ID    (1 << 20) /* exclusive */
> >>>>    #define GEN11_MAX_CONTEXT_HW_ID    (1 << 11) /* exclusive */
> >>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >>>>    #define GEN12_MAX_CONTEXT_HW_ID    (GEN11_MAX_CONTEXT_HW_ID - 1)
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> index 8c7b92f699f1..30773cd699f5 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> @@ -7,6 +7,7 @@
> >>>>    #define _INTEL_GUC_H_
> >>>>    #include <linux/xarray.h>
> >>>> +#include <linux/delay.h>
> >>>>    #include "intel_uncore.h"
> >>>>    #include "intel_guc_fw.h"
> >>>> @@ -44,6 +45,14 @@ struct intel_guc {
> >>>>            void (*disable)(struct intel_guc *guc);
> >>>>        } interrupts;
> >>>> +    /*
> >>>> +     * contexts_lock protects the pool of free guc ids and a linked
> >>>> list of
> >>>> +     * guc ids available to be stolen
> >>>> +     */
> >>>> +    spinlock_t contexts_lock;
> >>>> +    struct ida guc_ids;
> >>>> +    struct list_head guc_id_list;
> >>>> +
> >>>>        bool submission_selected;
> >>>>        struct i915_vma *ads_vma;
> >>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc
> >>>> *guc, const u32 *action, u32 len,
> >>>>                     response_buf, response_buf_size, 0);
> >>>>    }
> >>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> >>>> +                       const u32 *action,
> >>>> +                       u32 len,
> >>>> +                       bool loop)
> >>>> +{
> >>>> +    int err;
> >>>> +    unsigned int sleep_period_ms = 1;
> >>>> +    bool not_atomic = !in_atomic() && !irqs_disabled();
> >>>> +
> >>>> +    /* No sleeping with spin locks, just busy loop */
> >>>> +    might_sleep_if(loop && not_atomic);
> >>>> +
> >>>> +retry:
> >>>> +    err = intel_guc_send_nb(guc, action, len);
> >>>> +    if (unlikely(err == -EBUSY && loop)) {
> >>>> +        if (likely(not_atomic)) {
> >>>> +            if (msleep_interruptible(sleep_period_ms))
> >>>> +                return -EINTR;
> >>>> +            sleep_period_ms = sleep_period_ms << 1;
> >>>> +        } else {
> >>>> +            cpu_relax();
> >>> Potentially something we can change later, but if we're in atomic
> >>> context we
> >>> can't keep looping without a timeout while we get -EBUSY, we need to
> >>> bail
> >>> out early.
> >>>
> >> Eventually intel_guc_send_nb will pop with a different return code after
> >> 1 second I think if the GuC is hung.
> > 
> > I know, the point is that 1 second is too long in atomic context. This
> > has to either complete quick in atomic or be forked off to a worker. Can
> > fix as a follow up.
> 
> if we implement fallback to a worker, then maybe we can get rid of this
> busy loop completely, as it was supposed to be only needed for rare
> cases when CTBs were full due to unlikely congestion either caused by us

Right. On the last project I worked on we had a circular buffer to
communicate with the HW and fell back to a kthread (or workqueue?) when
the buffer was full. We could likely do something like that here, but if
we have to malloc memory that gets tricky in atomic context. This
is a good one to write down and revisit a bit later.
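
Roughly something like this, just to capture the idea - completely
untested sketch, guc_defer_send() / guc_send_work are made-up names, and
it ignores H2G ordering which a real version would have to preserve:

#define GUC_DEFER_MAX_DWORDS 8	/* arbitrary bound for the sketch */

struct guc_send_work {
	struct work_struct work;
	struct intel_guc *guc;
	u32 action[GUC_DEFER_MAX_DWORDS];
	u32 len;
};

static void guc_send_work_fn(struct work_struct *w)
{
	struct guc_send_work *sw = container_of(w, typeof(*sw), work);

	/* Worker context can sleep, so a plain retry replaces the busy loop */
	while (intel_guc_send_nb(sw->guc, sw->action, sw->len) == -EBUSY)
		msleep(1);

	kfree(sw);
}

static int guc_defer_send(struct intel_guc *guc, const u32 *action, u32 len)
{
	struct guc_send_work *sw;

	if (len > GUC_DEFER_MAX_DWORDS)
		return -EINVAL;

	/* GFP_ATOMIC as this may be called with spin locks held */
	sw = kmalloc(sizeof(*sw), GFP_ATOMIC);
	if (!sw)
		return -ENOMEM;	/* caller falls back to the busy loop */

	INIT_WORK(&sw->work, guc_send_work_fn);
	sw->guc = guc;
	sw->len = len;
	memcpy(sw->action, action, len * sizeof(u32));
	queue_work(system_unbound_wq, &sw->work);

	return 0;
}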

Matt

> (host driver didn't read G2H on time or filled up H2G faster than GuC
> could proceed it) or GuC actual failure (no H2G processing at all).
> 
> Michal
> 
> > 
> > Daniele
> > 
> >>  
> >>>> +        }
> >>>> +        goto retry;
> >>>> +    }
> >>>> +
> >>>> +    return err;
> >>>> +}
> >>>> +
> >>>>    static inline void intel_guc_to_host_event_handler(struct
> >>>> intel_guc *guc)
> >>>>    {
> >>>>        intel_guc_ct_event_handler(&guc->ct);
> >>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct
> >>>> intel_guc *guc, u32 mask)
> >>>>    int intel_guc_reset_engine(struct intel_guc *guc,
> >>>>                   struct intel_engine_cs *engine);
> >>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >>>> +                      const u32 *msg, u32 len);
> >>>> +
> >>>>    void intel_guc_load_status(struct intel_guc *guc, struct
> >>>> drm_printer *p);
> >>>>    #endif
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> index 83ec60ea3f89..28ff82c5be45 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct
> >>>> intel_guc_ct *ct, struct ct_incoming_msg *r
> >>>>        case INTEL_GUC_ACTION_DEFAULT:
> >>>>            ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >>>>            break;
> >>>> +    case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> >>>> +        ret = intel_guc_deregister_done_process_msg(guc, payload,
> >>>> +                                len);
> >>>> +        break;
> >>>>        default:
> >>>>            ret = -EOPNOTSUPP;
> >>>>            break;
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> index 53b4a5eb4a85..a47b3813b4d0 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> @@ -13,7 +13,9 @@
> >>>>    #include "gt/intel_gt.h"
> >>>>    #include "gt/intel_gt_irq.h"
> >>>>    #include "gt/intel_gt_pm.h"
> >>>> +#include "gt/intel_gt_requests.h"
> >>>>    #include "gt/intel_lrc.h"
> >>>> +#include "gt/intel_lrc_reg.h"
> >>>>    #include "gt/intel_mocs.h"
> >>>>    #include "gt/intel_ring.h"
> >>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct
> >>>> intel_context *ce)
> >>>>               &ce->guc_sched_state_no_lock);
> >>>>    }
> >>>> +/*
> >>>> + * Below is a set of functions which control the GuC scheduling
> >>>> state which
> >>>> + * require a lock, aside from the special case where the functions
> >>>> are called
> >>>> + * from guc_lrc_desc_pin(). In that case it isn't possible for any
> >>>> other code
> >>>> + * path to be executing on the context.
> >>>> + */
> >>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even
> >>> if it
> >>> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
> >> If this code is called from releasing a fence a circular dependency is
> >> created. Once we move to the DRM scheduler the locking becomes way easier
> >> and we clean this up then. We already have a task for locking clean up.
> >>
> >>> scheme if possible, as that might cause trouble if things change in the
> >>> future.
> >> Again this is temporary code, so I think we can live with this until the
> >> DRM scheduler lands.
> >>
> >>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER    BIT(0)
> >>>> +#define SCHED_STATE_DESTROYED                BIT(1)
> >>>> +static inline void init_sched_state(struct intel_context *ce)
> >>>> +{
> >>>> +    /* Only should be called from guc_lrc_desc_pin() */
> >>>> +    atomic_set(&ce->guc_sched_state_no_lock, 0);
> >>>> +    ce->guc_state.sched_state = 0;
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_state.sched_state &
> >>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> >>> No need for (). Below as well.
> >>>
> >> Yep.
> >>
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    /* Only should be called from guc_lrc_desc_pin() */
> >>>> +    ce->guc_state.sched_state |=
> >>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    lockdep_assert_held(&ce->guc_state.lock);
> >>>> +    ce->guc_state.sched_state =
> >>>> +        (ce->guc_state.sched_state &
> >>>> +         ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> >>> nit: can also use
> >>>
> >>> ce->guc_state.sched_state &=
> >>>     ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
> >>>
> >> Yep.
> >>  
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +context_destroyed(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +set_context_destroyed(struct intel_context *ce)
> >>>> +{
> >>>> +    lockdep_assert_held(&ce->guc_state.lock);
> >>>> +    ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> >>>> +}
> >>>> +
> >>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_id == GUC_INVALID_LRC_ID);
> >>>> +}
> >>>> +
> >>>> +static inline void set_context_guc_id_invalid(struct intel_context
> >>>> *ce)
> >>>> +{
> >>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
> >>>> +}
> >>>> +
> >>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> >>>> +{
> >>>> +    return &ce->engine->gt->uc.guc;
> >>>> +}
> >>>> +
> >>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >>>>    {
> >>>>        return rb_entry(rb, struct i915_priolist, node);
> >>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc
> >>>> *guc, struct i915_request *rq)
> >>>>        int len = 0;
> >>>>        bool enabled = context_enabled(ce);
> >>>> +    GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> >>>> +    GEM_BUG_ON(context_guc_id_invalid(ce));
> >>>> +
> >>>>        if (!enabled) {
> >>>>            action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >>>>            action[len++] = ce->guc_id;
> >>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc
> >>>> *guc)
> >>>>        xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> >>>> +    spin_lock_init(&guc->contexts_lock);
> >>>> +    INIT_LIST_HEAD(&guc->guc_id_list);
> >>>> +    ida_init(&guc->guc_ids);
> >>>> +
> >>>>        return 0;
> >>>>    }
> >>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct
> >>>> intel_guc *guc)
> >>>>        i915_sched_engine_put(guc->sched_engine);
> >>>>    }
> >>>> -static int guc_context_alloc(struct intel_context *ce)
> >>>> +static inline void queue_request(struct i915_sched_engine
> >>>> *sched_engine,
> >>>> +                 struct i915_request *rq,
> >>>> +                 int prio)
> >>>>    {
> >>>> -    return lrc_alloc(ce, ce->engine);
> >>>> +    GEM_BUG_ON(!list_empty(&rq->sched.link));
> >>>> +    list_add_tail(&rq->sched.link,
> >>>> +              i915_sched_lookup_priolist(sched_engine, prio));
> >>>> +    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>>> +}
> >>>> +
> >>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >>>> +                     struct i915_request *rq)
> >>>> +{
> >>>> +    int ret;
> >>>> +
> >>>> +    __i915_request_submit(rq);
> >>>> +
> >>>> +    trace_i915_request_in(rq, 0);
> >>>> +
> >>>> +    guc_set_lrc_tail(rq);
> >>>> +    ret = guc_add_request(guc, rq);
> >>>> +    if (ret == -EBUSY)
> >>>> +        guc->stalled_request = rq;
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +
> >>>> +static void guc_submit_request(struct i915_request *rq)
> >>>> +{
> >>>> +    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >>>> +    struct intel_guc *guc = &rq->engine->gt->uc.guc;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    /* Will be called from irq-context when using foreign fences. */
> >>>> +    spin_lock_irqsave(&sched_engine->lock, flags);
> >>>> +
> >>>> +    if (guc->stalled_request ||
> >>>> !i915_sched_engine_is_empty(sched_engine))
> >>>> +        queue_request(sched_engine, rq, rq_prio(rq));
> >>>> +    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >>>> +        tasklet_hi_schedule(&sched_engine->tasklet);
> >>>> +
> >>>> +    spin_unlock_irqrestore(&sched_engine->lock, flags);
> >>>> +}
> >>>> +
> >>>> +#define GUC_ID_START    64    /* First 64 guc_ids reserved */
> >>>> +static int new_guc_id(struct intel_guc *guc)
> >>>> +{
> >>>> +    return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> >>>> +                  GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> >>>> +                  __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> >>>> +}
> >>>> +
> >>>> +static void __release_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    if (!context_guc_id_invalid(ce)) {
> >>>> +        ida_simple_remove(&guc->guc_ids, ce->guc_id);
> >>>> +        reset_lrc_desc(guc, ce->guc_id);
> >>>> +        set_context_guc_id_invalid(ce);
> >>>> +    }
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +}
> >>>> +
> >>>> +static void release_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    __release_guc_id(guc, ce);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +}
> >>>> +
> >>>> +static int steal_guc_id(struct intel_guc *guc)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +    int guc_id;
> >>>> +
> >>>> +    if (!list_empty(&guc->guc_id_list)) {
> >>>> +        ce = list_first_entry(&guc->guc_id_list,
> >>>> +                      struct intel_context,
> >>>> +                      guc_id_link);
> >>>> +
> >>>> +        GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> >>>> +        GEM_BUG_ON(context_guc_id_invalid(ce));
> >>>> +
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +        guc_id = ce->guc_id;
> >>>> +        set_context_guc_id_invalid(ce);
> >>>> +        return guc_id;
> >>>> +    } else {
> >>>> +        return -EAGAIN;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> >>>> +{
> >>>> +    int ret;
> >>>> +
> >>>> +    ret = new_guc_id(guc);
> >>>> +    if (unlikely(ret < 0)) {
> >>>> +        ret = steal_guc_id(guc);
> >>>> +        if (ret < 0)
> >>>> +            return ret;
> >>>> +    }
> >>>> +
> >>>> +    *out = ret;
> >>>> +    return 0;
> >>> Is it worth adding spinlock_held asserts for guc->contexts_lock in
> >>> these ID
> >>> functions? Doubles up as a documentation of what locking we expect.
> >>>
> >> Never a bad idea to have asserts in the code. Will add.
> >>
> >>>> +}
> >>>> +
> >>>> +#define PIN_GUC_ID_TRIES    4
> >>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >>>> +{
> >>>> +    int ret = 0;
> >>>> +    unsigned long flags, tries = PIN_GUC_ID_TRIES;
> >>>> +
> >>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> >>>> +
> >>>> +try_again:
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    if (context_guc_id_invalid(ce)) {
> >>>> +        ret = assign_guc_id(guc, &ce->guc_id);
> >>>> +        if (ret)
> >>>> +            goto out_unlock;
> >>>> +        ret = 1;    /* Indicates newly assigned guc_id */
> >>>> +    }
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +    atomic_inc(&ce->guc_id_ref);
> >>>> +
> >>>> +out_unlock:
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    /*
> >>>> +     * -EAGAIN indicates no guc_ids are available, let's retire any
> >>>> +     * outstanding requests to see if that frees up a guc_id. If
> >>>> the first
> >>>> +     * retire didn't help, insert a sleep with the timeslice
> >>>> duration before
> >>>> +     * attempting to retire more requests. Double the sleep period
> >>>> each
> >>>> +     * subsequent pass before finally giving up. The sleep period
> >>>> has max of
> >>>> +     * 100ms and minimum of 1ms.
> >>>> +     */
> >>>> +    if (ret == -EAGAIN && --tries) {
> >>>> +        if (PIN_GUC_ID_TRIES - tries > 1) {
> >>>> +            unsigned int timeslice_shifted =
> >>>> +                ce->engine->props.timeslice_duration_ms <<
> >>>> +                (PIN_GUC_ID_TRIES - tries - 2);
> >>>> +            unsigned int max = min_t(unsigned int, 100,
> >>>> +                         timeslice_shifted);
> >>>> +
> >>>> +            msleep(max_t(unsigned int, max, 1));
> >>>> +        }
> >>>> +        intel_gt_retire_requests(guc_to_gt(guc));
> >>>> +        goto try_again;
> >>>> +    }
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +
> >>>> +static void unpin_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> >>>> +
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> >>>> +        !atomic_read(&ce->guc_id_ref))
> >>>> +        list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +}
> >>>> +
> >>>> +static int __guc_action_register_context(struct intel_guc *guc,
> >>>> +                     u32 guc_id,
> >>>> +                     u32 offset)
> >>>> +{
> >>>> +    u32 action[] = {
> >>>> +        INTEL_GUC_ACTION_REGISTER_CONTEXT,
> >>>> +        guc_id,
> >>>> +        offset,
> >>>> +    };
> >>>> +
> >>>> +    return intel_guc_send_busy_loop(guc, action,
> >>>> ARRAY_SIZE(action), true);
> >>>> +}
> >>>> +
> >>>> +static int register_context(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +    u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> >>>> +        ce->guc_id * sizeof(struct guc_lrc_desc);
> >>>> +
> >>>> +    return __guc_action_register_context(guc, ce->guc_id, offset);
> >>>> +}
> >>>> +
> >>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
> >>>> +                       u32 guc_id)
> >>>> +{
> >>>> +    u32 action[] = {
> >>>> +        INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> >>>> +        guc_id,
> >>>> +    };
> >>>> +
> >>>> +    return intel_guc_send_busy_loop(guc, action,
> >>>> ARRAY_SIZE(action), true);
> >>>> +}
> >>>> +
> >>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +
> >>>> +    return __guc_action_deregister_context(guc, guc_id);
> >>>> +}
> >>>> +
> >>>> +static intel_engine_mask_t adjust_engine_mask(u8 class,
> >>>> intel_engine_mask_t mask)
> >>>> +{
> >>>> +    switch (class) {
> >>>> +    case RENDER_CLASS:
> >>>> +        return mask >> RCS0;
> >>>> +    case VIDEO_ENHANCEMENT_CLASS:
> >>>> +        return mask >> VECS0;
> >>>> +    case VIDEO_DECODE_CLASS:
> >>>> +        return mask >> VCS0;
> >>>> +    case COPY_ENGINE_CLASS:
> >>>> +        return mask >> BCS0;
> >>>> +    default:
> >>>> +        GEM_BUG_ON("Invalid Class");
> >>> we usually use MISSING_CASE for this type of error.
> >> As soon as we start talking in logical space (series immediately after
> >> this gets merged) this function gets deleted, but will fix.
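i.e., a sketch of the suggested fix:

    default:
        MISSING_CASE(class);
        return 0;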
> >>
> >>>> +        return 0;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> >>>> +                    struct guc_lrc_desc *desc)
> >>>> +{
> >>>> +    desc->policy_flags = 0;
> >>>> +
> >>>> +    desc->execution_quantum =
> >>>> CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> >>>> +    desc->preemption_timeout =
> >>>> CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> >>>> +}
> >>>> +
> >>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_runtime_pm *runtime_pm =
> >>>> +        &ce->engine->gt->i915->runtime_pm;
> >>> If you move this after the engine var below you can skip the ce->engine
> >>> jump. Also, you can shorten the pointer chasing by using
> >>> engine->uncore->rpm.
> >>>
> >> Sure.
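i.e., something like:

    struct intel_runtime_pm *runtime_pm = engine->uncore->rpm;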
> >>
> >>>> +    struct intel_engine_cs *engine = ce->engine;
> >>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
> >>>> +    u32 desc_idx = ce->guc_id;
> >>>> +    struct guc_lrc_desc *desc;
> >>>> +    bool context_registered;
> >>>> +    intel_wakeref_t wakeref;
> >>>> +    int ret = 0;
> >>>> +
> >>>> +    GEM_BUG_ON(!engine->mask);
> >>>> +
> >>>> +    /*
> >>>> +     * Ensure LRC + CT vmas are in the same region as write barrier is
> >>>> done
> >>>> +     * based on CT vma region.
> >>>> +     */
> >>>> +    GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> >>>> +           i915_gem_object_is_lmem(ce->ring->vma->obj));
> >>>> +
> >>>> +    context_registered = lrc_desc_registered(guc, desc_idx);
> >>>> +
> >>>> +    reset_lrc_desc(guc, desc_idx);
> >>>> +    set_lrc_desc_registered(guc, desc_idx, ce);
> >>>> +
> >>>> +    desc = __get_lrc_desc(guc, desc_idx);
> >>>> +    desc->engine_class = engine_class_to_guc_class(engine->class);
> >>>> +    desc->engine_submit_mask = adjust_engine_mask(engine->class,
> >>>> +                              engine->mask);
> >>>> +    desc->hw_context_desc = ce->lrc.lrca;
> >>>> +    desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> >>>> +    desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> >>>> +    guc_context_policy_init(engine, desc);
> >>>> +    init_sched_state(ce);
> >>>> +
> >>>> +    /*
> >>>> +     * The context_lookup xarray is used to determine if the hardware
> >>>> +     * context is currently registered. There are two cases in
> >>>> which it
> >>>> +     * could be regisgered either the guc_id has been stole from from
> >>> typo regisgered
> >>>
> >> John got this one. Fixed.
> >>  
> >>>> +     * another context or the lrc descriptor address of this
> >>>> context has
> >>>> +     * changed. In either case the context needs to be deregistered
> >>>> with the
> >>>> +     * GuC before registering this context.
> >>>> +     */
> >>>> +    if (context_registered) {
> >>>> +        set_context_wait_for_deregister_to_register(ce);
> >>>> +        intel_context_get(ce);
> >>>> +
> >>>> +        /*
> >>>> +         * If stealing the guc_id, this ce has the same guc_id as the
> >>>> +         * context whose guc_id was stolen.
> >>>> +         */
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            ret = deregister_context(ce, ce->guc_id);
> >>>> +    } else {
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            ret = register_context(ce);
> >>>> +    }
> >>>> +
> >>>> +    return ret;
> >>>>    }
> >>>>    static int guc_context_pre_pin(struct intel_context *ce,
> >>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct
> >>>> intel_context *ce,
> >>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >>>>    {
> >>>> +    if (i915_ggtt_offset(ce->state) !=
> >>>> +        (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> >>>> +        set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
> >>> re-assigned in there; it looks like it can only change if the context is
> >>> new, but for future-proofing IMO better to re-order here.
> >>>
> >> No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca is
> >> assigned in lrc_pin. To catch the change you have to check between
> >> those two functions.
> >>
> >>>> +
> >>>>        return lrc_pin(ce, ce->engine, vaddr);
> >>> Could use a comment to say that the GuC context pinning call is delayed
> >>> until request alloc and to look at the comment in the latter function
> >>> for
> >>> details (or move the explanation here and refer here from request
> >>> alloc).
> >>>
> >> Yes.
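A sketch of how that comment could read, kept next to the check it
explains:

static int guc_context_pin(struct intel_context *ce, void *vaddr)
{
    /*
     * GuC context registration is deferred until request alloc, where
     * pin_guc_id() may sleep and stolen guc_ids can be handled safely;
     * see the comment in guc_request_alloc() for the full reasoning.
     */
    if (i915_ggtt_offset(ce->state) !=
        (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
        set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);

    return lrc_pin(ce, ce->engine, vaddr);
}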
> >>  
> >>>>    }
> >>>> +static void guc_context_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +
> >>>> +    unpin_guc_id(guc, ce);
> >>>> +    lrc_unpin(ce);
> >>>> +}
> >>>> +
> >>>> +static void guc_context_post_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    lrc_post_unpin(ce);
> >>> why do we need this function? we can just pass lrc_post_unpin to the ops
> >>> (like we already do).
> >>>
> >> We don't, but for style reasons I think having a GuC-specific wrapper
> >> is correct.
> >>
> >>>> +}
> >>>> +
> >>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_engine_cs *engine = ce->engine;
> >>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> >>>> +    GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >>>> +
> >>>> +    spin_lock_irqsave(&ce->guc_state.lock, flags);
> >>>> +    set_context_destroyed(ce);
> >>>> +    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >>>> +
> >>>> +    deregister_context(ce, ce->guc_id);
> >>>> +}
> >>>> +
> >>>> +static void guc_context_destroy(struct kref *kref)
> >>>> +{
> >>>> +    struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> >>>> +    struct intel_runtime_pm *runtime_pm =
> >>>> &ce->engine->gt->i915->runtime_pm;
> >>> same as above, going through engine->uncore->rpm is shorter
> >>>
> >> Sure.
> >>  
> >>>> +    struct intel_guc *guc = &ce->engine->gt->uc.guc;
> >>>> +    intel_wakeref_t wakeref;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    /*
> >>>> +     * If the guc_id is invalid this context has been stolen and we
> >>>> can free
> >>>> +     * it immediately. Also can be freed immediately if the context
> >>>> is not
> >>>> +     * registered with the GuC.
> >>>> +     */
> >>>> +    if (context_guc_id_invalid(ce) ||
> >>>> +        !lrc_desc_registered(guc, ce->guc_id)) {
> >>>> +        release_guc_id(guc, ce);
> >>> it feels a bit weird that we call release_guc_id in the case where
> >>> the id is
> >>> invalid. The code handles it fine, but still the flow doesn't feel
> >>> clean.
> >>> Not a blocker.
> >>>
> >> I understand. Let's see if this can get cleaned up at some point.
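One possible cleanup, splitting the two early-exit cases so
release_guc_id() only runs when the context actually owns an ID (a
sketch, not part of this series):

    if (context_guc_id_invalid(ce)) {
        /* guc_id already stolen, nothing to release */
        lrc_destroy(kref);
        return;
    }
    if (!lrc_desc_registered(guc, ce->guc_id)) {
        release_guc_id(guc, ce);
        lrc_destroy(kref);
        return;
    }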
> >>
> >>>> +        lrc_destroy(kref);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * We have to acquire the context spinlock and check guc_id
> >>>> again, if it
> >>>> +     * is valid it hasn't been stolen and needs to be deregistered. We
> >>>> +     * delete this context from the list of unpinned guc_ids
> >>>> available to
> >>>> +     * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> >>>> +     * returns indicating this context has been deregistered the
> >>>> guc_id is
> >>>> +     * returned to the pool of available guc_ids.
> >>>> +     */
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    if (context_guc_id_invalid(ce)) {
> >>>> +        __release_guc_id(guc, ce);
> >>> But here the call to __release_guc_id is unneeded, right? The ce
> >>> doesn't own
> >>> the ID anymore and both the steal and the release function already clean
> >>> ce->guc_id_link, so there should be nothing left to clean.
> >>>
> >> This is dead code. Will delete.
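If only the __release_guc_id() call is dropped, the locked recheck could
reduce to (a sketch):

    spin_lock_irqsave(&guc->contexts_lock, flags);
    if (context_guc_id_invalid(ce)) {
        /* guc_id was stolen while unlocked; the stealer owns it now */
        spin_unlock_irqrestore(&guc->contexts_lock, flags);
        lrc_destroy(kref);
        return;
    }
    if (!list_empty(&ce->guc_id_link))
        list_del_init(&ce->guc_id_link);
    spin_unlock_irqrestore(&guc->contexts_lock, flags);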
> >>
> >>>> +        spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +        lrc_destroy(kref);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    /*
> >>>> +     * We defer GuC context deregistration until the context is
> >>>> destroyed
> >>>> +     * in order to save on CTBs. With this optimization ideally we
> >>>> only need
> >>>> +     * 1 CTB to register the context during the first pin and 1 CTB to
> >>>> +     * deregister the context when the context is destroyed.
> >>>> Without this
> >>>> +     * optimization, a CTB would be needed every pin & unpin.
> >>>> +     *
> >>>> +     * XXX: Need to acquire the runtime wakeref as this can be
> >>>> triggered
> >>>> +     * from context_free_worker when no runtime wakeref is held.
> >>>> +     * guc_lrc_desc_unpin requires the runtime as a GuC register is
> >>>> written
> >>>> +     * in H2G CTB to deregister the context. A future patch may
> >>>> defer this
> >>>> +     * H2G CTB if the runtime wakeref is zero.
> >>>> +     */
> >>>> +    with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +        guc_lrc_desc_unpin(ce);
> >>>> +}
> >>>> +
> >>>> +static int guc_context_alloc(struct intel_context *ce)
> >>>> +{
> >>>> +    return lrc_alloc(ce, ce->engine);
> >>>> +}
> >>>> +
> >>>>    static const struct intel_context_ops guc_context_ops = {
> >>>>        .alloc = guc_context_alloc,
> >>>>        .pre_pin = guc_context_pre_pin,
> >>>>        .pin = guc_context_pin,
> >>>> -    .unpin = lrc_unpin,
> >>>> -    .post_unpin = lrc_post_unpin,
> >>>> +    .unpin = guc_context_unpin,
> >>>> +    .post_unpin = guc_context_post_unpin,
> >>>>        .enter = intel_context_enter_engine,
> >>>>        .exit = intel_context_exit_engine,
> >>>>        .reset = lrc_reset,
> >>>> -    .destroy = lrc_destroy,
> >>>> +    .destroy = guc_context_destroy,
> >>>>    };
> >>>> -static int guc_request_alloc(struct i915_request *request)
> >>>> +static bool context_needs_register(struct intel_context *ce, bool
> >>>> new_guc_id)
> >>>>    {
> >>>> +    return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> >>>> +        !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> >>>> +}
> >>>> +
> >>>> +static int guc_request_alloc(struct i915_request *rq)
> >>>> +{
> >>>> +    struct intel_context *ce = rq->context;
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>>        int ret;
> >>>> -    GEM_BUG_ON(!intel_context_is_pinned(request->context));
> >>>> +    GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >>>>        /*
> >>>>         * Flush enough space to reduce the likelihood of waiting after
> >>>>         * we start building the request - in which case we will just
> >>>>         * have to repeat work.
> >>>>         */
> >>>> -    request->reserved_space += GUC_REQUEST_SIZE;
> >>>> +    rq->reserved_space += GUC_REQUEST_SIZE;
> >>>>        /*
> >>>>         * Note that after this point, we have committed to using
> >>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct
> >>>> i915_request *request)
> >>>>         */
> >>>>        /* Unconditionally invalidate GPU caches and TLBs. */
> >>>> -    ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> >>>> +    ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >>>>        if (ret)
> >>>>            return ret;
> >>>> -    request->reserved_space -= GUC_REQUEST_SIZE;
> >>>> -    return 0;
> >>>> -}
> >>>> +    rq->reserved_space -= GUC_REQUEST_SIZE;
> >>>> -static inline void queue_request(struct i915_sched_engine
> >>>> *sched_engine,
> >>>> -                 struct i915_request *rq,
> >>>> -                 int prio)
> >>>> -{
> >>>> -    GEM_BUG_ON(!list_empty(&rq->sched.link));
> >>>> -    list_add_tail(&rq->sched.link,
> >>>> -              i915_sched_lookup_priolist(sched_engine, prio));
> >>>> -    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>>> -}
> >>>> -
> >>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >>>> -                     struct i915_request *rq)
> >>>> -{
> >>>> -    int ret;
> >>>> -
> >>>> -    __i915_request_submit(rq);
> >>>> -
> >>>> -    trace_i915_request_in(rq, 0);
> >>>> -
> >>>> -    guc_set_lrc_tail(rq);
> >>>> -    ret = guc_add_request(guc, rq);
> >>>> -    if (ret == -EBUSY)
> >>>> -        guc->stalled_request = rq;
> >>>> -
> >>>> -    return ret;
> >>>> -}
> >>>> -
> >>>> -static void guc_submit_request(struct i915_request *rq)
> >>>> -{
> >>>> -    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >>>> -    struct intel_guc *guc = &rq->engine->gt->uc.guc;
> >>>> -    unsigned long flags;
> >>>> +    /*
> >>>> +     * Call pin_guc_id here rather than in the pinning step as with
> >>>> +     * dma_resv, contexts can be repeatedly pinned / unpinned
> >>>> trashing the
> >>>> +     * guc_ids and creating horrible race conditions. This is
> >>>> especially bad
> >>>> +     * when guc_ids are being stolen due to over subscription. By
> >>>> the time
> >>>> +     * this function is reached, it is guaranteed that the guc_id
> >>>> will be
> >>>> +     * persistent until the generated request is retired. Thus,
> >>>> sealing these
> >>>> +     * race conditions. It is still safe to fail here if guc_ids are
> >>>> +     * exhausted and return -EAGAIN to the user indicating that
> >>>> they can try
> >>>> +     * again in the future.
> >>>> +     *
> >>>> +     * There is no need for a lock here as the timeline mutex
> >>>> ensures at
> >>>> +     * most one context can be executing this code path at once. The
> >>>> +     * guc_id_ref is incremented once for every request in flight and
> >>>> +     * decremented on each retire. When it is zero, a lock around the
> >>>> +     * increment (in pin_guc_id) is needed to seal a race with
> >>>> unpin_guc_id.
> >>>> +     */
> >>>> +    if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> >>>> +        return 0;
> >>>> -    /* Will be called from irq-context when using foreign fences. */
> >>>> -    spin_lock_irqsave(&sched_engine->lock, flags);
> >>>> +    ret = pin_guc_id(guc, ce);    /* returns 1 if new guc_id
> >>>> assigned */
> >>>> +    if (unlikely(ret < 0))
> >>>> +        return ret;;
> >>> typo ";;"
> >>>
> >> Yep.
> >>
> >>>> +    if (context_needs_register(ce, !!ret)) {
> >>>> +        ret = guc_lrc_desc_pin(ce);
> >>>> +        if (unlikely(ret)) {    /* unwind */
> >>>> +            atomic_dec(&ce->guc_id_ref);
> >>>> +            unpin_guc_id(guc, ce);
> >>>> +            return ret;
> >>>> +        }
> >>>> +    }
> >>>> -    if (guc->stalled_request ||
> >>>> !i915_sched_engine_is_empty(sched_engine))
> >>>> -        queue_request(sched_engine, rq, rq_prio(rq));
> >>>> -    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >>>> -        tasklet_hi_schedule(&sched_engine->tasklet);
> >>>> +    clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >>>
> >>> Might be worth moving the pinning to a separate function to keep the
> >>> request_alloc focused on the request. Can be done as a follow up.
> >>>
> >> Sure. Let me see how that looks in a follow up.
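A rough cut of that follow-up; the helper name guc_request_pin_context()
is illustrative, not part of the series:

static int guc_request_pin_context(struct i915_request *rq)
{
    struct intel_context *ce = rq->context;
    struct intel_guc *guc = ce_to_guc(ce);
    int ret;

    /* Fast path: guc_id already pinned by an in-flight request */
    if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
        return 0;

    ret = pin_guc_id(guc, ce);    /* returns 1 if new guc_id assigned */
    if (unlikely(ret < 0))
        return ret;

    if (context_needs_register(ce, !!ret)) {
        ret = guc_lrc_desc_pin(ce);
        if (unlikely(ret)) {    /* unwind */
            atomic_dec(&ce->guc_id_ref);
            unpin_guc_id(guc, ce);
            return ret;
        }
    }

    clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
    return 0;
}

guc_request_alloc() would then shrink to the flush/reserve bookkeeping
plus a single call to this helper.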
> >>  
> >>>> -    spin_unlock_irqrestore(&sched_engine->lock, flags);
> >>>> +    return 0;
> >>>>    }
> >>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
> >>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct
> >>>> intel_engine_cs *engine)
> >>>>        engine->submit_request = guc_submit_request;
> >>>>    }
> >>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> >>>> +                      struct intel_context *ce)
> >>>> +{
> >>>> +    if (context_guc_id_invalid(ce))
> >>>> +        pin_guc_id(guc, ce);
> >>>> +    guc_lrc_desc_pin(ce);
> >>>> +}
> >>>> +
> >>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> >>>> +{
> >>>> +    struct intel_gt *gt = guc_to_gt(guc);
> >>>> +    struct intel_engine_cs *engine;
> >>>> +    enum intel_engine_id id;
> >>>> +
> >>>> +    /* make sure all descriptors are clean... */
> >>>> +    xa_destroy(&guc->context_lookup);
> >>>> +
> >>>> +    /*
> >>>> +     * Some contexts might have been pinned before we enabled GuC
> >>>> +     * submission, so we need to add them to the GuC bookkeeping.
> >>>> +     * Also, after a GuC reset we want to make sure that the
> >>>> information
> >>>> +     * shared with GuC is properly reset. The kernel lrcs are not
> >>>> attached
> >>>> +     * to the gem_context, so they need to be added separately.
> >>>> +     *
> >>>> +     * Note: we purposely do not check the error return of
> >>>> +     * guc_lrc_desc_pin, because that function can only fail in two
> >>>> cases.
> >>>> +     * One, if there aren't enough free IDs, but we're guaranteed
> >>>> to have
> >>>> +     * enough here (we're either only pinning a handful of lrc on
> >>>> first boot
> >>>> +     * or we're re-pinning lrcs that were already pinned before the
> >>>> reset).
> >>>> +     * Two, if the GuC has died and CTBs can't make forward progress.
> >>>> +     * Presumably, the GuC should be alive as this function is
> >>>> called on
> >>>> +     * driver load or after a reset. Even if it is dead, another
> >>>> full GPU
> >>>> +     * reset will be triggered and this function would be called
> >>>> again.
> >>>> +     */
> >>>> +
> >>>> +    for_each_engine(engine, gt, id)
> >>>> +        if (engine->kernel_context)
> >>>> +            guc_kernel_context_pin(guc, engine->kernel_context);
> >>>> +}
> >>>> +
> >>>>    static void guc_release(struct intel_engine_cs *engine)
> >>>>    {
> >>>>        engine->sanitize = NULL; /* no longer in control, nothing to
> >>>> sanitize */
> >>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct
> >>>> intel_engine_cs *engine)
> >>>>    void intel_guc_submission_enable(struct intel_guc *guc)
> >>>>    {
> >>>> +    guc_init_lrc_mapping(guc);
> >>>>    }
> >>>>    void intel_guc_submission_disable(struct intel_guc *guc)
> >>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct
> >>>> intel_guc *guc)
> >>>>    {
> >>>>        guc->submission_selected = __guc_submission_selected(guc);
> >>>>    }
> >>>> +
> >>>> +static inline struct intel_context *
> >>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +
> >>>> +    if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
> >>>> +            "Invalid desc_idx %u", desc_idx);
> >>>> +        return NULL;
> >>>> +    }
> >>>> +
> >>>> +    ce = __get_context(guc, desc_idx);
> >>>> +    if (unlikely(!ce)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
> >>>> +            "Context is NULL, desc_idx %u", desc_idx);
> >>>> +        return NULL;
> >>>> +    }
> >>>> +
> >>>> +    return ce;
> >>>> +}
> >>>> +
> >>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >>>> +                      const u32 *msg,
> >>>> +                      u32 len)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +    u32 desc_idx = msg[0];
> >>>> +
> >>>> +    if (unlikely(len < 1)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> >>>> +        return -EPROTO;
> >>>> +    }
> >>>> +
> >>>> +    ce = g2h_context_lookup(guc, desc_idx);
> >>>> +    if (unlikely(!ce))
> >>>> +        return -EPROTO;
> >>>> +
> >>>> +    if (context_wait_for_deregister_to_register(ce)) {
> >>>> +        struct intel_runtime_pm *runtime_pm =
> >>>> +            &ce->engine->gt->i915->runtime_pm;
> >>>> +        intel_wakeref_t wakeref;
> >>>> +
> >>>> +        /*
> >>>> +         * Previous owner of this guc_id has been deregistered, now
> >>>> safe
> >>>> +         * to register this context.
> >>>> +         */
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            register_context(ce);
> >>>> +        clr_context_wait_for_deregister_to_register(ce);
> >>>> +        intel_context_put(ce);
> >>>> +    } else if (context_destroyed(ce)) {
> >>>> +        /* Context has been destroyed */
> >>>> +        release_guc_id(guc, ce);
> >>>> +        lrc_destroy(&ce->ref);
> >>>> +    }
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h
> >>>> b/drivers/gpu/drm/i915/i915_reg.h
> >>>> index 943fe485c662..204c95c39353 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_reg.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> >>>> @@ -4142,6 +4142,7 @@ enum {
> >>>>        FAULT_AND_CONTINUE /* Unsupported */
> >>>>    };
> >>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >>>>    #define GEN8_CTX_VALID (1 << 0)
> >>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> >>>> diff --git a/drivers/gpu/drm/i915/i915_request.c
> >>>> b/drivers/gpu/drm/i915/i915_request.c
> >>>> index 86b4c9f2613d..b48c4905d3fc 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_request.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_request.c
> >>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >>>>         */
> >>>>        if (!list_empty(&rq->sched.link))
> >>>>            remove_from_engine(rq);
> >>>> +    atomic_dec(&rq->context->guc_id_ref);
> >>> Does this work/make sense  if GuC is disabled?
> >>>
> >> It doesn't matter if the GuC is disabled as this var isn't used if it
> >> is. A following patch adds a vfunc here and pushes this into the GuC
> >> backend.
> >>
> >> Matt
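A sketch of the direction described above; the hook name is an
assumption, not the final API:

    /* i915_request_retire() stays backend agnostic: */
    if (!list_empty(&rq->sched.link))
        remove_from_engine(rq);
    if (rq->engine->remove_active_request)    /* assumed vfunc */
        rq->engine->remove_active_request(rq);

    /* while the GuC backend supplies the refcount handling: */
    static void guc_remove_active_request(struct i915_request *rq)
    {
        atomic_dec(&rq->context->guc_id_ref);
    }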
> >>
> >>> Daniele
> >>>
> >>>>        GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >>>>        __list_del_entry(&rq->link); /* poison neither prev/next (RCU
> >>>> walks) */
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface
@ 2021-07-22 15:48             ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 15:48 UTC (permalink / raw)
  To: Michal Wajdeczko; +Cc: intel-gfx, dri-devel

On Thu, Jul 22, 2021 at 09:57:00AM +0200, Michal Wajdeczko wrote:
> 
> 
> On 22.07.2021 01:51, Daniele Ceraolo Spurio wrote:
> > 
> > 
> > On 7/19/2021 9:04 PM, Matthew Brost wrote:
> >> On Mon, Jul 19, 2021 at 05:51:46PM -0700, Daniele Ceraolo Spurio wrote:
> >>>
> >>> On 7/16/2021 1:16 PM, Matthew Brost wrote:
> >>>> Implement GuC context operations which includes GuC specific operations
> >>>> alloc, pin, unpin, and destroy.
> >>>>
> >>>> v2:
> >>>>    (Daniel Vetter)
> >>>>     - Use msleep_interruptible rather than cond_resched in busy loop
> >>>>    (Michal)
> >>>>     - Remove C++ style comment
> >>>>
> >>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/gt/intel_context.c       |   5 +
> >>>>    drivers/gpu/drm/i915/gt/intel_context_types.h |  22 +-
> >>>>    drivers/gpu/drm/i915/gt/intel_lrc_reg.h       |   1 -
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc.h        |  40 ++
> >>>>    drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c     |   4 +
> >>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 666
> >>>> ++++++++++++++++--
> >>>>    drivers/gpu/drm/i915/i915_reg.h               |   1 +
> >>>>    drivers/gpu/drm/i915/i915_request.c           |   1 +
> >>>>    8 files changed, 685 insertions(+), 55 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> index bd63813c8a80..32fd6647154b 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> @@ -384,6 +384,11 @@ intel_context_init(struct intel_context *ce,
> >>>> struct intel_engine_cs *engine)
> >>>>        mutex_init(&ce->pin_mutex);
> >>>> +    spin_lock_init(&ce->guc_state.lock);
> >>>> +
> >>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
> >>>> +    INIT_LIST_HEAD(&ce->guc_id_link);
> >>>> +
> >>>>        i915_active_init(&ce->active,
> >>>>                 __intel_context_active, __intel_context_retire, 0);
> >>>>    }
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> index 6d99631d19b9..606c480aec26 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> >>>> @@ -96,6 +96,7 @@ struct intel_context {
> >>>>    #define CONTEXT_BANNED            6
> >>>>    #define CONTEXT_FORCE_SINGLE_SUBMISSION    7
> >>>>    #define CONTEXT_NOPREEMPT        8
> >>>> +#define CONTEXT_LRCA_DIRTY        9
> >>>>        struct {
> >>>>            u64 timeout_us;
> >>>> @@ -138,14 +139,29 @@ struct intel_context {
> >>>>        u8 wa_bb_page; /* if set, page num reserved for context
> >>>> workarounds */
> >>>> +    struct {
> >>>> +        /** lock: protects everything in guc_state */
> >>>> +        spinlock_t lock;
> >>>> +        /**
> >>>> +         * sched_state: scheduling state of this context using GuC
> >>>> +         * submission
> >>>> +         */
> >>>> +        u8 sched_state;
> >>>> +    } guc_state;
> >>>> +
> >>>>        /* GuC scheduling state flags that do not require a lock. */
> >>>>        atomic_t guc_sched_state_no_lock;
> >>>> +    /* GuC LRC descriptor ID */
> >>>> +    u16 guc_id;
> >>>> +
> >>>> +    /* GuC LRC descriptor reference count */
> >>>> +    atomic_t guc_id_ref;
> >>>> +
> >>>>        /*
> >>>> -     * GuC LRC descriptor ID - Not assigned in this patch but
> >>>> future patches
> >>>> -     * in the series will.
> >>>> +     * GuC ID link - in list when unpinned but guc_id still valid
> >>>> in GuC
> >>>>         */
> >>>> -    u16 guc_id;
> >>>> +    struct list_head guc_id_link;
> >>>>    };
> >>>>    #endif /* __INTEL_CONTEXT_TYPES__ */
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> index 41e5350a7a05..49d4857ad9b7 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc_reg.h
> >>>> @@ -87,7 +87,6 @@
> >>>>    #define GEN11_CSB_WRITE_PTR_MASK    (GEN11_CSB_PTR_MASK << 0)
> >>>>    #define MAX_CONTEXT_HW_ID    (1 << 21) /* exclusive */
> >>>> -#define MAX_GUC_CONTEXT_HW_ID    (1 << 20) /* exclusive */
> >>>>    #define GEN11_MAX_CONTEXT_HW_ID    (1 << 11) /* exclusive */
> >>>>    /* in Gen12 ID 0x7FF is reserved to indicate idle */
> >>>>    #define GEN12_MAX_CONTEXT_HW_ID    (GEN11_MAX_CONTEXT_HW_ID - 1)
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> index 8c7b92f699f1..30773cd699f5 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> >>>> @@ -7,6 +7,7 @@
> >>>>    #define _INTEL_GUC_H_
> >>>>    #include <linux/xarray.h>
> >>>> +#include <linux/delay.h>
> >>>>    #include "intel_uncore.h"
> >>>>    #include "intel_guc_fw.h"
> >>>> @@ -44,6 +45,14 @@ struct intel_guc {
> >>>>            void (*disable)(struct intel_guc *guc);
> >>>>        } interrupts;
> >>>> +    /*
> >>>> +     * contexts_lock protects the pool of free guc ids and a linked
> >>>> list of
> >>>> +     * guc ids available to be stolen
> >>>> +     */
> >>>> +    spinlock_t contexts_lock;
> >>>> +    struct ida guc_ids;
> >>>> +    struct list_head guc_id_list;
> >>>> +
> >>>>        bool submission_selected;
> >>>>        struct i915_vma *ads_vma;
> >>>> @@ -101,6 +110,34 @@ intel_guc_send_and_receive(struct intel_guc
> >>>> *guc, const u32 *action, u32 len,
> >>>>                     response_buf, response_buf_size, 0);
> >>>>    }
> >>>> +static inline int intel_guc_send_busy_loop(struct intel_guc* guc,
> >>>> +                       const u32 *action,
> >>>> +                       u32 len,
> >>>> +                       bool loop)
> >>>> +{
> >>>> +    int err;
> >>>> +    unsigned int sleep_period_ms = 1;
> >>>> +    bool not_atomic = !in_atomic() && !irqs_disabled();
> >>>> +
> >>>> +    /* No sleeping with spin locks, just busy loop */
> >>>> +    might_sleep_if(loop && not_atomic);
> >>>> +
> >>>> +retry:
> >>>> +    err = intel_guc_send_nb(guc, action, len);
> >>>> +    if (unlikely(err == -EBUSY && loop)) {
> >>>> +        if (likely(not_atomic)) {
> >>>> +            if (msleep_interruptible(sleep_period_ms))
> >>>> +                return -EINTR;
> >>>> +            sleep_period_ms = sleep_period_ms << 1;
> >>>> +        } else {
> >>>> +            cpu_relax();
> >>> Potentially something we can change later, but if we're in atomic
> >>> context we
> >>> can't keep looping without a timeout while we get -EBUSY, we need to
> >>> bail
> >>> out early.
> >>>
> >> Eventually intel_guc_send_nb will pop with a different return code after
> >> 1 second I think if the GuC is hung.
> > 
> > I know, the point is that 1 second is too long in atomic context. This
> > has to either complete quickly in atomic or be forked off to a worker. Can
> > fix as a follow up.
> 
> if we implement fallback to a worker, then maybe we can get rid of this
> busy loop completely, as it was supposed to be only needed for rare
> cases when CTBs were full due to unlikely congestion either caused by us

Right. The last project I worked on we had a circular buffer to
communicate with the HW and fell back to a kthread (or workqueue?) when
the buffer was full. We likely could do something like that, but if we
have to malloc memory that gets tricky in atomic contexts. This is a
good one to write down and revisit a bit later.

Matt
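One possible shape for that worker fallback, sketched with assumed names;
the GFP_ATOMIC allocation below is exactly the tricky part mentioned
above, and a preallocated pool would avoid it:

struct guc_deferred_send {
    struct work_struct work;
    struct intel_guc *guc;
    u32 len;
    u32 action[8];    /* assumes small, bounded actions */
};

static void guc_deferred_send_worker(struct work_struct *w)
{
    struct guc_deferred_send *ds =
        container_of(w, struct guc_deferred_send, work);

    /* Worker context may sleep, so the busy loop can back off */
    intel_guc_send_busy_loop(ds->guc, ds->action, ds->len, true);
    kfree(ds);
}

/* Called instead of spinning when atomic and the CTB returns -EBUSY */
static int guc_defer_send(struct intel_guc *guc, const u32 *action, u32 len)
{
    struct guc_deferred_send *ds;

    if (len > ARRAY_SIZE(ds->action))
        return -EINVAL;

    ds = kmalloc(sizeof(*ds), GFP_ATOMIC);
    if (!ds)
        return -ENOMEM;

    ds->guc = guc;
    ds->len = len;
    memcpy(ds->action, action, sizeof(u32) * len);
    INIT_WORK(&ds->work, guc_deferred_send_worker);
    schedule_work(&ds->work);
    return 0;
}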

> (host driver didn't read G2H on time or filled up H2G faster than GuC
> could process it) or an actual GuC failure (no H2G processing at all).
> 
> Michal
> 
> > 
> > Daniele
> > 
> >>  
> >>>> +        }
> >>>> +        goto retry;
> >>>> +    }
> >>>> +
> >>>> +    return err;
> >>>> +}
> >>>> +
> >>>>    static inline void intel_guc_to_host_event_handler(struct
> >>>> intel_guc *guc)
> >>>>    {
> >>>>        intel_guc_ct_event_handler(&guc->ct);
> >>>> @@ -202,6 +239,9 @@ static inline void intel_guc_disable_msg(struct
> >>>> intel_guc *guc, u32 mask)
> >>>>    int intel_guc_reset_engine(struct intel_guc *guc,
> >>>>                   struct intel_engine_cs *engine);
> >>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >>>> +                      const u32 *msg, u32 len);
> >>>> +
> >>>>    void intel_guc_load_status(struct intel_guc *guc, struct
> >>>> drm_printer *p);
> >>>>    #endif
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> index 83ec60ea3f89..28ff82c5be45 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> >>>> @@ -928,6 +928,10 @@ static int ct_process_request(struct
> >>>> intel_guc_ct *ct, struct ct_incoming_msg *r
> >>>>        case INTEL_GUC_ACTION_DEFAULT:
> >>>>            ret = intel_guc_to_host_process_recv_msg(guc, payload, len);
> >>>>            break;
> >>>> +    case INTEL_GUC_ACTION_DEREGISTER_CONTEXT_DONE:
> >>>> +        ret = intel_guc_deregister_done_process_msg(guc, payload,
> >>>> +                                len);
> >>>> +        break;
> >>>>        default:
> >>>>            ret = -EOPNOTSUPP;
> >>>>            break;
> >>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> index 53b4a5eb4a85..a47b3813b4d0 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> >>>> @@ -13,7 +13,9 @@
> >>>>    #include "gt/intel_gt.h"
> >>>>    #include "gt/intel_gt_irq.h"
> >>>>    #include "gt/intel_gt_pm.h"
> >>>> +#include "gt/intel_gt_requests.h"
> >>>>    #include "gt/intel_lrc.h"
> >>>> +#include "gt/intel_lrc_reg.h"
> >>>>    #include "gt/intel_mocs.h"
> >>>>    #include "gt/intel_ring.h"
> >>>> @@ -85,6 +87,73 @@ static inline void clr_context_enabled(struct
> >>>> intel_context *ce)
> >>>>               &ce->guc_sched_state_no_lock);
> >>>>    }
> >>>> +/*
> >>>> + * Below is a set of functions which control the GuC scheduling
> >>>> state which
> >>>> + * require a lock, aside from the special case where the functions
> >>>> are called
> >>>> + * from guc_lrc_desc_pin(). In that case it isn't possible for any
> >>>> other code
> >>>> + * path to be executing on the context.
> >>>> + */
> >>> Is there a reason to avoid taking the lock in guc_lrc_desc_pin, even
> >>> if it
> >>> isn't strictly needed? I'd prefer to avoid the asymmetry in the locking
> >> If this code is called from releasing a fence, a circular dependency is
> >> created. Once we move to the DRM scheduler the locking becomes way easier
> >> and we clean this up then. We already have a task for locking clean up.
> >>
> >>> scheme if possible, as that might cause trouble if things change in the
> >>> future.
> >> Again this is temporary code, so I think we can live with this until the
> >> DRM scheduler lands.
> >>
> >>>> +#define SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER    BIT(0)
> >>>> +#define SCHED_STATE_DESTROYED                BIT(1)
> >>>> +static inline void init_sched_state(struct intel_context *ce)
> >>>> +{
> >>>> +    /* Only should be called from guc_lrc_desc_pin() */
> >>>> +    atomic_set(&ce->guc_sched_state_no_lock, 0);
> >>>> +    ce->guc_state.sched_state = 0;
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_state.sched_state &
> >>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> >>> No need for (). Below as well.
> >>>
> >> Yep.
> >>
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +set_context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    /* Only should be called from guc_lrc_desc_pin() */
> >>>> +    ce->guc_state.sched_state |=
> >>>> +        SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER;
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +clr_context_wait_for_deregister_to_register(struct intel_context *ce)
> >>>> +{
> >>>> +    lockdep_assert_held(&ce->guc_state.lock);
> >>>> +    ce->guc_state.sched_state =
> >>>> +        (ce->guc_state.sched_state &
> >>>> +         ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER);
> >>> nit: can also use
> >>>
> >>> ce->guc_state.sched_state &=
> >>>     ~SCHED_STATE_WAIT_FOR_DEREGISTER_TO_REGISTER
> >>>
> >> Yep.
> >>  
> >>>> +}
> >>>> +
> >>>> +static inline bool
> >>>> +context_destroyed(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_state.sched_state & SCHED_STATE_DESTROYED);
> >>>> +}
> >>>> +
> >>>> +static inline void
> >>>> +set_context_destroyed(struct intel_context *ce)
> >>>> +{
> >>>> +    lockdep_assert_held(&ce->guc_state.lock);
> >>>> +    ce->guc_state.sched_state |= SCHED_STATE_DESTROYED;
> >>>> +}
> >>>> +
> >>>> +static inline bool context_guc_id_invalid(struct intel_context *ce)
> >>>> +{
> >>>> +    return (ce->guc_id == GUC_INVALID_LRC_ID);
> >>>> +}
> >>>> +
> >>>> +static inline void set_context_guc_id_invalid(struct intel_context
> >>>> *ce)
> >>>> +{
> >>>> +    ce->guc_id = GUC_INVALID_LRC_ID;
> >>>> +}
> >>>> +
> >>>> +static inline struct intel_guc *ce_to_guc(struct intel_context *ce)
> >>>> +{
> >>>> +    return &ce->engine->gt->uc.guc;
> >>>> +}
> >>>> +
> >>>>    static inline struct i915_priolist *to_priolist(struct rb_node *rb)
> >>>>    {
> >>>>        return rb_entry(rb, struct i915_priolist, node);
> >>>> @@ -155,6 +224,9 @@ static int guc_add_request(struct intel_guc
> >>>> *guc, struct i915_request *rq)
> >>>>        int len = 0;
> >>>>        bool enabled = context_enabled(ce);
> >>>> +    GEM_BUG_ON(!atomic_read(&ce->guc_id_ref));
> >>>> +    GEM_BUG_ON(context_guc_id_invalid(ce));
> >>>> +
> >>>>        if (!enabled) {
> >>>>            action[len++] = INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET;
> >>>>            action[len++] = ce->guc_id;
> >>>> @@ -417,6 +489,10 @@ int intel_guc_submission_init(struct intel_guc
> >>>> *guc)
> >>>>        xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> >>>> +    spin_lock_init(&guc->contexts_lock);
> >>>> +    INIT_LIST_HEAD(&guc->guc_id_list);
> >>>> +    ida_init(&guc->guc_ids);
> >>>> +
> >>>>        return 0;
> >>>>    }
> >>>> @@ -429,9 +505,303 @@ void intel_guc_submission_fini(struct
> >>>> intel_guc *guc)
> >>>>        i915_sched_engine_put(guc->sched_engine);
> >>>>    }
> >>>> -static int guc_context_alloc(struct intel_context *ce)
> >>>> +static inline void queue_request(struct i915_sched_engine
> >>>> *sched_engine,
> >>>> +                 struct i915_request *rq,
> >>>> +                 int prio)
> >>>>    {
> >>>> -    return lrc_alloc(ce, ce->engine);
> >>>> +    GEM_BUG_ON(!list_empty(&rq->sched.link));
> >>>> +    list_add_tail(&rq->sched.link,
> >>>> +              i915_sched_lookup_priolist(sched_engine, prio));
> >>>> +    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>>> +}
> >>>> +
> >>>> +static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >>>> +                     struct i915_request *rq)
> >>>> +{
> >>>> +    int ret;
> >>>> +
> >>>> +    __i915_request_submit(rq);
> >>>> +
> >>>> +    trace_i915_request_in(rq, 0);
> >>>> +
> >>>> +    guc_set_lrc_tail(rq);
> >>>> +    ret = guc_add_request(guc, rq);
> >>>> +    if (ret == -EBUSY)
> >>>> +        guc->stalled_request = rq;
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +
> >>>> +static void guc_submit_request(struct i915_request *rq)
> >>>> +{
> >>>> +    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >>>> +    struct intel_guc *guc = &rq->engine->gt->uc.guc;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    /* Will be called from irq-context when using foreign fences. */
> >>>> +    spin_lock_irqsave(&sched_engine->lock, flags);
> >>>> +
> >>>> +    if (guc->stalled_request ||
> >>>> !i915_sched_engine_is_empty(sched_engine))
> >>>> +        queue_request(sched_engine, rq, rq_prio(rq));
> >>>> +    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >>>> +        tasklet_hi_schedule(&sched_engine->tasklet);
> >>>> +
> >>>> +    spin_unlock_irqrestore(&sched_engine->lock, flags);
> >>>> +}
> >>>> +
> >>>> +#define GUC_ID_START    64    /* First 64 guc_ids reserved */
> >>>> +static int new_guc_id(struct intel_guc *guc)
> >>>> +{
> >>>> +    return ida_simple_get(&guc->guc_ids, GUC_ID_START,
> >>>> +                  GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL |
> >>>> +                  __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> >>>> +}
> >>>> +
> >>>> +static void __release_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    if (!context_guc_id_invalid(ce)) {
> >>>> +        ida_simple_remove(&guc->guc_ids, ce->guc_id);
> >>>> +        reset_lrc_desc(guc, ce->guc_id);
> >>>> +        set_context_guc_id_invalid(ce);
> >>>> +    }
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +}
> >>>> +
> >>>> +static void release_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    __release_guc_id(guc, ce);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +}
> >>>> +
> >>>> +static int steal_guc_id(struct intel_guc *guc)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +    int guc_id;
> >>>> +
> >>>> +    if (!list_empty(&guc->guc_id_list)) {
> >>>> +        ce = list_first_entry(&guc->guc_id_list,
> >>>> +                      struct intel_context,
> >>>> +                      guc_id_link);
> >>>> +
> >>>> +        GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> >>>> +        GEM_BUG_ON(context_guc_id_invalid(ce));
> >>>> +
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +        guc_id = ce->guc_id;
> >>>> +        set_context_guc_id_invalid(ce);
> >>>> +        return guc_id;
> >>>> +    } else {
> >>>> +        return -EAGAIN;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static int assign_guc_id(struct intel_guc *guc, u16 *out)
> >>>> +{
> >>>> +    int ret;
> >>>> +
> >>>> +    ret = new_guc_id(guc);
> >>>> +    if (unlikely(ret < 0)) {
> >>>> +        ret = steal_guc_id(guc);
> >>>> +        if (ret < 0)
> >>>> +            return ret;
> >>>> +    }
> >>>> +
> >>>> +    *out = ret;
> >>>> +    return 0;
> >>> Is it worth adding spinlock_held asserts for guc->contexts_lock in
> >>> these ID
> >>> functions? Doubles up as documentation of what locking we expect.
> >>>
> >> Never a bad idea to have asserts in the code. Will add.
> >>
> >>>> +}
> >>>> +
> >>>> +#define PIN_GUC_ID_TRIES    4
> >>>> +static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce)
> >>>> +{
> >>>> +    int ret = 0;
> >>>> +    unsigned long flags, tries = PIN_GUC_ID_TRIES;
> >>>> +
> >>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> >>>> +
> >>>> +try_again:
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    if (context_guc_id_invalid(ce)) {
> >>>> +        ret = assign_guc_id(guc, &ce->guc_id);
> >>>> +        if (ret)
> >>>> +            goto out_unlock;
> >>>> +        ret = 1;    /* Indicates newly assigned guc_id */
> >>>> +    }
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +    atomic_inc(&ce->guc_id_ref);
> >>>> +
> >>>> +out_unlock:
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    /*
> >>>> +     * -EAGAIN indicates no guc_ids are available, let's retire any
> >>>> +     * outstanding requests to see if that frees up a guc_id. If
> >>>> the first
> >>>> +     * retire didn't help, insert a sleep with the timeslice
> >>>> duration before
> >>>> +     * attempting to retire more requests. Double the sleep period
> >>>> each
> >>>> +     * subsequent pass before finally giving up. The sleep period
> >>>> has max of
> >>>> +     * 100ms and minimum of 1ms.
> >>>> +     */
> >>>> +    if (ret == -EAGAIN && --tries) {
> >>>> +        if (PIN_GUC_ID_TRIES - tries > 1) {
> >>>> +            unsigned int timeslice_shifted =
> >>>> +                ce->engine->props.timeslice_duration_ms <<
> >>>> +                (PIN_GUC_ID_TRIES - tries - 2);
> >>>> +            unsigned int max = min_t(unsigned int, 100,
> >>>> +                         timeslice_shifted);
> >>>> +
> >>>> +            msleep(max_t(unsigned int, max, 1));
> >>>> +        }
> >>>> +        intel_gt_retire_requests(guc_to_gt(guc));
> >>>> +        goto try_again;
> >>>> +    }
> >>>> +
> >>>> +    return ret;
> >>>> +}
> >>>> +
> >>>> +static void unpin_guc_id(struct intel_guc *guc, struct
> >>>> intel_context *ce)
> >>>> +{
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> >>>> +
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    if (!context_guc_id_invalid(ce) && list_empty(&ce->guc_id_link) &&
> >>>> +        !atomic_read(&ce->guc_id_ref))
> >>>> +        list_add_tail(&ce->guc_id_link, &guc->guc_id_list);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +}
> >>>> +
> >>>> +static int __guc_action_register_context(struct intel_guc *guc,
> >>>> +                     u32 guc_id,
> >>>> +                     u32 offset)
> >>>> +{
> >>>> +    u32 action[] = {
> >>>> +        INTEL_GUC_ACTION_REGISTER_CONTEXT,
> >>>> +        guc_id,
> >>>> +        offset,
> >>>> +    };
> >>>> +
> >>>> +    return intel_guc_send_busy_loop(guc, action,
> >>>> ARRAY_SIZE(action), true);
> >>>> +}
> >>>> +
> >>>> +static int register_context(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +    u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> >>>> +        ce->guc_id * sizeof(struct guc_lrc_desc);
> >>>> +
> >>>> +    return __guc_action_register_context(guc, ce->guc_id, offset);
> >>>> +}
> >>>> +
> >>>> +static int __guc_action_deregister_context(struct intel_guc *guc,
> >>>> +                       u32 guc_id)
> >>>> +{
> >>>> +    u32 action[] = {
> >>>> +        INTEL_GUC_ACTION_DEREGISTER_CONTEXT,
> >>>> +        guc_id,
> >>>> +    };
> >>>> +
> >>>> +    return intel_guc_send_busy_loop(guc, action,
> >>>> ARRAY_SIZE(action), true);
> >>>> +}
> >>>> +
> >>>> +static int deregister_context(struct intel_context *ce, u32 guc_id)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +
> >>>> +    return __guc_action_deregister_context(guc, guc_id);
> >>>> +}
> >>>> +
> >>>> +static intel_engine_mask_t adjust_engine_mask(u8 class,
> >>>> intel_engine_mask_t mask)
> >>>> +{
> >>>> +    switch (class) {
> >>>> +    case RENDER_CLASS:
> >>>> +        return mask >> RCS0;
> >>>> +    case VIDEO_ENHANCEMENT_CLASS:
> >>>> +        return mask >> VECS0;
> >>>> +    case VIDEO_DECODE_CLASS:
> >>>> +        return mask >> VCS0;
> >>>> +    case COPY_ENGINE_CLASS:
> >>>> +        return mask >> BCS0;
> >>>> +    default:
> >>>> +        GEM_BUG_ON("Invalid Class");
> >>> we usually use MISSING_CASE for this type of error.
> >> As soon as we start talking in logical space (series immediately after
> >> this gets merged) this function gets deleted, but will fix.
> >>
> >>>> +        return 0;
> >>>> +    }
> >>>> +}
> >>>> +
> >>>> +static void guc_context_policy_init(struct intel_engine_cs *engine,
> >>>> +                    struct guc_lrc_desc *desc)
> >>>> +{
> >>>> +    desc->policy_flags = 0;
> >>>> +
> >>>> +    desc->execution_quantum =
> >>>> CONTEXT_POLICY_DEFAULT_EXECUTION_QUANTUM_US;
> >>>> +    desc->preemption_timeout =
> >>>> CONTEXT_POLICY_DEFAULT_PREEMPTION_TIME_US;
> >>>> +}
> >>>> +
> >>>> +static int guc_lrc_desc_pin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_runtime_pm *runtime_pm =
> >>>> +        &ce->engine->gt->i915->runtime_pm;
> >>> If you move this after the engine var below you can skip the ce->engine
> >>> jump. Also, you can shorten the pointer chasing by using
> >>> engine->uncore->rpm.
> >>>
> >> Sure.
> >>
> >>>> +    struct intel_engine_cs *engine = ce->engine;
> >>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
> >>>> +    u32 desc_idx = ce->guc_id;
> >>>> +    struct guc_lrc_desc *desc;
> >>>> +    bool context_registered;
> >>>> +    intel_wakeref_t wakeref;
> >>>> +    int ret = 0;
> >>>> +
> >>>> +    GEM_BUG_ON(!engine->mask);
> >>>> +
> >>>> +    /*
> >>>> +     * Ensure LRC + CT vmas are in the same region as write barrier is
> >>>> done
> >>>> +     * based on CT vma region.
> >>>> +     */
> >>>> +    GEM_BUG_ON(i915_gem_object_is_lmem(guc->ct.vma->obj) !=
> >>>> +           i915_gem_object_is_lmem(ce->ring->vma->obj));
> >>>> +
> >>>> +    context_registered = lrc_desc_registered(guc, desc_idx);
> >>>> +
> >>>> +    reset_lrc_desc(guc, desc_idx);
> >>>> +    set_lrc_desc_registered(guc, desc_idx, ce);
> >>>> +
> >>>> +    desc = __get_lrc_desc(guc, desc_idx);
> >>>> +    desc->engine_class = engine_class_to_guc_class(engine->class);
> >>>> +    desc->engine_submit_mask = adjust_engine_mask(engine->class,
> >>>> +                              engine->mask);
> >>>> +    desc->hw_context_desc = ce->lrc.lrca;
> >>>> +    desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> >>>> +    desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> >>>> +    guc_context_policy_init(engine, desc);
> >>>> +    init_sched_state(ce);
> >>>> +
> >>>> +    /*
> >>>> +     * The context_lookup xarray is used to determine if the hardware
> >>>> +     * context is currently registered. There are two cases in
> >>>> which it
> >>>> +     * could be regisgered either the guc_id has been stole from from
> >>> typo regisgered
> >>>
> >> John got this one. Fixed.
> >>  
> >>>> +     * another context or the lrc descriptor address of this
> >>>> context has
> >>>> +     * changed. In either case the context needs to be deregistered
> >>>> with the
> >>>> +     * GuC before registering this context.
> >>>> +     */
> >>>> +    if (context_registered) {
> >>>> +        set_context_wait_for_deregister_to_register(ce);
> >>>> +        intel_context_get(ce);
> >>>> +
> >>>> +        /*
> >>>> +         * If stealing the guc_id, this ce has the same guc_id as the
> >>>> +         * context whose guc_id was stolen.
> >>>> +         */
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            ret = deregister_context(ce, ce->guc_id);
> >>>> +    } else {
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            ret = register_context(ce);
> >>>> +    }
> >>>> +
> >>>> +    return ret;
> >>>>    }
> >>>>    static int guc_context_pre_pin(struct intel_context *ce,
> >>>> @@ -443,36 +813,139 @@ static int guc_context_pre_pin(struct
> >>>> intel_context *ce,
> >>>>    static int guc_context_pin(struct intel_context *ce, void *vaddr)
> >>>>    {
> >>>> +    if (i915_ggtt_offset(ce->state) !=
> >>>> +        (ce->lrc.lrca & CTX_GTT_ADDRESS_MASK))
> >>>> +        set_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >>> Shouldn't this be set after lrc_pin()? I can see ce->lrc.lrca being
> >>> re-assigned in there; it looks like it can only change if the context is
> >>> new, but for future-proofing IMO better to re-order here.
> >>>
> >> No, this is right. ce->state is assigned in lrc_pre_pin and ce->lrc.lrca is
> >> assigned in lrc_pin. To catch the change you have to check between
> >> those two functions.
> >>
> >>>> +
> >>>>        return lrc_pin(ce, ce->engine, vaddr);
> >>> Could use a comment to say that the GuC context pinning call is delayed
> >>> until request alloc and to look at the comment in the latter function
> >>> for
> >>> details (or move the explanation here and refer here from request
> >>> alloc).
> >>>
> >> Yes.
> >>  
> >>>>    }
> >>>> +static void guc_context_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>> +
> >>>> +    unpin_guc_id(guc, ce);
> >>>> +    lrc_unpin(ce);
> >>>> +}
> >>>> +
> >>>> +static void guc_context_post_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    lrc_post_unpin(ce);
> >>> why do we need this function? we can just pass lrc_post_unpin to the ops
> >>> (like we already do).
> >>>
> >> We don't, but for style reasons I think having a GuC-specific wrapper
> >> is correct.
> >>
> >>>> +}
> >>>> +
> >>>> +static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >>>> +{
> >>>> +    struct intel_engine_cs *engine = ce->engine;
> >>>> +    struct intel_guc *guc = &engine->gt->uc.guc;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    GEM_BUG_ON(!lrc_desc_registered(guc, ce->guc_id));
> >>>> +    GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >>>> +
> >>>> +    spin_lock_irqsave(&ce->guc_state.lock, flags);
> >>>> +    set_context_destroyed(ce);
> >>>> +    spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >>>> +
> >>>> +    deregister_context(ce, ce->guc_id);
> >>>> +}
> >>>> +
> >>>> +static void guc_context_destroy(struct kref *kref)
> >>>> +{
> >>>> +    struct intel_context *ce = container_of(kref, typeof(*ce), ref);
> >>>> +    struct intel_runtime_pm *runtime_pm =
> >>>> &ce->engine->gt->i915->runtime_pm;
> >>> same as above, going through engine->uncore->rpm is shorter
> >>>
> >> Sure.
> >>  
> >>>> +    struct intel_guc *guc = &ce->engine->gt->uc.guc;
> >>>> +    intel_wakeref_t wakeref;
> >>>> +    unsigned long flags;
> >>>> +
> >>>> +    /*
> >>>> +     * If the guc_id is invalid this context has been stolen and we
> >>>> can free
> >>>> +     * it immediately. Also can be freed immediately if the context
> >>>> is not
> >>>> +     * registered with the GuC.
> >>>> +     */
> >>>> +    if (context_guc_id_invalid(ce) ||
> >>>> +        !lrc_desc_registered(guc, ce->guc_id)) {
> >>>> +        release_guc_id(guc, ce);
> >>> it feels a bit weird that we call release_guc_id in the case where
> >>> the id is
> >>> invalid. The code handles it fine, but still the flow doesn't feel
> >>> clean.
> >>> Not a blocker.
> >>>
> >> I understand. Let's see if this can get cleaned up at some point.
> >>
> >>>> +        lrc_destroy(kref);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    /*
> >>>> +     * We have to acquire the context spinlock and check guc_id
> >>>> again, if it
> >>>> +     * is valid it hasn't been stolen and needs to be deregistered. We
> >>>> +     * delete this context from the list of unpinned guc_ids
> >>>> available to
> >>>> +     * steal to seal a race with guc_lrc_desc_pin(). When the G2H CTB
> >>>> +     * returns indicating this context has been deregistered the
> >>>> guc_id is
> >>>> +     * returned to the pool of available guc_ids.
> >>>> +     */
> >>>> +    spin_lock_irqsave(&guc->contexts_lock, flags);
> >>>> +    if (context_guc_id_invalid(ce)) {
> >>>> +        __release_guc_id(guc, ce);
> >>> But here the call to __release_guc_id is unneeded, right? The ce
> >>> doesn't own
> >>> the ID anymore and both the steal and the release function already clean
> >>> ce->guc_id_link, so there should be nothing left to clean.
> >>>
> >> This is dead code. Will delete.
> >>
> >>>> +        spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +        lrc_destroy(kref);
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    if (!list_empty(&ce->guc_id_link))
> >>>> +        list_del_init(&ce->guc_id_link);
> >>>> +    spin_unlock_irqrestore(&guc->contexts_lock, flags);
> >>>> +
> >>>> +    /*
> >>>> +     * We defer GuC context deregistration until the context is
> >>>> destroyed
> >>>> +     * in order to save on CTBs. With this optimization ideally we
> >>>> only need
> >>>> +     * 1 CTB to register the context during the first pin and 1 CTB to
> >>>> +     * deregister the context when the context is destroyed.
> >>>> Without this
> >>>> +     * optimization, a CTB would be needed every pin & unpin.
> >>>> +     *
> >>>> +     * XXX: Need to acquire the runtime wakeref as this can be
> >>>> triggered
> >>>> +     * from context_free_worker when no runtime wakeref is held.
> >>>> +     * guc_lrc_desc_unpin requires the runtime as a GuC register is
> >>>> written
> >>>> +     * in H2G CTB to deregister the context. A future patch may
> >>>> defer this
> >>>> +     * H2G CTB if the runtime wakeref is zero.
> >>>> +     */
> >>>> +    with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +        guc_lrc_desc_unpin(ce);
> >>>> +}
> >>>> +
> >>>> +static int guc_context_alloc(struct intel_context *ce)
> >>>> +{
> >>>> +    return lrc_alloc(ce, ce->engine);
> >>>> +}
> >>>> +
> >>>>    static const struct intel_context_ops guc_context_ops = {
> >>>>        .alloc = guc_context_alloc,
> >>>>        .pre_pin = guc_context_pre_pin,
> >>>>        .pin = guc_context_pin,
> >>>> -    .unpin = lrc_unpin,
> >>>> -    .post_unpin = lrc_post_unpin,
> >>>> +    .unpin = guc_context_unpin,
> >>>> +    .post_unpin = guc_context_post_unpin,
> >>>>        .enter = intel_context_enter_engine,
> >>>>        .exit = intel_context_exit_engine,
> >>>>        .reset = lrc_reset,
> >>>> -    .destroy = lrc_destroy,
> >>>> +    .destroy = guc_context_destroy,
> >>>>    };
> >>>> -static int guc_request_alloc(struct i915_request *request)
> >>>> +static bool context_needs_register(struct intel_context *ce, bool new_guc_id)
> >>>>    {
> >>>> +    return new_guc_id || test_bit(CONTEXT_LRCA_DIRTY, &ce->flags) ||
> >>>> +        !lrc_desc_registered(ce_to_guc(ce), ce->guc_id);
> >>>> +}
> >>>> +
> >>>> +static int guc_request_alloc(struct i915_request *rq)
> >>>> +{
> >>>> +    struct intel_context *ce = rq->context;
> >>>> +    struct intel_guc *guc = ce_to_guc(ce);
> >>>>        int ret;
> >>>> -    GEM_BUG_ON(!intel_context_is_pinned(request->context));
> >>>> +    GEM_BUG_ON(!intel_context_is_pinned(rq->context));
> >>>>        /*
> >>>>         * Flush enough space to reduce the likelihood of waiting after
> >>>>         * we start building the request - in which case we will just
> >>>>         * have to repeat work.
> >>>>         */
> >>>> -    request->reserved_space += GUC_REQUEST_SIZE;
> >>>> +    rq->reserved_space += GUC_REQUEST_SIZE;
> >>>>        /*
> >>>>         * Note that after this point, we have committed to using
> >>>> @@ -483,56 +956,47 @@ static int guc_request_alloc(struct i915_request *request)
> >>>>         */
> >>>>        /* Unconditionally invalidate GPU caches and TLBs. */
> >>>> -    ret = request->engine->emit_flush(request, EMIT_INVALIDATE);
> >>>> +    ret = rq->engine->emit_flush(rq, EMIT_INVALIDATE);
> >>>>        if (ret)
> >>>>            return ret;
> >>>> -    request->reserved_space -= GUC_REQUEST_SIZE;
> >>>> -    return 0;
> >>>> -}
> >>>> +    rq->reserved_space -= GUC_REQUEST_SIZE;
> >>>> -static inline void queue_request(struct i915_sched_engine *sched_engine,
> >>>> -                 struct i915_request *rq,
> >>>> -                 int prio)
> >>>> -{
> >>>> -    GEM_BUG_ON(!list_empty(&rq->sched.link));
> >>>> -    list_add_tail(&rq->sched.link,
> >>>> -              i915_sched_lookup_priolist(sched_engine, prio));
> >>>> -    set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>>> -}
> >>>> -
> >>>> -static int guc_bypass_tasklet_submit(struct intel_guc *guc,
> >>>> -                     struct i915_request *rq)
> >>>> -{
> >>>> -    int ret;
> >>>> -
> >>>> -    __i915_request_submit(rq);
> >>>> -
> >>>> -    trace_i915_request_in(rq, 0);
> >>>> -
> >>>> -    guc_set_lrc_tail(rq);
> >>>> -    ret = guc_add_request(guc, rq);
> >>>> -    if (ret == -EBUSY)
> >>>> -        guc->stalled_request = rq;
> >>>> -
> >>>> -    return ret;
> >>>> -}
> >>>> -
> >>>> -static void guc_submit_request(struct i915_request *rq)
> >>>> -{
> >>>> -    struct i915_sched_engine *sched_engine = rq->engine->sched_engine;
> >>>> -    struct intel_guc *guc = &rq->engine->gt->uc.guc;
> >>>> -    unsigned long flags;
> >>>> +    /*
> >>>> +     * Call pin_guc_id here rather than in the pinning step as with
> >>>> +     * dma_resv, contexts can be repeatedly pinned / unpinned,
> >>>> +     * thrashing the guc_ids and creating horrible race conditions.
> >>>> +     * This is especially bad when guc_ids are being stolen due to over
> >>>> +     * subscription. By the time this function is reached, it is
> >>>> +     * guaranteed that the guc_id will be persistent until the
> >>>> +     * generated request is retired, thus sealing these race
> >>>> +     * conditions. It is still safe to fail here if guc_ids are
> >>>> +     * exhausted and return -EAGAIN to the user indicating that they
> >>>> +     * can try again in the future.
> >>>> +     *
> >>>> +     * There is no need for a lock here as the timeline mutex ensures
> >>>> +     * at most one context can be executing this code path at once. The
> >>>> +     * guc_id_ref is incremented once for every request in flight and
> >>>> +     * decremented on each retire. When it is zero, a lock around the
> >>>> +     * increment (in pin_guc_id) is needed to seal a race with
> >>>> +     * unpin_guc_id.
> >>>> +     */
> >>>> +    if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> >>>> +        return 0;
> >>>> -    /* Will be called from irq-context when using foreign fences. */
> >>>> -    spin_lock_irqsave(&sched_engine->lock, flags);
> >>>> +    ret = pin_guc_id(guc, ce);    /* returns 1 if new guc_id assigned */
> >>>> +    if (unlikely(ret < 0))
> >>>> +        return ret;;
> >>> typo ";;"
> >>>
> >> Yep.
> >>
> >>>> +    if (context_needs_register(ce, !!ret)) {
> >>>> +        ret = guc_lrc_desc_pin(ce);
> >>>> +        if (unlikely(ret)) {    /* unwind */
> >>>> +            atomic_dec(&ce->guc_id_ref);
> >>>> +            unpin_guc_id(guc, ce);
> >>>> +            return ret;
> >>>> +        }
> >>>> +    }
> >>>> -    if (guc->stalled_request || !i915_sched_engine_is_empty(sched_engine))
> >>>> -        queue_request(sched_engine, rq, rq_prio(rq));
> >>>> -    else if (guc_bypass_tasklet_submit(guc, rq) == -EBUSY)
> >>>> -        tasklet_hi_schedule(&sched_engine->tasklet);
> >>>> +    clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >>>
> >>> Might be worth moving the pinning to a separate function to keep the
> >>> request_alloc focused on the request. Can be done as a follow up.
> >>>
> >> Sure. Let me see how that looks in a follow up.
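> >>
> >> Maybe along these lines (hypothetical helper name, untested):
> >>
> >> 	static int guc_request_pin_guc_id(struct intel_guc *guc,
> >> 					  struct intel_context *ce)
> >> 	{
> >> 		int ret;
> >>
> >> 		if (atomic_add_unless(&ce->guc_id_ref, 1, 0))
> >> 			return 0;
> >>
> >> 		ret = pin_guc_id(guc, ce); /* 1 == new guc_id assigned */
> >> 		if (unlikely(ret < 0))
> >> 			return ret;
> >>
> >> 		if (context_needs_register(ce, !!ret)) {
> >> 			ret = guc_lrc_desc_pin(ce);
> >> 			if (unlikely(ret)) { /* unwind */
> >> 				atomic_dec(&ce->guc_id_ref);
> >> 				unpin_guc_id(guc, ce);
> >> 				return ret;
> >> 			}
> >> 		}
> >>
> >> 		clear_bit(CONTEXT_LRCA_DIRTY, &ce->flags);
> >> 		return 0;
> >> 	}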
> >>  
> >>>> -    spin_unlock_irqrestore(&sched_engine->lock, flags);
> >>>> +    return 0;
> >>>>    }
> >>>>    static void sanitize_hwsp(struct intel_engine_cs *engine)
> >>>> @@ -606,6 +1070,46 @@ static void guc_set_default_submission(struct intel_engine_cs *engine)
> >>>>        engine->submit_request = guc_submit_request;
> >>>>    }
> >>>> +static inline void guc_kernel_context_pin(struct intel_guc *guc,
> >>>> +                      struct intel_context *ce)
> >>>> +{
> >>>> +    if (context_guc_id_invalid(ce))
> >>>> +        pin_guc_id(guc, ce);
> >>>> +    guc_lrc_desc_pin(ce);
> >>>> +}
> >>>> +
> >>>> +static inline void guc_init_lrc_mapping(struct intel_guc *guc)
> >>>> +{
> >>>> +    struct intel_gt *gt = guc_to_gt(guc);
> >>>> +    struct intel_engine_cs *engine;
> >>>> +    enum intel_engine_id id;
> >>>> +
> >>>> +    /* make sure all descriptors are clean... */
> >>>> +    xa_destroy(&guc->context_lookup);
> >>>> +
> >>>> +    /*
> >>>> +     * Some contexts might have been pinned before we enabled GuC
> >>>> +     * submission, so we need to add them to the GuC bookkeeping.
> >>>> +     * Also, after a GuC reset we want to make sure that the
> >>>> +     * information shared with the GuC is properly reset. The kernel
> >>>> +     * lrcs are not attached to the gem_context, so they need to be
> >>>> +     * added separately.
> >>>> +     *
> >>>> +     * Note: we purposely do not check the error return of
> >>>> +     * guc_lrc_desc_pin, because that function can only fail in two
> >>>> +     * cases. One, if there aren't enough free IDs, but we're
> >>>> +     * guaranteed to have enough here (we're either only pinning a
> >>>> +     * handful of lrc on first boot or we're re-pinning lrcs that were
> >>>> +     * already pinned before the reset). Two, if the GuC has died and
> >>>> +     * CTBs can't make forward progress. Presumably, the GuC should be
> >>>> +     * alive as this function is called on driver load or after a
> >>>> +     * reset. Even if it is dead, another full GPU reset will be
> >>>> +     * triggered and this function would be called again.
> >>>> +     */
> >>>> +
> >>>> +    for_each_engine(engine, gt, id)
> >>>> +        if (engine->kernel_context)
> >>>> +            guc_kernel_context_pin(guc, engine->kernel_context);
> >>>> +}
> >>>> +
> >>>>    static void guc_release(struct intel_engine_cs *engine)
> >>>>    {
> >>>>        engine->sanitize = NULL; /* no longer in control, nothing to
> >>>> sanitize */
> >>>> @@ -718,6 +1222,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >>>>    void intel_guc_submission_enable(struct intel_guc *guc)
> >>>>    {
> >>>> +    guc_init_lrc_mapping(guc);
> >>>>    }
> >>>>    void intel_guc_submission_disable(struct intel_guc *guc)
> >>>> @@ -743,3 +1248,62 @@ void intel_guc_submission_init_early(struct intel_guc *guc)
> >>>>    {
> >>>>        guc->submission_selected = __guc_submission_selected(guc);
> >>>>    }
> >>>> +
> >>>> +static inline struct intel_context *
> >>>> +g2h_context_lookup(struct intel_guc *guc, u32 desc_idx)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +
> >>>> +    if (unlikely(desc_idx >= GUC_MAX_LRC_DESCRIPTORS)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
> >>>> +            "Invalid desc_idx %u", desc_idx);
> >>>> +        return NULL;
> >>>> +    }
> >>>> +
> >>>> +    ce = __get_context(guc, desc_idx);
> >>>> +    if (unlikely(!ce)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm,
> >>>> +            "Context is NULL, desc_idx %u", desc_idx);
> >>>> +        return NULL;
> >>>> +    }
> >>>> +
> >>>> +    return ce;
> >>>> +}
> >>>> +
> >>>> +int intel_guc_deregister_done_process_msg(struct intel_guc *guc,
> >>>> +                      const u32 *msg,
> >>>> +                      u32 len)
> >>>> +{
> >>>> +    struct intel_context *ce;
> >>>> +    u32 desc_idx = msg[0];
> >>>> +
> >>>> +    if (unlikely(len < 1)) {
> >>>> +        drm_dbg(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
> >>>> +        return -EPROTO;
> >>>> +    }
> >>>> +
> >>>> +    ce = g2h_context_lookup(guc, desc_idx);
> >>>> +    if (unlikely(!ce))
> >>>> +        return -EPROTO;
> >>>> +
> >>>> +    if (context_wait_for_deregister_to_register(ce)) {
> >>>> +        struct intel_runtime_pm *runtime_pm =
> >>>> +            &ce->engine->gt->i915->runtime_pm;
> >>>> +        intel_wakeref_t wakeref;
> >>>> +
> >>>> +        /*
> >>>> +         * Previous owner of this guc_id has been deregistered, now
> >>>> +         * safe to register this context.
> >>>> +         */
> >>>> +        with_intel_runtime_pm(runtime_pm, wakeref)
> >>>> +            register_context(ce);
> >>>> +        clr_context_wait_for_deregister_to_register(ce);
> >>>> +        intel_context_put(ce);
> >>>> +    } else if (context_destroyed(ce)) {
> >>>> +        /* Context has been destroyed */
> >>>> +        release_guc_id(guc, ce);
> >>>> +        lrc_destroy(&ce->ref);
> >>>> +    }
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> >>>> index 943fe485c662..204c95c39353 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_reg.h
> >>>> +++ b/drivers/gpu/drm/i915/i915_reg.h
> >>>> @@ -4142,6 +4142,7 @@ enum {
> >>>>        FAULT_AND_CONTINUE /* Unsupported */
> >>>>    };
> >>>> +#define CTX_GTT_ADDRESS_MASK GENMASK(31, 12)
> >>>>    #define GEN8_CTX_VALID (1 << 0)
> >>>>    #define GEN8_CTX_FORCE_PD_RESTORE (1 << 1)
> >>>>    #define GEN8_CTX_FORCE_RESTORE (1 << 2)
> >>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> >>>> index 86b4c9f2613d..b48c4905d3fc 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_request.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_request.c
> >>>> @@ -407,6 +407,7 @@ bool i915_request_retire(struct i915_request *rq)
> >>>>         */
> >>>>        if (!list_empty(&rq->sched.link))
> >>>>            remove_from_engine(rq);
> >>>> +    atomic_dec(&rq->context->guc_id_ref);
> >>> Does this work/make sense  if GuC is disabled?
> >>>
> >> It doesn't matter if the GuC is disabled as this var isn't used if it
> >> is. A following patch adds a vfunc here and pushes this into the GuC
> >> backend.
> >>
> >> Matt
> >>
> >>> Daniele
> >>>
> >>>>        GEM_BUG_ON(!llist_empty(&rq->execute_cb));
> >>>>        __list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 43/51] drm/i915/guc: Support request cancellation
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22 19:56     ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-22 19:56 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 7/16/2021 1:17 PM, Matthew Brost wrote:
> This adds GuC backend support for i915_request_cancel(), which in turn
> makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

This needs a bit of explanation on why we're using fences for this 
instead of other simpler options.

> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |   9 +
>   drivers/gpu/drm/i915/gt/intel_context.h       |   7 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
>   .../drm/i915/gt/intel_execlists_submission.c  |  18 ++
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 169 ++++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.c           |  14 +-
>   6 files changed, 211 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index dd078a80c3a3..b1e3d00fb1f2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -366,6 +366,12 @@ static int __intel_context_active(struct i915_active *active)
>   	return 0;
>   }
>   
> +static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> +				 enum i915_sw_fence_notify state)
> +{
> +	return NOTIFY_DONE;
> +}
> +
>   void
>   intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   {
> @@ -399,6 +405,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   	ce->guc_id = GUC_INVALID_LRC_ID;
>   	INIT_LIST_HEAD(&ce->guc_id_link);
>   
> +	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
> +	i915_sw_fence_commit(&ce->guc_blocked);

We need a comment somewhere to explain how we use this blocked fence,
i.e. that the fence starts signaled to indicate unblocked and we re-init
it to unsignaled status when we need to mark something as blocked.
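
For reference, the lifecycle I'm reading out of the patch (sketch only,
mirroring guc_blocked_fence_reinit()/guc_blocked_fence_complete() below):

	/* init: fence starts signaled -> context unblocked */
	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
	i915_sw_fence_commit(&ce->guc_blocked);

	/* block: re-arm the fence so waiters sleep until unblock */
	i915_sw_fence_fini(&ce->guc_blocked);
	i915_sw_fence_reinit(&ce->guc_blocked);
	i915_sw_fence_await(&ce->guc_blocked);
	i915_sw_fence_commit(&ce->guc_blocked);

	/* unblock: complete the await -> fence signals, waiters wake */
	i915_sw_fence_complete(&ce->guc_blocked);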

> +
>   	i915_active_init(&ce->active,
>   			 __intel_context_active, __intel_context_retire, 0);
>   }
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> index 814d9277096a..876bdb08303c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -70,6 +70,13 @@ intel_context_is_pinned(struct intel_context *ce)
>   	return atomic_read(&ce->pin_count);
>   }
>   
> +static inline void intel_context_cancel_request(struct intel_context *ce,
> +						struct i915_request *rq)
> +{
> +	GEM_BUG_ON(!ce->ops->cancel_request);
> +	return ce->ops->cancel_request(ce, rq);
> +}
> +
>   /**
>    * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
>    * @ce - the context
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 57c19ee3e313..005a64f2afa7 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -13,6 +13,7 @@
>   #include <linux/types.h>
>   
>   #include "i915_active_types.h"
> +#include "i915_sw_fence.h"
>   #include "i915_utils.h"
>   #include "intel_engine_types.h"
>   #include "intel_sseu.h"
> @@ -42,6 +43,9 @@ struct intel_context_ops {
>   	void (*unpin)(struct intel_context *ce);
>   	void (*post_unpin)(struct intel_context *ce);
>   
> +	void (*cancel_request)(struct intel_context *ce,
> +			       struct i915_request *rq);

I don't see an implementation for this for the ringbuffer backend.

> +
>   	void (*enter)(struct intel_context *ce);
>   	void (*exit)(struct intel_context *ce);
>   
> @@ -184,6 +188,9 @@ struct intel_context {
>   	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
>   	 */
>   	struct list_head guc_id_link;
> +
> +	/* GuC context blocked fence */
> +	struct i915_sw_fence guc_blocked;
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index f9b5f54a5abe..8f6dc0fb49a6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -114,6 +114,7 @@
>   #include "gen8_engine_cs.h"
>   #include "intel_breadcrumbs.h"
>   #include "intel_context.h"
> +#include "intel_engine_heartbeat.h"
>   #include "intel_engine_pm.h"
>   #include "intel_engine_stats.h"
>   #include "intel_execlists_submission.h"
> @@ -2536,11 +2537,26 @@ static int execlists_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void execlists_context_cancel_request(struct intel_context *ce,
> +					     struct i915_request *rq)
> +{
> +	struct intel_engine_cs *engine = NULL;
> +
> +	i915_request_active_engine(rq, &engine);
> +
> +	if (engine && intel_engine_pulse(engine))
> +		intel_gt_handle_error(engine->gt, engine->mask, 0,
> +				      "request cancellation by %s",
> +				      current->comm);
> +}
> +
>   static const struct intel_context_ops execlists_context_ops = {
>   	.flags = COPS_HAS_INFLIGHT,
>   
>   	.alloc = execlists_context_alloc,
>   
> +	.cancel_request = execlists_context_cancel_request,
> +
>   	.pre_pin = execlists_context_pre_pin,
>   	.pin = execlists_context_pin,
>   	.unpin = lrc_unpin,
> @@ -3558,6 +3574,8 @@ static const struct intel_context_ops virtual_context_ops = {
>   
>   	.alloc = virtual_context_alloc,
>   
> +	.cancel_request = execlists_context_cancel_request,
> +
>   	.pre_pin = virtual_context_pre_pin,
>   	.pin = virtual_context_pin,
>   	.unpin = lrc_unpin,
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 149990196e3a..1c30d04733ff 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -81,6 +81,11 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
>    */
>   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
>   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> +#define SCHED_STATE_NO_LOCK_BLOCKED \
> +	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
> +#define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> +	(0xffff << SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
>   static inline bool context_enabled(struct intel_context *ce)
>   {
>   	return (atomic_read(&ce->guc_sched_state_no_lock) &
> @@ -116,6 +121,27 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +static inline u32 context_blocked(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_BLOCKED_MASK) >>
> +		SCHED_STATE_NO_LOCK_BLOCKED_SHIFT;
> +}
> +
> +static inline void incr_context_blocked(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->engine->sched_engine->lock);

It's a bit weird requiring a lock for a variable that is purposely 
called no_lock, but I do get it is not the GuC lock. Can you explain 
which race you're trying to guard against?

> +	atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
> +		   &ce->guc_sched_state_no_lock);

Do we need an overflow check, or are we guaranteed that the count will 
stay within a certain range?
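
If it's not guaranteed, something like this would catch it (sketch):

	static inline void incr_context_blocked(struct intel_context *ce)
	{
		lockdep_assert_held(&ce->engine->sched_engine->lock);

		/* trap the 16-bit blocked count saturating */
		GEM_BUG_ON(context_blocked(ce) ==
			   (SCHED_STATE_NO_LOCK_BLOCKED_MASK >>
			    SCHED_STATE_NO_LOCK_BLOCKED_SHIFT));

		atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
			   &ce->guc_sched_state_no_lock);
	}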

> +}
> +
> +static inline void decr_context_blocked(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->engine->sched_engine->lock);

GEM_BUG_ON(!context_blocked(ce)) ?

> +	atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   /*
>    * Below is a set of functions which control the GuC scheduling state which
>    * require a lock, aside from the special case where the functions are called
> @@ -403,6 +429,10 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   		if (unlikely(err))
>   			goto out;
>   	}
> +
> +	if (unlikely(context_blocked(ce)))
> +		goto out;

You're not setting any error state here for this aborted request. Will
the request be automatically re-submitted on unblock? Could use a
comment if that's the case.

> +
>   	enabled = context_enabled(ce);
>   
>   	if (!enabled) {
> @@ -531,6 +561,7 @@ static void __guc_context_destroy(struct intel_context *ce);
>   static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
>   static void guc_signal_context_fence(struct intel_context *ce);
>   static void guc_cancel_context_requests(struct intel_context *ce);
> +static void guc_blocked_fence_complete(struct intel_context *ce);
>   
>   static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   {
> @@ -578,6 +609,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
>   			}
>   			intel_context_sched_disable_unpin(ce);
>   			atomic_dec(&guc->outstanding_submission_g2h);
> +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> +			guc_blocked_fence_complete(ce);
> +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
>   			intel_context_put(ce);
>   		}
>   	}
> @@ -1339,6 +1374,21 @@ static void guc_context_post_unpin(struct intel_context *ce)
>   	lrc_post_unpin(ce);
>   }
>   
> +static void __guc_context_sched_enable(struct intel_guc *guc,

Why void? This can fail.

> +				       struct intel_context *ce)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
> +		ce->guc_id,
> +		GUC_CONTEXT_ENABLE
> +	};
> +
> +	trace_intel_context_sched_enable(ce);
> +
> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> +}
> +
>   static void __guc_context_sched_disable(struct intel_guc *guc,
>   					struct intel_context *ce,
>   					u16 guc_id)
> @@ -1357,17 +1407,131 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
>   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
>   }
>   
> +static void guc_blocked_fence_complete(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +
> +	if (!i915_sw_fence_done(&ce->guc_blocked))
> +		i915_sw_fence_complete(&ce->guc_blocked);
> +}
> +
> +static void guc_blocked_fence_reinit(struct intel_context *ce)
> +{
> +	lockdep_assert_held(&ce->guc_state.lock);
> +	GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
> +	i915_sw_fence_fini(&ce->guc_blocked);
> +	i915_sw_fence_reinit(&ce->guc_blocked);
> +	i915_sw_fence_await(&ce->guc_blocked);
> +	i915_sw_fence_commit(&ce->guc_blocked);
> +}
> +
>   static u16 prep_context_pending_disable(struct intel_context *ce)
>   {
>   	lockdep_assert_held(&ce->guc_state.lock);
>   
>   	set_context_pending_disable(ce);
>   	clr_context_enabled(ce);
> +	guc_blocked_fence_reinit(ce);
>   	intel_context_get(ce);
>   
>   	return ce->guc_id;
>   }
>   
> +static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +	unsigned long flags;
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;

engine->uncore->rpm

> +	intel_wakeref_t wakeref;
> +	u16 guc_id;
> +	bool enabled;
> +
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +	incr_context_blocked(ce);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	enabled = context_enabled(ce);
> +	if (unlikely(!enabled || submission_disabled(guc))) {
> +		if (enabled)
> +			clr_context_enabled(ce);
> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +		return &ce->guc_blocked;
> +	}
> +
> +	/*
> +	 * We add +2 here as the schedule disable complete CTB handler calls
> +	 * intel_context_sched_disable_unpin (-2 to pin_count).
> +	 */
> +	atomic_add(2, &ce->pin_count);
> +
> +	guc_id = prep_context_pending_disable(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		__guc_context_sched_disable(guc, ce, guc_id);
> +
> +	return &ce->guc_blocked;
> +}
> +
> +static void guc_context_unblock(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = ce_to_guc(ce);
> +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> +	unsigned long flags;
> +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;

engine->uncore->rpm

> +	intel_wakeref_t wakeref;
> +
> +	GEM_BUG_ON(context_enabled(ce));
> +
> +	if (unlikely(context_blocked(ce) > 1)) {
> +		spin_lock_irqsave(&sched_engine->lock, flags);
> +		if (likely(context_blocked(ce) > 1))
> +			goto decrement;
> +		spin_unlock_irqrestore(&sched_engine->lock, flags);
> +	}
> +
> +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> +	if (unlikely(submission_disabled(guc) ||
> +		     !intel_context_is_pinned(ce) ||
> +		     context_pending_disable(ce) ||
> +		     context_blocked(ce) > 1)) {

you've already checked context_blocked > 1 twice above. If you can't 
trust the value to remain stable, maybe keep the spinlock locked for 
more of the flow?

> +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +		goto out;
> +	}
> +
> +	set_context_pending_enable(ce);
> +	set_context_enabled(ce);

Shouldn't we set this to enabled only after the H2G has succeeded?

Daniele

> +	intel_context_get(ce);
> +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> +
> +	with_intel_runtime_pm(runtime_pm, wakeref)
> +		__guc_context_sched_enable(guc, ce);
> +
> +out:
> +	spin_lock_irqsave(&sched_engine->lock, flags);
> +decrement:
> +	decr_context_blocked(ce);
> +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> +}
> +
> +static void guc_context_cancel_request(struct intel_context *ce,
> +				       struct i915_request *rq)
> +{
> +	if (i915_sw_fence_signaled(&rq->submit)) {
> +		struct i915_sw_fence *fence = guc_context_block(ce);
> +
> +		i915_sw_fence_wait(fence);
> +		if (!i915_request_completed(rq)) {
> +			__i915_request_skip(rq);
> +			guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
> +					true);
> +		}
> +		guc_context_unblock(ce);
> +	}
> +}
> +
>   static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
>   						 u16 guc_id,
>   						 u32 preemption_timeout)
> @@ -1626,6 +1790,8 @@ static const struct intel_context_ops guc_context_ops = {
>   
>   	.ban = guc_context_ban,
>   
> +	.cancel_request = guc_context_cancel_request,
> +
>   	.enter = intel_context_enter_engine,
>   	.exit = intel_context_exit_engine,
>   
> @@ -1821,6 +1987,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
>   
>   	.ban = guc_context_ban,
>   
> +	.cancel_request = guc_context_cancel_request,
> +
>   	.enter = guc_virtual_context_enter,
>   	.exit = guc_virtual_context_exit,
>   
> @@ -2290,6 +2458,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
>   		clr_context_banned(ce);
>   		clr_context_pending_disable(ce);
>   		__guc_signal_context_fence(ce);
> +		guc_blocked_fence_complete(ce);
>   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
>   
>   		if (banned) {
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index eb109f93ebcb..f3552642b8a1 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -708,18 +708,6 @@ void i915_request_unsubmit(struct i915_request *request)
>   	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
>   }
>   
> -static void __cancel_request(struct i915_request *rq)
> -{
> -	struct intel_engine_cs *engine = NULL;
> -
> -	i915_request_active_engine(rq, &engine);
> -
> -	if (engine && intel_engine_pulse(engine))
> -		intel_gt_handle_error(engine->gt, engine->mask, 0,
> -				      "request cancellation by %s",
> -				      current->comm);
> -}
> -
>   void i915_request_cancel(struct i915_request *rq, int error)
>   {
>   	if (!i915_request_set_error_once(rq, error))
> @@ -727,7 +715,7 @@ void i915_request_cancel(struct i915_request *rq, int error)
>   
>   	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
>   
> -	__cancel_request(rq);
> +	intel_context_cancel_request(rq->context, rq);
>   }
>   
>   static int __i915_sw_fence_call


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 43/51] drm/i915/guc: Support request cancellation
  2021-07-22 19:56     ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-22 20:13       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 20:13 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Thu, Jul 22, 2021 at 12:56:56PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:17 PM, Matthew Brost wrote:
> > This adds GuC backend support for i915_request_cancel(), which in turn
> > makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.
> 
> This needs a bit of explanation on why we're using fences for this instead
> of other simpler options.
> 

Yep, agree. It doesn't really make sense unless I explain that an
upcoming patch leverages guc_context_block() and waits on the returned
fence.
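
Roughly (this just mirrors what guc_context_cancel_request() does below):

	struct i915_sw_fence *fence = guc_context_block(ce);

	i915_sw_fence_wait(fence);	/* sleep until GuC confirms the disable */
	/* ... act on the now-quiesced context ... */
	guc_context_unblock(ce);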

> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c       |   9 +
> >   drivers/gpu/drm/i915/gt/intel_context.h       |   7 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   7 +
> >   .../drm/i915/gt/intel_execlists_submission.c  |  18 ++
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 169 ++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c           |  14 +-
> >   6 files changed, 211 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> > index dd078a80c3a3..b1e3d00fb1f2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -366,6 +366,12 @@ static int __intel_context_active(struct i915_active *active)
> >   	return 0;
> >   }
> > +static int sw_fence_dummy_notify(struct i915_sw_fence *sf,
> > +				 enum i915_sw_fence_notify state)
> > +{
> > +	return NOTIFY_DONE;
> > +}
> > +
> >   void
> >   intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   {
> > @@ -399,6 +405,9 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
> >   	ce->guc_id = GUC_INVALID_LRC_ID;
> >   	INIT_LIST_HEAD(&ce->guc_id_link);
> > +	i915_sw_fence_init(&ce->guc_blocked, sw_fence_dummy_notify);
> > +	i915_sw_fence_commit(&ce->guc_blocked);
> 
> We need a comment somewhere to explain how we use this blocked fence, i.e.
> that the fence starts signaled to indicate unblocked and we re-init it to
> unsignaled status when we need to mark something as blocked.
>

Sure.

> > +
> >   	i915_active_init(&ce->active,
> >   			 __intel_context_active, __intel_context_retire, 0);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 814d9277096a..876bdb08303c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -70,6 +70,13 @@ intel_context_is_pinned(struct intel_context *ce)
> >   	return atomic_read(&ce->pin_count);
> >   }
> > +static inline void intel_context_cancel_request(struct intel_context *ce,
> > +						struct i915_request *rq)
> > +{
> > +	GEM_BUG_ON(!ce->ops->cancel_request);
> > +	return ce->ops->cancel_request(ce, rq);
> > +}
> > +
> >   /**
> >    * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status
> >    * @ce - the context
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 57c19ee3e313..005a64f2afa7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -13,6 +13,7 @@
> >   #include <linux/types.h>
> >   #include "i915_active_types.h"
> > +#include "i915_sw_fence.h"
> >   #include "i915_utils.h"
> >   #include "intel_engine_types.h"
> >   #include "intel_sseu.h"
> > @@ -42,6 +43,9 @@ struct intel_context_ops {
> >   	void (*unpin)(struct intel_context *ce);
> >   	void (*post_unpin)(struct intel_context *ce);
> > +	void (*cancel_request)(struct intel_context *ce,
> > +			       struct i915_request *rq);
> 
> I don't see an implementation for this for the ringbuffer backend.
>

Yea, I probably need to add that.

> > +
> >   	void (*enter)(struct intel_context *ce);
> >   	void (*exit)(struct intel_context *ce);
> > @@ -184,6 +188,9 @@ struct intel_context {
> >   	 * GuC ID link - in list when unpinned but guc_id still valid in GuC
> >   	 */
> >   	struct list_head guc_id_link;
> > +
> > +	/* GuC context blocked fence */
> > +	struct i915_sw_fence guc_blocked;
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index f9b5f54a5abe..8f6dc0fb49a6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -114,6 +114,7 @@
> >   #include "gen8_engine_cs.h"
> >   #include "intel_breadcrumbs.h"
> >   #include "intel_context.h"
> > +#include "intel_engine_heartbeat.h"
> >   #include "intel_engine_pm.h"
> >   #include "intel_engine_stats.h"
> >   #include "intel_execlists_submission.h"
> > @@ -2536,11 +2537,26 @@ static int execlists_context_alloc(struct intel_context *ce)
> >   	return lrc_alloc(ce, ce->engine);
> >   }
> > +static void execlists_context_cancel_request(struct intel_context *ce,
> > +					     struct i915_request *rq)
> > +{
> > +	struct intel_engine_cs *engine = NULL;
> > +
> > +	i915_request_active_engine(rq, &engine);
> > +
> > +	if (engine && intel_engine_pulse(engine))
> > +		intel_gt_handle_error(engine->gt, engine->mask, 0,
> > +				      "request cancellation by %s",
> > +				      current->comm);
> > +}
> > +
> >   static const struct intel_context_ops execlists_context_ops = {
> >   	.flags = COPS_HAS_INFLIGHT,
> >   	.alloc = execlists_context_alloc,
> > +	.cancel_request = execlists_context_cancel_request,
> > +
> >   	.pre_pin = execlists_context_pre_pin,
> >   	.pin = execlists_context_pin,
> >   	.unpin = lrc_unpin,
> > @@ -3558,6 +3574,8 @@ static const struct intel_context_ops virtual_context_ops = {
> >   	.alloc = virtual_context_alloc,
> > +	.cancel_request = execlists_context_cancel_request,
> > +
> >   	.pre_pin = virtual_context_pre_pin,
> >   	.pin = virtual_context_pin,
> >   	.unpin = lrc_unpin,
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 149990196e3a..1c30d04733ff 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -81,6 +81,11 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> >    */
> >   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> >   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> > +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> > +#define SCHED_STATE_NO_LOCK_BLOCKED \
> > +	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
> > +#define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> > +	(0xffff << SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
> >   static inline bool context_enabled(struct intel_context *ce)
> >   {
> >   	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > @@ -116,6 +121,27 @@ static inline void clr_context_pending_enable(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +static inline u32 context_blocked(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_BLOCKED_MASK) >>
> > +		SCHED_STATE_NO_LOCK_BLOCKED_SHIFT;
> > +}
> > +
> > +static inline void incr_context_blocked(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->engine->sched_engine->lock);
> 
> It's a bit weird requiring a lock for a variable that is purposely called
> no_lock, but I do get it is not the GuC lock. Can you explain which race
> you're trying to guard against?
>

This is tricky; basically it is to make the 'context_blocked' state
visible to the submission path. I'll add a comment.
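
Possible wording for that comment (sketch, as I understand the flow):

	/*
	 * The blocked count is incremented/decremented under
	 * sched_engine->lock and the submission path reads it with the
	 * same lock held, so a request either sees the context blocked
	 * and bails, or is submitted before the block lands.
	 */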

> > +	atomic_add(SCHED_STATE_NO_LOCK_BLOCKED,
> > +		   &ce->guc_sched_state_no_lock);
> 
> Do we need an overflow check, or are we guaranteed that the count will stay
> within a certain range?
>

I should add a GEM_BUG_ON here.

> > +}
> > +
> > +static inline void decr_context_blocked(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->engine->sched_engine->lock);
> 
> GEM_BUG_ON(!context_blocked(ce)) ?
> 

Yep.
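
i.e. (sketch of the agreed change):

	static inline void decr_context_blocked(struct intel_context *ce)
	{
		lockdep_assert_held(&ce->engine->sched_engine->lock);

		/* never drop below zero */
		GEM_BUG_ON(!context_blocked(ce));

		atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
			   &ce->guc_sched_state_no_lock);
	}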

> > +	atomic_sub(SCHED_STATE_NO_LOCK_BLOCKED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   /*
> >    * Below is a set of functions which control the GuC scheduling state which
> >    * require a lock, aside from the special case where the functions are called
> > @@ -403,6 +429,10 @@ static int guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >   		if (unlikely(err))
> >   			goto out;
> >   	}
> > +
> > +	if (unlikely(context_blocked(ce)))
> > +		goto out;
> 
> You're not setting any error state here for this aborted request. Will the
> request be automatically re-submitted on unblock? Could use a comment if
> that's the case.
>

Yes, when scheduling gets enabled by guc_context_unblock the request
can start running. Can add a comment explaining that.
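
Something along these lines (comment wording is just a sketch):

	/*
	 * The request stays queued on the context; once
	 * guc_context_unblock() re-enables scheduling the GuC picks it
	 * up, so no error state needs to be set here.
	 */
	if (unlikely(context_blocked(ce)))
		goto out;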

> > +
> >   	enabled = context_enabled(ce);
> >   	if (!enabled) {
> > @@ -531,6 +561,7 @@ static void __guc_context_destroy(struct intel_context *ce);
> >   static void release_guc_id(struct intel_guc *guc, struct intel_context *ce);
> >   static void guc_signal_context_fence(struct intel_context *ce);
> >   static void guc_cancel_context_requests(struct intel_context *ce);
> > +static void guc_blocked_fence_complete(struct intel_context *ce);
> >   static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> >   {
> > @@ -578,6 +609,10 @@ static void scrub_guc_desc_for_outstanding_g2h(struct intel_guc *guc)
> >   			}
> >   			intel_context_sched_disable_unpin(ce);
> >   			atomic_dec(&guc->outstanding_submission_g2h);
> > +			spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +			guc_blocked_fence_complete(ce);
> > +			spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> >   			intel_context_put(ce);
> >   		}
> >   	}
> > @@ -1339,6 +1374,21 @@ static void guc_context_post_unpin(struct intel_context *ce)
> >   	lrc_post_unpin(ce);
> >   }
> > +static void __guc_context_sched_enable(struct intel_guc *guc,
> 
> Why void? This can fail.
>

This is a case where we don't care if it fails, as that means a full GT
reset will happen. This isn't the only H2G we treat this way (e.g.
__guc_context_sched_disable is the same).

> > +				       struct intel_context *ce)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET,
> > +		ce->guc_id,
> > +		GUC_CONTEXT_ENABLE
> > +	};
> > +
> > +	trace_intel_context_sched_enable(ce);
> > +
> > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action),
> > +				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> > +}
> > +
> >   static void __guc_context_sched_disable(struct intel_guc *guc,
> >   					struct intel_context *ce,
> >   					u16 guc_id)
> > @@ -1357,17 +1407,131 @@ static void __guc_context_sched_disable(struct intel_guc *guc,
> >   				      G2H_LEN_DW_SCHED_CONTEXT_MODE_SET, true);
> >   }
> > +static void guc_blocked_fence_complete(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +
> > +	if (!i915_sw_fence_done(&ce->guc_blocked))
> > +		i915_sw_fence_complete(&ce->guc_blocked);
> > +}
> > +
> > +static void guc_blocked_fence_reinit(struct intel_context *ce)
> > +{
> > +	lockdep_assert_held(&ce->guc_state.lock);
> > +	GEM_BUG_ON(!i915_sw_fence_done(&ce->guc_blocked));
> > +	i915_sw_fence_fini(&ce->guc_blocked);
> > +	i915_sw_fence_reinit(&ce->guc_blocked);
> > +	i915_sw_fence_await(&ce->guc_blocked);
> > +	i915_sw_fence_commit(&ce->guc_blocked);
> > +}
> > +
> >   static u16 prep_context_pending_disable(struct intel_context *ce)
> >   {
> >   	lockdep_assert_held(&ce->guc_state.lock);
> >   	set_context_pending_disable(ce);
> >   	clr_context_enabled(ce);
> > +	guc_blocked_fence_reinit(ce);
> >   	intel_context_get(ce);
> >   	return ce->guc_id;
> >   }
> > +static struct i915_sw_fence *guc_context_block(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> > +	unsigned long flags;
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> 
> engine->uncore->rpm
>

Yep.
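
i.e. (sketch):

	struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm;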

> > +	intel_wakeref_t wakeref;
> > +	u16 guc_id;
> > +	bool enabled;
> > +
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +	incr_context_blocked(ce);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	enabled = context_enabled(ce);
> > +	if (unlikely(!enabled || submission_disabled(guc))) {
> > +		if (enabled)
> > +			clr_context_enabled(ce);
> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +		return &ce->guc_blocked;
> > +	}
> > +
> > +	/*
> > +	 * We add +2 here as the schedule disable complete CTB handler calls
> > +	 * intel_context_sched_disable_unpin (-2 to pin_count).
> > +	 */
> > +	atomic_add(2, &ce->pin_count);
> > +
> > +	guc_id = prep_context_pending_disable(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		__guc_context_sched_disable(guc, ce, guc_id);
> > +
> > +	return &ce->guc_blocked;
> > +}
> > +
> > +static void guc_context_unblock(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = ce_to_guc(ce);
> > +	struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
> > +	unsigned long flags;
> > +	struct intel_runtime_pm *runtime_pm = &ce->engine->gt->i915->runtime_pm;
> 
> engine->uncore->rpm
> 

Yep.

> > +	intel_wakeref_t wakeref;
> > +
> > +	GEM_BUG_ON(context_enabled(ce));
> > +
> > +	if (unlikely(context_blocked(ce) > 1)) {
> > +		spin_lock_irqsave(&sched_engine->lock, flags);
> > +		if (likely(context_blocked(ce) > 1))
> > +			goto decrement;
> > +		spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +	}
> > +
> > +	spin_lock_irqsave(&ce->guc_state.lock, flags);
> > +	if (unlikely(submission_disabled(guc) ||
> > +		     !intel_context_is_pinned(ce) ||
> > +		     context_pending_disable(ce) ||
> > +		     context_blocked(ce) > 1)) {
> 
> you've already checked context_blocked > 1 twice above. If you can't trust
> the value to remain stable, maybe keep the spinlock locked for more of the
> flow?
> 

I really don't know what I was thinking with the checks above this,
maybe trying to optimize this to avoid taking the spin lock? The above
checks can / should be deleted.
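
Roughly this shape, I think (untested sketch; also folds in the
engine->uncore->rpm change from above):

	static void guc_context_unblock(struct intel_context *ce)
	{
		struct intel_guc *guc = ce_to_guc(ce);
		struct i915_sched_engine *sched_engine = ce->engine->sched_engine;
		struct intel_runtime_pm *runtime_pm = ce->engine->uncore->rpm;
		unsigned long flags;
		intel_wakeref_t wakeref;
		bool enable;

		GEM_BUG_ON(context_enabled(ce));

		spin_lock_irqsave(&ce->guc_state.lock, flags);
		if (unlikely(submission_disabled(guc) ||
			     !intel_context_is_pinned(ce) ||
			     context_pending_disable(ce) ||
			     context_blocked(ce) > 1)) {
			enable = false;
		} else {
			enable = true;
			set_context_pending_enable(ce);
			set_context_enabled(ce);
			intel_context_get(ce);
		}
		spin_unlock_irqrestore(&ce->guc_state.lock, flags);

		if (enable) {
			with_intel_runtime_pm(runtime_pm, wakeref)
				__guc_context_sched_enable(guc, ce);
		}

		spin_lock_irqsave(&sched_engine->lock, flags);
		decr_context_blocked(ce);
		spin_unlock_irqrestore(&sched_engine->lock, flags);
	}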

> > +		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +		goto out;
> > +	}
> > +
> > +	set_context_pending_enable(ce);
> > +	set_context_enabled(ce);
> 
> Shouldn't we set this to enabled only after the H2G has succeeded?
> 

No. We always enable immediately - see the submission path. It is
enabled in the sense that if you submit something after this point it is
enabled, as the schedule enable is already in the queue. Again, if this
fails for some reason it means we do a full GT reset and don't care.

Matt

> Daniele
> 
> > +	intel_context_get(ce);
> > +	spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > +	with_intel_runtime_pm(runtime_pm, wakeref)
> > +		__guc_context_sched_enable(guc, ce);
> > +
> > +out:
> > +	spin_lock_irqsave(&sched_engine->lock, flags);
> > +decrement:
> > +	decr_context_blocked(ce);
> > +	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > +}
> > +
> > +static void guc_context_cancel_request(struct intel_context *ce,
> > +				       struct i915_request *rq)
> > +{
> > +	if (i915_sw_fence_signaled(&rq->submit)) {
> > +		struct i915_sw_fence *fence = guc_context_block(ce);
> > +
> > +		i915_sw_fence_wait(fence);
> > +		if (!i915_request_completed(rq)) {
> > +			__i915_request_skip(rq);
> > +			guc_reset_state(ce, intel_ring_wrap(ce->ring, rq->head),
> > +					true);
> > +		}
> > +		guc_context_unblock(ce);
> > +	}
> > +}
> > +
> >   static void __guc_context_set_preemption_timeout(struct intel_guc *guc,
> >   						 u16 guc_id,
> >   						 u32 preemption_timeout)
> > @@ -1626,6 +1790,8 @@ static const struct intel_context_ops guc_context_ops = {
> >   	.ban = guc_context_ban,
> > +	.cancel_request = guc_context_cancel_request,
> > +
> >   	.enter = intel_context_enter_engine,
> >   	.exit = intel_context_exit_engine,
> > @@ -1821,6 +1987,8 @@ static const struct intel_context_ops virtual_guc_context_ops = {
> >   	.ban = guc_context_ban,
> > +	.cancel_request = guc_context_cancel_request,
> > +
> >   	.enter = guc_virtual_context_enter,
> >   	.exit = guc_virtual_context_exit,
> > @@ -2290,6 +2458,7 @@ int intel_guc_sched_done_process_msg(struct intel_guc *guc,
> >   		clr_context_banned(ce);
> >   		clr_context_pending_disable(ce);
> >   		__guc_signal_context_fence(ce);
> > +		guc_blocked_fence_complete(ce);
> >   		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> >   		if (banned) {
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index eb109f93ebcb..f3552642b8a1 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -708,18 +708,6 @@ void i915_request_unsubmit(struct i915_request *request)
> >   	spin_unlock_irqrestore(&engine->sched_engine->lock, flags);
> >   }
> > -static void __cancel_request(struct i915_request *rq)
> > -{
> > -	struct intel_engine_cs *engine = NULL;
> > -
> > -	i915_request_active_engine(rq, &engine);
> > -
> > -	if (engine && intel_engine_pulse(engine))
> > -		intel_gt_handle_error(engine->gt, engine->mask, 0,
> > -				      "request cancellation by %s",
> > -				      current->comm);
> > -}
> > -
> >   void i915_request_cancel(struct i915_request *rq, int error)
> >   {
> >   	if (!i915_request_set_error_once(rq, error))
> > @@ -727,7 +715,7 @@ void i915_request_cancel(struct i915_request *rq, int error)
> >   	set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
> > -	__cancel_request(rq);
> > +	intel_context_cancel_request(rq->context, rq);
> >   }
> >   static int __i915_sw_fence_call
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 50/51] drm/i915/guc: Implement GuC priority management
  2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
@ 2021-07-22 20:26     ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-22 20:26 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel; +Cc: john.c.harrison



On 7/16/2021 1:17 PM, Matthew Brost wrote:
> Implement a simple static mapping algorithm of the i915 priority levels
> (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
> follows:
>
> i915 level < 0          -> GuC low level     (3)
> i915 level == 0         -> GuC normal level  (2)
> i915 level < INT_MAX    -> GuC high level    (1)
> i915 level == INT_MAX   -> GuC highest level (0)
>
> We believe this mapping should cover the UMD use cases (3 distinct user
> levels + 1 kernel level).
>
> In addition to static mapping, a simple counter system is attached to
> each context tracking the number of requests inflight on the context at
> each level. This is needed as the GuC levels are per context while in
> the i915 the levels are per request.

As a general note on this patch, IMO we could do a better job of 
integrating context-level priority with request-level prio. However, I'm 
aware that this code is going to be overhauled again for drm scheduler 
so I'm not going to comment in that direction and I'll review the result 
again once the drm scheduler patches are available.

>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
>   drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 207 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_request.c           |   5 +
>   drivers/gpu/drm/i915/i915_request.h           |   8 +
>   drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |  12 +
>   drivers/gpu/drm/i915/i915_trace.h             |  16 +-
>   include/uapi/drm/i915_drm.h                   |   9 +
>   10 files changed, 274 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 2007dc6f6b99..209cf265bf74 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
>   			llist_entry(signal, typeof(*rq), signal_node);
>   		struct list_head cb_list;
>   
> +		if (rq->engine->sched_engine->retire_inflight_request_prio)
> +			rq->engine->sched_engine->retire_inflight_request_prio(rq);
> +
>   		spin_lock(&rq->lock);
>   		list_replace(&rq->fence.cb_list, &cb_list);
>   		__dma_fence_signal__timestamp(&rq->fence, timestamp);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 005a64f2afa7..fe555551c2d2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -18,8 +18,9 @@
>   #include "intel_engine_types.h"
>   #include "intel_sseu.h"
>   
> -#define CONTEXT_REDZONE POISON_INUSE
> +#include "uc/intel_guc_fwif.h"
>   
> +#define CONTEXT_REDZONE POISON_INUSE
>   DECLARE_EWMA(runtime, 3, 8);
>   
>   struct i915_gem_context;
> @@ -191,6 +192,12 @@ struct intel_context {
>   
>   	/* GuC context blocked fence */
>   	struct i915_sw_fence guc_blocked;
> +
> +	/*
> +	 * GuC priority management
> +	 */
> +	u8 guc_prio;
> +	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 84142127ebd8..8f8bea08e734 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -11,6 +11,7 @@
>   #include "intel_engine.h"
>   #include "intel_engine_user.h"
>   #include "intel_gt.h"
> +#include "uc/intel_guc_submission.h"
>   
>   struct intel_engine_cs *
>   intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> @@ -115,6 +116,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
>   			disabled |= (I915_SCHEDULER_CAP_ENABLED |
>   				     I915_SCHEDULER_CAP_PRIORITY);
>   
> +		if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> +
>   		for (i = 0; i < ARRAY_SIZE(map); i++) {
>   			if (engine->flags & BIT(map[i].engine))
>   				enabled |= BIT(map[i].sched);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 536fdbc406c6..263ad6a9e4a9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -81,7 +81,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
>    */
>   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
>   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> -#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> +#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
> +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		3
>   #define SCHED_STATE_NO_LOCK_BLOCKED \
>   	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
>   #define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> @@ -142,6 +143,24 @@ static inline void decr_context_blocked(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +static inline bool context_registered(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_REGISTERED);
> +}
> +
> +static inline void set_context_registered(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
> +		  &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_registered(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   /*
>    * Below is a set of functions which control the GuC scheduling state which
>    * require a lock, aside from the special case where the functions are called
> @@ -1080,6 +1099,7 @@ static int steal_guc_id(struct intel_guc *guc)
>   
>   		list_del_init(&ce->guc_id_link);
>   		guc_id = ce->guc_id;
> +		clr_context_registered(ce);
>   		set_context_guc_id_invalid(ce);
>   		return guc_id;
>   	} else {
> @@ -1184,10 +1204,13 @@ static int register_context(struct intel_context *ce, bool loop)
>   	struct intel_guc *guc = ce_to_guc(ce);
>   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>   		ce->guc_id * sizeof(struct guc_lrc_desc);
> +	int ret;
>   
>   	trace_intel_context_register(ce);
>   
> -	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> +	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
> +	set_context_registered(ce);

Should this setting be conditional on ret == 0?
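
i.e. something like:

	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
	if (!ret)
		set_context_registered(ce);

	return ret;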

> +	return ret;
>   }
>   
>   static int __guc_action_deregister_context(struct intel_guc *guc,
> @@ -1243,6 +1266,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>   }
>   
> +static inline u8 map_i915_prio_to_guc_prio(int prio);
> +
>   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   {
>   	struct intel_runtime_pm *runtime_pm =
> @@ -1251,6 +1276,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	struct intel_guc *guc = &engine->gt->uc.guc;
>   	u32 desc_idx = ce->guc_id;
>   	struct guc_lrc_desc *desc;
> +	const struct i915_gem_context *ctx;
> +	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
>   	bool context_registered;
>   	intel_wakeref_t wakeref;
>   	int ret = 0;
> @@ -1266,6 +1293,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   
>   	context_registered = lrc_desc_registered(guc, desc_idx);
>   
> +	rcu_read_lock();
> +	ctx = rcu_dereference(ce->gem_context);
> +	if (ctx)
> +		prio = ctx->sched.priority;
> +	rcu_read_unlock();
> +
>   	reset_lrc_desc(guc, desc_idx);
>   	set_lrc_desc_registered(guc, desc_idx, ce);
>   
> @@ -1274,7 +1307,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	desc->engine_submit_mask = adjust_engine_mask(engine->class,
>   						      engine->mask);
>   	desc->hw_context_desc = ce->lrc.lrca;
> -	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
> +	desc->priority = ce->guc_prio;
>   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>   	guc_context_policy_init(engine, desc);
>   	init_sched_state(ce);
> @@ -1659,11 +1693,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>   	GEM_BUG_ON(context_enabled(ce));
>   
> +	clr_context_registered(ce);
>   	deregister_context(ce, ce->guc_id, true);
>   }
>   
>   static void __guc_context_destroy(struct intel_context *ce)
>   {
> +	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
> +
>   	lrc_fini(ce);
>   	intel_context_fini(ce);
>   
> @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void guc_context_set_prio(struct intel_guc *guc,
> +				 struct intel_context *ce,
> +				 u8 prio)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
> +		ce->guc_id,
> +		prio,
> +	};
> +
> +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
> +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
> +
> +	if (ce->guc_prio == prio || submission_disabled(guc) ||
> +	    !context_registered(ce))
> +		return;
> +
> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +
> +	ce->guc_prio = prio;
> +	trace_intel_context_set_prio(ce);
> +}
> +
> +static inline u8 map_i915_prio_to_guc_prio(int prio)
> +{
> +	if (prio == I915_PRIORITY_NORMAL)
> +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	else if (prio < I915_PRIORITY_NORMAL)
> +		return GUC_CLIENT_PRIORITY_NORMAL;
> +	else if (prio != I915_PRIORITY_BARRIER)

Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?

> +		return GUC_CLIENT_PRIORITY_HIGH;
> +	else
> +		return GUC_CLIENT_PRIORITY_KMD_HIGH;
> +}
> +
> +static inline void add_context_inflight_prio(struct intel_context *ce,
> +					     u8 guc_prio)
> +{
> +	lockdep_assert_held(&ce->guc_active.lock);
> +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> +
> +	++ce->guc_prio_count[guc_prio];
> +
> +	/* Overflow protection */
> +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> +}
> +
> +static inline void sub_context_inflight_prio(struct intel_context *ce,
> +					     u8 guc_prio)
> +{
> +	lockdep_assert_held(&ce->guc_active.lock);
> +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> +
> +	/* Underflow protection */
> +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> +
> +	--ce->guc_prio_count[guc_prio];
> +}
> +
> +static inline void update_context_prio(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	int i;
> +
> +	lockdep_assert_held(&ce->guc_active.lock);
> +
> +	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {

This and other loops/checks rely on the prios going from highest to
lowest starting from zero. Maybe add a build check somewhere?

BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH != 0);
BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH > GUC_CLIENT_PRIORITY_NORMAL);

> +		if (ce->guc_prio_count[i]) {
> +			guc_context_set_prio(guc, ce, i);
> +			break;
> +		}
> +	}
> +}
> +
> +static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
> +{
> +	/* Lower value is higher priority */
> +	return new_guc_prio < old_guc_prio;
> +}
> +
>   static void add_to_context(struct i915_request *rq)
>   {
>   	struct intel_context *ce = rq->context;
> +	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
> +
> +	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
>   
>   	spin_lock(&ce->guc_active.lock);
>   	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> +
> +	if (rq->guc_prio == GUC_PRIO_INIT) {
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
> +		sub_context_inflight_prio(ce, rq->guc_prio);
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +	}
> +	update_context_prio(ce);
> +
>   	spin_unlock(&ce->guc_active.lock);
>   }
>   
> +static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
> +{

lockdep_assert_held(&ce->guc_active.lock) ?

> +	if (rq->guc_prio != GUC_PRIO_INIT &&
> +	    rq->guc_prio != GUC_PRIO_FINI) {
> +		sub_context_inflight_prio(ce, rq->guc_prio);
> +		update_context_prio(ce);
> +	}
> +	rq->guc_prio = GUC_PRIO_FINI;
> +}
> +
>   static void remove_from_context(struct i915_request *rq)
>   {
>   	struct intel_context *ce = rq->context;
> @@ -1777,6 +1921,8 @@ static void remove_from_context(struct i915_request *rq)
>   	/* Prevent further __await_execution() registering a cb, then flush */
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
>   
> +	guc_prio_fini(rq, ce);
> +
>   	spin_unlock_irq(&ce->guc_active.lock);
>   
>   	atomic_dec(&ce->guc_id_ref);
> @@ -2058,6 +2204,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
>   	}
>   }
>   
> +static void guc_bump_inflight_request_prio(struct i915_request *rq,
> +					   int prio)
> +{
> +	struct intel_context *ce = rq->context;
> +	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
> +
> +	/* Short circuit function */
> +	if (prio < I915_PRIORITY_NORMAL ||
> +	    (rq->guc_prio == GUC_PRIO_FINI) ||
> +	    (rq->guc_prio != GUC_PRIO_INIT &&
> +	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
> +		return;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	if (rq->guc_prio != GUC_PRIO_FINI) {
> +		if (rq->guc_prio != GUC_PRIO_INIT)
> +			sub_context_inflight_prio(ce, rq->guc_prio);
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +		update_context_prio(ce);
> +	}
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
> +static void guc_retire_inflight_request_prio(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	guc_prio_fini(rq, ce);
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -2293,6 +2472,10 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   		guc->sched_engine->disabled = guc_sched_engine_disabled;
>   		guc->sched_engine->private_data = guc;
>   		guc->sched_engine->destroy = guc_sched_engine_destroy;
> +		guc->sched_engine->bump_inflight_request_prio =
> +			guc_bump_inflight_request_prio;
> +		guc->sched_engine->retire_inflight_request_prio =
> +			guc_retire_inflight_request_prio;
>   		tasklet_setup(&guc->sched_engine->tasklet,
>   			      guc_submission_tasklet);
>   	}
> @@ -2670,6 +2853,22 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
>   	drm_printf(p, "\n");
>   }
>   
> +static inline void guc_log_context_priority(struct drm_printer *p,
> +					    struct intel_context *ce)
> +{
> +	int i;
> +
> +	drm_printf(p, "\t\tPriority: %d\n",
> +		   ce->guc_prio);
> +	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
> +	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
> +	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
> +		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
> +			   i, ce->guc_prio_count[i]);
> +	}
> +	drm_printf(p, "\n");
> +}
> +
>   void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   					     struct drm_printer *p)
>   {
> @@ -2692,6 +2891,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
>   			   ce->guc_state.sched_state,
>   			   atomic_read(&ce->guc_sched_state_no_lock));
> +
> +		guc_log_context_priority(p, ce);
>   	}
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f3552642b8a1..3fdfada99ba0 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -114,6 +114,9 @@ static void i915_fence_release(struct dma_fence *fence)
>   {
>   	struct i915_request *rq = to_request(fence);
>   
> +	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
> +		   rq->guc_prio != GUC_PRIO_FINI);
> +
>   	/*
>   	 * The request is put onto a RCU freelist (i.e. the address
>   	 * is immediately reused), mark the fences as being freed now.
> @@ -922,6 +925,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>   
>   	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
>   
> +	rq->guc_prio = GUC_PRIO_INIT;
> +
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
>   	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index a3d4728ea06c..f0463d19c712 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -293,6 +293,14 @@ struct i915_request {
>   	 */
>   	struct list_head guc_fence_link;
>   
> +	/**
> +	 * Priority level while the request is inflight. Differs slightly from
> +	 * i915 scheduler priority.

I'd say it differs quite a bit. Maybe refer to the comment above 
I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP?
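
e.g. (sketch):

	/*
	 * Priority level while the request is in flight. Differs from
	 * the i915 scheduler priority. See the comment above
	 * I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP for the mapping.
	 */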

Daniele

> +	 */
> +#define	GUC_PRIO_INIT	0xff
> +#define	GUC_PRIO_FINI	0xfe
> +	u8 guc_prio;
> +
>   	I915_SELFTEST_DECLARE(struct {
>   		struct list_head link;
>   		unsigned long delay;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 8766a8643469..3fccae3672c1 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
>   	/* Fifo and depth-first replacement ensure our deps execute before us */
>   	sched_engine = lock_sched_engine(node, sched_engine, &cache);
>   	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
> +		struct i915_request *from = container_of(dep->signaler,
> +							 struct i915_request,
> +							 sched);
>   		INIT_LIST_HEAD(&dep->dfs_link);
>   
>   		node = dep->signaler;
> @@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
>   		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
>   			   sched_engine);
>   
> +		/* Must be called before changing the node's priority */
> +		if (sched_engine->bump_inflight_request_prio)
> +			sched_engine->bump_inflight_request_prio(from, prio);
> +
>   		WRITE_ONCE(node->attr.priority, prio);
>   
>   		/*
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index eaef233e9080..b0a1b58c7893 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -179,6 +179,18 @@ struct i915_sched_engine {
>   	void	(*kick_backend)(const struct i915_request *rq,
>   				int prio);
>   
> +	/**
> +	 * @bump_inflight_request_prio: update priority of an inflight request
> +	 */
> +	void	(*bump_inflight_request_prio)(struct i915_request *rq,
> +					      int prio);
> +
> +	/**
> +	 * @retire_inflight_request_prio: indicate to the priority tracking
> +	 * that a request has been retired
> +	 */
> +	void	(*retire_inflight_request_prio)(struct i915_request *rq);
> +
>   	/**
>   	 * @schedule: adjust priority of request
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 937d3706af9b..9d2cd14ed882 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
>   			     __field(int, pin_count)
>   			     __field(u32, sched_state)
>   			     __field(u32, guc_sched_state_no_lock)
> +			     __field(u8, guc_prio)
>   			     ),
>   
>   	    TP_fast_assign(
> @@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
>   			   __entry->sched_state = ce->guc_state.sched_state;
>   			   __entry->guc_sched_state_no_lock =
>   			   atomic_read(&ce->guc_sched_state_no_lock);
> +			   __entry->guc_prio = ce->guc_prio;
>   			   ),
>   
> -	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
> +	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
>   		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
> -		      __entry->guc_sched_state_no_lock)
> +		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
> +);
> +
> +DEFINE_EVENT(intel_context, intel_context_set_prio,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
>   );
>   
>   DEFINE_EVENT(intel_context, intel_context_reset,
> @@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
>   {
>   }
>   
> +static inline void
> +trace_intel_context_set_prio(struct intel_context *ce)
> +{
> +}
> +
>   static inline void
>   trace_intel_context_reset(struct intel_context *ce)
>   {
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index e54f9efaead0..cb0a5396e655 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
>   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
>   #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
>   #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
> +/*
> + * Indicates the 2k user priority levels are statically mapped into 3 buckets as
> + * follows:
> + *
> + * -1k to -1	Low priority
> + * 0		Normal priority
> + * 1 to 1k	Highest priority
> + */
> +#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
>   
>   #define I915_PARAM_HUC_STATUS		 42
>   


^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 50/51] drm/i915/guc: Implement GuC priority management
@ 2021-07-22 20:26     ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-22 20:26 UTC (permalink / raw)
  To: Matthew Brost, intel-gfx, dri-devel



On 7/16/2021 1:17 PM, Matthew Brost wrote:
> Implement a simple static mapping algorithm of the i915 priority levels
> (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
> follows:
>
> i915 level < 0          -> GuC low level     (3)
> i915 level == 0         -> GuC normal level  (2)
> i915 level < INT_MAX    -> GuC high level    (1)
> i915 level == INT_MAX   -> GuC highest level (0)
>
> We believe this mapping should cover the UMD use cases (3 distinct user
> levels + 1 kernel level).
>
> In addition to static mapping, a simple counter system is attached to
> each context tracking the number of requests inflight on the context at
> each level. This is needed as the GuC levels are per context while in
> the i915 levels are per request.

As a general note on this patch, IMO we could do a better job of 
integrating context-level priority with request-level prio. However, I'm 
aware that this code is going to be overhauled again for drm scheduler 
so I'm not going to comment in that direction and I'll review the result 
again once the drm scheduler patches are available.

>
> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
>   drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 207 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_request.c           |   5 +
>   drivers/gpu/drm/i915/i915_request.h           |   8 +
>   drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |  12 +
>   drivers/gpu/drm/i915/i915_trace.h             |  16 +-
>   include/uapi/drm/i915_drm.h                   |   9 +
>   10 files changed, 274 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 2007dc6f6b99..209cf265bf74 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
>   			llist_entry(signal, typeof(*rq), signal_node);
>   		struct list_head cb_list;
>   
> +		if (rq->engine->sched_engine->retire_inflight_request_prio)
> +			rq->engine->sched_engine->retire_inflight_request_prio(rq);
> +
>   		spin_lock(&rq->lock);
>   		list_replace(&rq->fence.cb_list, &cb_list);
>   		__dma_fence_signal__timestamp(&rq->fence, timestamp);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 005a64f2afa7..fe555551c2d2 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -18,8 +18,9 @@
>   #include "intel_engine_types.h"
>   #include "intel_sseu.h"
>   
> -#define CONTEXT_REDZONE POISON_INUSE
> +#include "uc/intel_guc_fwif.h"
>   
> +#define CONTEXT_REDZONE POISON_INUSE
>   DECLARE_EWMA(runtime, 3, 8);
>   
>   struct i915_gem_context;
> @@ -191,6 +192,12 @@ struct intel_context {
>   
>   	/* GuC context blocked fence */
>   	struct i915_sw_fence guc_blocked;
> +
> +	/*
> +	 * GuC priority management
> +	 */
> +	u8 guc_prio;
> +	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
>   };
>   
>   #endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 84142127ebd8..8f8bea08e734 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -11,6 +11,7 @@
>   #include "intel_engine.h"
>   #include "intel_engine_user.h"
>   #include "intel_gt.h"
> +#include "uc/intel_guc_submission.h"
>   
>   struct intel_engine_cs *
>   intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> @@ -115,6 +116,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
>   			disabled |= (I915_SCHEDULER_CAP_ENABLED |
>   				     I915_SCHEDULER_CAP_PRIORITY);
>   
> +		if (intel_uc_uses_guc_submission(&i915->gt.uc))
> +			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> +
>   		for (i = 0; i < ARRAY_SIZE(map); i++) {
>   			if (engine->flags & BIT(map[i].engine))
>   				enabled |= BIT(map[i].sched);
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> index 536fdbc406c6..263ad6a9e4a9 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> @@ -81,7 +81,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
>    */
>   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
>   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> -#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> +#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
> +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		3
>   #define SCHED_STATE_NO_LOCK_BLOCKED \
>   	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
>   #define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> @@ -142,6 +143,24 @@ static inline void decr_context_blocked(struct intel_context *ce)
>   		   &ce->guc_sched_state_no_lock);
>   }
>   
> +static inline bool context_registered(struct intel_context *ce)
> +{
> +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> +		SCHED_STATE_NO_LOCK_REGISTERED);
> +}
> +
> +static inline void set_context_registered(struct intel_context *ce)
> +{
> +	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
> +		  &ce->guc_sched_state_no_lock);
> +}
> +
> +static inline void clr_context_registered(struct intel_context *ce)
> +{
> +	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
> +		   &ce->guc_sched_state_no_lock);
> +}
> +
>   /*
>    * Below is a set of functions which control the GuC scheduling state which
>    * require a lock, aside from the special case where the functions are called
> @@ -1080,6 +1099,7 @@ static int steal_guc_id(struct intel_guc *guc)
>   
>   		list_del_init(&ce->guc_id_link);
>   		guc_id = ce->guc_id;
> +		clr_context_registered(ce);
>   		set_context_guc_id_invalid(ce);
>   		return guc_id;
>   	} else {
> @@ -1184,10 +1204,13 @@ static int register_context(struct intel_context *ce, bool loop)
>   	struct intel_guc *guc = ce_to_guc(ce);
>   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
>   		ce->guc_id * sizeof(struct guc_lrc_desc);
> +	int ret;
>   
>   	trace_intel_context_register(ce);
>   
> -	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> +	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
> +	set_context_registered(ce);

Should this setting be conditional to ret == 0 ?

> +	return ret;
>   }
>   
>   static int __guc_action_deregister_context(struct intel_guc *guc,
> @@ -1243,6 +1266,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
>   	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
>   }
>   
> +static inline u8 map_i915_prio_to_guc_prio(int prio);
> +
>   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   {
>   	struct intel_runtime_pm *runtime_pm =
> @@ -1251,6 +1276,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	struct intel_guc *guc = &engine->gt->uc.guc;
>   	u32 desc_idx = ce->guc_id;
>   	struct guc_lrc_desc *desc;
> +	const struct i915_gem_context *ctx;
> +	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
>   	bool context_registered;
>   	intel_wakeref_t wakeref;
>   	int ret = 0;
> @@ -1266,6 +1293,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   
>   	context_registered = lrc_desc_registered(guc, desc_idx);
>   
> +	rcu_read_lock();
> +	ctx = rcu_dereference(ce->gem_context);
> +	if (ctx)
> +		prio = ctx->sched.priority;
> +	rcu_read_unlock();
> +
>   	reset_lrc_desc(guc, desc_idx);
>   	set_lrc_desc_registered(guc, desc_idx, ce);
>   
> @@ -1274,7 +1307,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
>   	desc->engine_submit_mask = adjust_engine_mask(engine->class,
>   						      engine->mask);
>   	desc->hw_context_desc = ce->lrc.lrca;
> -	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
> +	desc->priority = ce->guc_prio;
>   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
>   	guc_context_policy_init(engine, desc);
>   	init_sched_state(ce);
> @@ -1659,11 +1693,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
>   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
>   	GEM_BUG_ON(context_enabled(ce));
>   
> +	clr_context_registered(ce);
>   	deregister_context(ce, ce->guc_id, true);
>   }
>   
>   static void __guc_context_destroy(struct intel_context *ce)
>   {
> +	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
> +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
> +
>   	lrc_fini(ce);
>   	intel_context_fini(ce);
>   
> @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
>   	return lrc_alloc(ce, ce->engine);
>   }
>   
> +static void guc_context_set_prio(struct intel_guc *guc,
> +				 struct intel_context *ce,
> +				 u8 prio)
> +{
> +	u32 action[] = {
> +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
> +		ce->guc_id,
> +		prio,
> +	};
> +
> +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
> +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
> +
> +	if (ce->guc_prio == prio || submission_disabled(guc) ||
> +	    !context_registered(ce))
> +		return;
> +
> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> +
> +	ce->guc_prio = prio;
> +	trace_intel_context_set_prio(ce);
> +}
> +
> +static inline u8 map_i915_prio_to_guc_prio(int prio)
> +{
> +	if (prio == I915_PRIORITY_NORMAL)
> +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
> +	else if (prio < I915_PRIORITY_NORMAL)
> +		return GUC_CLIENT_PRIORITY_NORMAL;
> +	else if (prio != I915_PRIORITY_BARRIER)

Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?

> +		return GUC_CLIENT_PRIORITY_HIGH;
> +	else
> +		return GUC_CLIENT_PRIORITY_KMD_HIGH;
> +}
> +
> +static inline void add_context_inflight_prio(struct intel_context *ce,
> +					     u8 guc_prio)
> +{
> +	lockdep_assert_held(&ce->guc_active.lock);
> +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> +
> +	++ce->guc_prio_count[guc_prio];
> +
> +	/* Overflow protection */
> +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> +}
> +
> +static inline void sub_context_inflight_prio(struct intel_context *ce,
> +					     u8 guc_prio)
> +{
> +	lockdep_assert_held(&ce->guc_active.lock);
> +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> +
> +	/* Underflow protection */
> +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> +
> +	--ce->guc_prio_count[guc_prio];
> +}
> +
> +static inline void update_context_prio(struct intel_context *ce)
> +{
> +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> +	int i;
> +
> +	lockdep_assert_held(&ce->guc_active.lock);
> +
> +	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {

This and other loops/checks rely on the prios going from highest to
lowest, starting from zero. Maybe add a build check somewhere?

BUILD_BUG_ON(GUC_PRIO_KMD_HIGH != 0);
BUILD_BUG_ON(GUC_PRIO_KMD_HIGH > GUC_PRIO_LOW);

> +		if (ce->guc_prio_count[i]) {
> +			guc_context_set_prio(guc, ce, i);
> +			break;
> +		}
> +	}
> +}
> +
> +static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
> +{
> +	/* Lower value is higher priority */
> +	return new_guc_prio < old_guc_prio;
> +}
> +
>   static void add_to_context(struct i915_request *rq)
>   {
>   	struct intel_context *ce = rq->context;
> +	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
> +
> +	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
>   
>   	spin_lock(&ce->guc_active.lock);
>   	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> +
> +	if (rq->guc_prio == GUC_PRIO_INIT) {
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
> +		sub_context_inflight_prio(ce, rq->guc_prio);
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +	}
> +	update_context_prio(ce);
> +
>   	spin_unlock(&ce->guc_active.lock);
>   }
>   
> +static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
> +{

lockdep_assert_held(&ce->guc_active.lock)?

> +	if (rq->guc_prio != GUC_PRIO_INIT &&
> +	    rq->guc_prio != GUC_PRIO_FINI) {
> +		sub_context_inflight_prio(ce, rq->guc_prio);
> +		update_context_prio(ce);
> +	}
> +	rq->guc_prio = GUC_PRIO_FINI;
> +}
> +
>   static void remove_from_context(struct i915_request *rq)
>   {
>   	struct intel_context *ce = rq->context;
> @@ -1777,6 +1921,8 @@ static void remove_from_context(struct i915_request *rq)
>   	/* Prevent further __await_execution() registering a cb, then flush */
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
>   
> +	guc_prio_fini(rq, ce);
> +
>   	spin_unlock_irq(&ce->guc_active.lock);
>   
>   	atomic_dec(&ce->guc_id_ref);
> @@ -2058,6 +2204,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
>   	}
>   }
>   
> +static void guc_bump_inflight_request_prio(struct i915_request *rq,
> +					   int prio)
> +{
> +	struct intel_context *ce = rq->context;
> +	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
> +
> +	/* Short circuit function */
> +	if (prio < I915_PRIORITY_NORMAL ||
> +	    (rq->guc_prio == GUC_PRIO_FINI) ||
> +	    (rq->guc_prio != GUC_PRIO_INIT &&
> +	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
> +		return;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	if (rq->guc_prio != GUC_PRIO_FINI) {
> +		if (rq->guc_prio != GUC_PRIO_INIT)
> +			sub_context_inflight_prio(ce, rq->guc_prio);
> +		rq->guc_prio = new_guc_prio;
> +		add_context_inflight_prio(ce, rq->guc_prio);
> +		update_context_prio(ce);
> +	}
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
> +static void guc_retire_inflight_request_prio(struct i915_request *rq)
> +{
> +	struct intel_context *ce = rq->context;
> +
> +	spin_lock(&ce->guc_active.lock);
> +	guc_prio_fini(rq, ce);
> +	spin_unlock(&ce->guc_active.lock);
> +}
> +
>   static void sanitize_hwsp(struct intel_engine_cs *engine)
>   {
>   	struct intel_timeline *tl;
> @@ -2293,6 +2472,10 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
>   		guc->sched_engine->disabled = guc_sched_engine_disabled;
>   		guc->sched_engine->private_data = guc;
>   		guc->sched_engine->destroy = guc_sched_engine_destroy;
> +		guc->sched_engine->bump_inflight_request_prio =
> +			guc_bump_inflight_request_prio;
> +		guc->sched_engine->retire_inflight_request_prio =
> +			guc_retire_inflight_request_prio;
>   		tasklet_setup(&guc->sched_engine->tasklet,
>   			      guc_submission_tasklet);
>   	}
> @@ -2670,6 +2853,22 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
>   	drm_printf(p, "\n");
>   }
>   
> +static inline void guc_log_context_priority(struct drm_printer *p,
> +					    struct intel_context *ce)
> +{
> +	int i;
> +
> +	drm_printf(p, "\t\tPriority: %d\n",
> +		   ce->guc_prio);
> +	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
> +	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
> +	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
> +		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
> +			   i, ce->guc_prio_count[i]);
> +	}
> +	drm_printf(p, "\n");
> +}
> +
>   void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   					     struct drm_printer *p)
>   {
> @@ -2692,6 +2891,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
>   		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
>   			   ce->guc_state.sched_state,
>   			   atomic_read(&ce->guc_sched_state_no_lock));
> +
> +		guc_log_context_priority(p, ce);
>   	}
>   }
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f3552642b8a1..3fdfada99ba0 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -114,6 +114,9 @@ static void i915_fence_release(struct dma_fence *fence)
>   {
>   	struct i915_request *rq = to_request(fence);
>   
> +	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
> +		   rq->guc_prio != GUC_PRIO_FINI);
> +
>   	/*
>   	 * The request is put onto a RCU freelist (i.e. the address
>   	 * is immediately reused), mark the fences as being freed now.
> @@ -922,6 +925,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
>   
>   	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
>   
> +	rq->guc_prio = GUC_PRIO_INIT;
> +
>   	/* We bump the ref for the fence chain */
>   	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
>   	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index a3d4728ea06c..f0463d19c712 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -293,6 +293,14 @@ struct i915_request {
>   	 */
>   	struct list_head guc_fence_link;
>   
> +	/**
> +	 * Priority level while the request is inflight. Differs slightly than
> +	 * i915 scheduler priority.

I'd say it differs quite a bit. Maybe refer to the comment above 
I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP?

Daniele

> +	 */
> +#define	GUC_PRIO_INIT	0xff
> +#define	GUC_PRIO_FINI	0xfe
> +	u8 guc_prio;
> +
>   	I915_SELFTEST_DECLARE(struct {
>   		struct list_head link;
>   		unsigned long delay;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 8766a8643469..3fccae3672c1 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
>   	/* Fifo and depth-first replacement ensure our deps execute before us */
>   	sched_engine = lock_sched_engine(node, sched_engine, &cache);
>   	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
> +		struct i915_request *from = container_of(dep->signaler,
> +							 struct i915_request,
> +							 sched);
>   		INIT_LIST_HEAD(&dep->dfs_link);
>   
>   		node = dep->signaler;
> @@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
>   		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
>   			   sched_engine);
>   
> +		/* Must be called before changing the nodes priority */
> +		if (sched_engine->bump_inflight_request_prio)
> +			sched_engine->bump_inflight_request_prio(from, prio);
> +
>   		WRITE_ONCE(node->attr.priority, prio);
>   
>   		/*
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index eaef233e9080..b0a1b58c7893 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -179,6 +179,18 @@ struct i915_sched_engine {
>   	void	(*kick_backend)(const struct i915_request *rq,
>   				int prio);
>   
> +	/**
> +	 * @bump_inflight_request_prio: update priority of an inflight request
> +	 */
> +	void	(*bump_inflight_request_prio)(struct i915_request *rq,
> +					      int prio);
> +
> +	/**
> +	 * @retire_inflight_request_prio: indicate request is retired to
> +	 * priority tracking
> +	 */
> +	void	(*retire_inflight_request_prio)(struct i915_request *rq);
> +
>   	/**
>   	 * @schedule: adjust priority of request
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 937d3706af9b..9d2cd14ed882 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
>   			     __field(int, pin_count)
>   			     __field(u32, sched_state)
>   			     __field(u32, guc_sched_state_no_lock)
> +			     __field(u8, guc_prio)
>   			     ),
>   
>   	    TP_fast_assign(
> @@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
>   			   __entry->sched_state = ce->guc_state.sched_state;
>   			   __entry->guc_sched_state_no_lock =
>   			   atomic_read(&ce->guc_sched_state_no_lock);
> +			   __entry->guc_prio = ce->guc_prio;
>   			   ),
>   
> -	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
> +	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
>   		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
> -		      __entry->guc_sched_state_no_lock)
> +		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
> +);
> +
> +DEFINE_EVENT(intel_context, intel_context_set_prio,
> +	     TP_PROTO(struct intel_context *ce),
> +	     TP_ARGS(ce)
>   );
>   
>   DEFINE_EVENT(intel_context, intel_context_reset,
> @@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
>   {
>   }
>   
> +static inline void
> +trace_intel_context_set_prio(struct intel_context *ce)
> +{
> +}
> +
>   static inline void
>   trace_intel_context_reset(struct intel_context *ce)
>   {
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index e54f9efaead0..cb0a5396e655 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
>   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
>   #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
>   #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
> +/*
> + * Indicates the 2k user priority levels are statically mapped into 3 buckets as
> + * follows:
> + *
> + * -1k to -1	Low priority
> + * 0		Normal priority
> + * 1 to 1k	Highest priority
> + */
> +#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
>   
>   #define I915_PARAM_HUC_STATUS		 42
>   

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [PATCH 50/51] drm/i915/guc: Implement GuC priority management
  2021-07-22 20:26     ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-22 21:38       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 21:38 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Thu, Jul 22, 2021 at 01:26:30PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:17 PM, Matthew Brost wrote:
> > Implement a simple static mapping algorithm of the i915 priority levels
> > (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
> > follows:
> > 
> > i915 level < 0          -> GuC low level     (3)
> > i915 level == 0         -> GuC normal level  (2)
> > i915 level < INT_MAX    -> GuC high level    (1)
> > i915 level == INT_MAX   -> GuC highest level (0)
> > 
> > We believe this mapping should cover the UMD use cases (3 distinct user
> > levels + 1 kernel level).
> > 
> > In addition to static mapping, a simple counter system is attached to
> > each context tracking the number of requests inflight on the context at
> > each level. This is needed as the GuC levels are per context while
> > the i915 levels are per request.
> 
> As a general note on this patch, IMO we could do a better job of integrating
> context-level priority with request-level prio. However, I'm aware that this
> code is going to be overhauled again for drm scheduler so I'm not going to
> comment in that direction and I'll review the result again once the drm
> scheduler patches are available.
>

This will change a bit with DRM scheduler.

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 207 +++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_request.c           |   5 +
> >   drivers/gpu/drm/i915/i915_request.h           |   8 +
> >   drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
> >   drivers/gpu/drm/i915/i915_scheduler_types.h   |  12 +
> >   drivers/gpu/drm/i915/i915_trace.h             |  16 +-
> >   include/uapi/drm/i915_drm.h                   |   9 +
> >   10 files changed, 274 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index 2007dc6f6b99..209cf265bf74 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
> >   			llist_entry(signal, typeof(*rq), signal_node);
> >   		struct list_head cb_list;
> > +		if (rq->engine->sched_engine->retire_inflight_request_prio)
> > +			rq->engine->sched_engine->retire_inflight_request_prio(rq);
> > +
> >   		spin_lock(&rq->lock);
> >   		list_replace(&rq->fence.cb_list, &cb_list);
> >   		__dma_fence_signal__timestamp(&rq->fence, timestamp);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 005a64f2afa7..fe555551c2d2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -18,8 +18,9 @@
> >   #include "intel_engine_types.h"
> >   #include "intel_sseu.h"
> > -#define CONTEXT_REDZONE POISON_INUSE
> > +#include "uc/intel_guc_fwif.h"
> > +#define CONTEXT_REDZONE POISON_INUSE
> >   DECLARE_EWMA(runtime, 3, 8);
> >   struct i915_gem_context;
> > @@ -191,6 +192,12 @@ struct intel_context {
> >   	/* GuC context blocked fence */
> >   	struct i915_sw_fence guc_blocked;
> > +
> > +	/*
> > +	 * GuC priority management
> > +	 */
> > +	u8 guc_prio;
> > +	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 84142127ebd8..8f8bea08e734 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -11,6 +11,7 @@
> >   #include "intel_engine.h"
> >   #include "intel_engine_user.h"
> >   #include "intel_gt.h"
> > +#include "uc/intel_guc_submission.h"
> >   struct intel_engine_cs *
> >   intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> > @@ -115,6 +116,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
> >   			disabled |= (I915_SCHEDULER_CAP_ENABLED |
> >   				     I915_SCHEDULER_CAP_PRIORITY);
> > +		if (intel_uc_uses_guc_submission(&i915->gt.uc))
> > +			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> > +
> >   		for (i = 0; i < ARRAY_SIZE(map); i++) {
> >   			if (engine->flags & BIT(map[i].engine))
> >   				enabled |= BIT(map[i].sched);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 536fdbc406c6..263ad6a9e4a9 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -81,7 +81,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> >    */
> >   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> >   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> > -#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> > +#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
> > +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		3
> >   #define SCHED_STATE_NO_LOCK_BLOCKED \
> >   	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
> >   #define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> > @@ -142,6 +143,24 @@ static inline void decr_context_blocked(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +static inline bool context_registered(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_REGISTERED);
> > +}
> > +
> > +static inline void set_context_registered(struct intel_context *ce)
> > +{
> > +	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
> > +		  &ce->guc_sched_state_no_lock);
> > +}
> > +
> > +static inline void clr_context_registered(struct intel_context *ce)
> > +{
> > +	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   /*
> >    * Below is a set of functions which control the GuC scheduling state which
> >    * require a lock, aside from the special case where the functions are called
> > @@ -1080,6 +1099,7 @@ static int steal_guc_id(struct intel_guc *guc)
> >   		list_del_init(&ce->guc_id_link);
> >   		guc_id = ce->guc_id;
> > +		clr_context_registered(ce);
> >   		set_context_guc_id_invalid(ce);
> >   		return guc_id;
> >   	} else {
> > @@ -1184,10 +1204,13 @@ static int register_context(struct intel_context *ce, bool loop)
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> >   		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +	int ret;
> >   	trace_intel_context_register(ce);
> > -	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > +	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > +	set_context_registered(ce);
> 
> Should this setting be conditional on ret == 0?
> 

Yep.
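
Will fix in the next rev; something like this in register_context()
(untested sketch):

	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
	/* Only mark the context as registered if the GuC accepted it */
	if (likely(!ret))
		set_context_registered(ce);

	return ret;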

> > +	return ret;
> >   }
> >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > @@ -1243,6 +1266,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> >   }
> > +static inline u8 map_i915_prio_to_guc_prio(int prio);
> > +
> >   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_runtime_pm *runtime_pm =
> > @@ -1251,6 +1276,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	struct intel_guc *guc = &engine->gt->uc.guc;
> >   	u32 desc_idx = ce->guc_id;
> >   	struct guc_lrc_desc *desc;
> > +	const struct i915_gem_context *ctx;
> > +	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
> >   	bool context_registered;
> >   	intel_wakeref_t wakeref;
> >   	int ret = 0;
> > @@ -1266,6 +1293,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	context_registered = lrc_desc_registered(guc, desc_idx);
> > +	rcu_read_lock();
> > +	ctx = rcu_dereference(ce->gem_context);
> > +	if (ctx)
> > +		prio = ctx->sched.priority;
> > +	rcu_read_unlock();
> > +
> >   	reset_lrc_desc(guc, desc_idx);
> >   	set_lrc_desc_registered(guc, desc_idx, ce);
> > @@ -1274,7 +1307,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> >   						      engine->mask);
> >   	desc->hw_context_desc = ce->lrc.lrca;
> > -	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
> > +	desc->priority = ce->guc_prio;
> >   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> >   	guc_context_policy_init(engine, desc);
> >   	init_sched_state(ce);
> > @@ -1659,11 +1693,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >   	GEM_BUG_ON(context_enabled(ce));
> > +	clr_context_registered(ce);
> >   	deregister_context(ce, ce->guc_id, true);
> >   }
> >   static void __guc_context_destroy(struct intel_context *ce)
> >   {
> > +	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
> > +
> >   	lrc_fini(ce);
> >   	intel_context_fini(ce);
> > @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
> >   	return lrc_alloc(ce, ce->engine);
> >   }
> > +static void guc_context_set_prio(struct intel_guc *guc,
> > +				 struct intel_context *ce,
> > +				 u8 prio)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
> > +		ce->guc_id,
> > +		prio,
> > +	};
> > +
> > +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
> > +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
> > +
> > +	if (ce->guc_prio == prio || submission_disabled(guc) ||
> > +	    !context_registered(ce))
> > +		return;
> > +
> > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +
> > +	ce->guc_prio = prio;
> > +	trace_intel_context_set_prio(ce);
> > +}
> > +
> > +static inline u8 map_i915_prio_to_guc_prio(int prio)
> > +{
> > +	if (prio == I915_PRIORITY_NORMAL)
> > +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	else if (prio < I915_PRIORITY_NORMAL)
> > +		return GUC_CLIENT_PRIORITY_NORMAL;
> > +	else if (prio != I915_PRIORITY_BARRIER)
> 
> Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?
>

No, I915_PRIORITY_UNPREEMPTABLE is an execlists-only concept.

> > +		return GUC_CLIENT_PRIORITY_HIGH;
> > +	else
> > +		return GUC_CLIENT_PRIORITY_KMD_HIGH;
> > +}
> > +
> > +static inline void add_context_inflight_prio(struct intel_context *ce,
> > +					     u8 guc_prio)
> > +{
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> > +
> > +	++ce->guc_prio_count[guc_prio];
> > +
> > +	/* Overflow protection */
> > +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> > +}
> > +
> > +static inline void sub_context_inflight_prio(struct intel_context *ce,
> > +					     u8 guc_prio)
> > +{
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> > +
> > +	/* Underflow protection */
> > +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> > +
> > +	--ce->guc_prio_count[guc_prio];
> > +}
> > +
> > +static inline void update_context_prio(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > +	int i;
> > +
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +
> > +	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {
> 
> This and other loops/checks rely on the prios going from highest to
> lowest, starting from zero. Maybe add a build check somewhere?
> 
> BUILD_BUG_ON(GUC_PRIO_KMD_HIGH != 0);
> BUILD_BUG_ON(GUC_PRIO_KMD_HIGH > GUC_PRIO_LOW);
>

Good idea.
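
Will add something along these lines next to the loop in
update_context_prio() (sketch, using the GUC_CLIENT_PRIORITY_* names
from intel_guc_fwif.h and assuming KMD_HIGH stays the smallest value):

	/* Loops over guc_prio_count[] assume prios run highest (0) to lowest */
	BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH != 0);
	BUILD_BUG_ON(GUC_CLIENT_PRIORITY_KMD_HIGH > GUC_CLIENT_PRIORITY_NORMAL);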

> > +		if (ce->guc_prio_count[i]) {
> > +			guc_context_set_prio(guc, ce, i);
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> > +static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
> > +{
> > +	/* Lower value is higher priority */
> > +	return new_guc_prio < old_guc_prio;
> > +}
> > +
> >   static void add_to_context(struct i915_request *rq)
> >   {
> >   	struct intel_context *ce = rq->context;
> > +	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
> > +
> > +	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
> >   	spin_lock(&ce->guc_active.lock);
> >   	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> > +
> > +	if (rq->guc_prio == GUC_PRIO_INIT) {
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
> > +		sub_context_inflight_prio(ce, rq->guc_prio);
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +	}
> > +	update_context_prio(ce);
> > +
> >   	spin_unlock(&ce->guc_active.lock);
> >   }
> > +static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
> > +{
> 
> lockdep_assert_held(&ce->guc_active.lock)?
>

Why not.
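
Will assert the lock at the top of guc_prio_fini(), e.g. (untested):

	static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
	{
		/* Both callers hold this lock around the prio bookkeeping */
		lockdep_assert_held(&ce->guc_active.lock);

		if (rq->guc_prio != GUC_PRIO_INIT &&
		    rq->guc_prio != GUC_PRIO_FINI) {
			sub_context_inflight_prio(ce, rq->guc_prio);
			update_context_prio(ce);
		}
		rq->guc_prio = GUC_PRIO_FINI;
	}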

> > +	if (rq->guc_prio != GUC_PRIO_INIT &&
> > +	    rq->guc_prio != GUC_PRIO_FINI) {
> > +		sub_context_inflight_prio(ce, rq->guc_prio);
> > +		update_context_prio(ce);
> > +	}
> > +	rq->guc_prio = GUC_PRIO_FINI;
> > +}
> > +
> >   static void remove_from_context(struct i915_request *rq)
> >   {
> >   	struct intel_context *ce = rq->context;
> > @@ -1777,6 +1921,8 @@ static void remove_from_context(struct i915_request *rq)
> >   	/* Prevent further __await_execution() registering a cb, then flush */
> >   	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > +	guc_prio_fini(rq, ce);
> > +
> >   	spin_unlock_irq(&ce->guc_active.lock);
> >   	atomic_dec(&ce->guc_id_ref);
> > @@ -2058,6 +2204,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> >   	}
> >   }
> > +static void guc_bump_inflight_request_prio(struct i915_request *rq,
> > +					   int prio)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
> > +
> > +	/* Short circuit function */
> > +	if (prio < I915_PRIORITY_NORMAL ||
> > +	    (rq->guc_prio == GUC_PRIO_FINI) ||
> > +	    (rq->guc_prio != GUC_PRIO_INIT &&
> > +	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
> > +		return;
> > +
> > +	spin_lock(&ce->guc_active.lock);
> > +	if (rq->guc_prio != GUC_PRIO_FINI) {
> > +		if (rq->guc_prio != GUC_PRIO_INIT)
> > +			sub_context_inflight_prio(ce, rq->guc_prio);
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +		update_context_prio(ce);
> > +	}
> > +	spin_unlock(&ce->guc_active.lock);
> > +}
> > +
> > +static void guc_retire_inflight_request_prio(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	spin_lock(&ce->guc_active.lock);
> > +	guc_prio_fini(rq, ce);
> > +	spin_unlock(&ce->guc_active.lock);
> > +}
> > +
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> >   {
> >   	struct intel_timeline *tl;
> > @@ -2293,6 +2472,10 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   		guc->sched_engine->disabled = guc_sched_engine_disabled;
> >   		guc->sched_engine->private_data = guc;
> >   		guc->sched_engine->destroy = guc_sched_engine_destroy;
> > +		guc->sched_engine->bump_inflight_request_prio =
> > +			guc_bump_inflight_request_prio;
> > +		guc->sched_engine->retire_inflight_request_prio =
> > +			guc_retire_inflight_request_prio;
> >   		tasklet_setup(&guc->sched_engine->tasklet,
> >   			      guc_submission_tasklet);
> >   	}
> > @@ -2670,6 +2853,22 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> >   	drm_printf(p, "\n");
> >   }
> > +static inline void guc_log_context_priority(struct drm_printer *p,
> > +					    struct intel_context *ce)
> > +{
> > +	int i;
> > +
> > +	drm_printf(p, "\t\tPriority: %d\n",
> > +		   ce->guc_prio);
> > +	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
> > +	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
> > +	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
> > +		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
> > +			   i, ce->guc_prio_count[i]);
> > +	}
> > +	drm_printf(p, "\n");
> > +}
> > +
> >   void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   					     struct drm_printer *p)
> >   {
> > @@ -2692,6 +2891,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> >   			   ce->guc_state.sched_state,
> >   			   atomic_read(&ce->guc_sched_state_no_lock));
> > +
> > +		guc_log_context_priority(p, ce);
> >   	}
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index f3552642b8a1..3fdfada99ba0 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -114,6 +114,9 @@ static void i915_fence_release(struct dma_fence *fence)
> >   {
> >   	struct i915_request *rq = to_request(fence);
> > +	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
> > +		   rq->guc_prio != GUC_PRIO_FINI);
> > +
> >   	/*
> >   	 * The request is put onto a RCU freelist (i.e. the address
> >   	 * is immediately reused), mark the fences as being freed now.
> > @@ -922,6 +925,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
> >   	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
> > +	rq->guc_prio = GUC_PRIO_INIT;
> > +
> >   	/* We bump the ref for the fence chain */
> >   	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
> >   	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index a3d4728ea06c..f0463d19c712 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -293,6 +293,14 @@ struct i915_request {
> >   	 */
> >   	struct list_head guc_fence_link;
> > +	/**
> > +	 * Priority level while the request is inflight. Differs slightly than
> > +	 * i915 scheduler priority.
> 
> I'd say it differs quite a bit. Maybe refer to the comment above
> I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP?
>

Sure.
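
Maybe something like this for the comment (exact wording TBD):

	/*
	 * Priority level while the request is in flight. Differs from i915
	 * scheduler priority. See the comment above
	 * I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP for how the levels map.
	 */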

Matt

> Daniele
> 
> > +	 */
> > +#define	GUC_PRIO_INIT	0xff
> > +#define	GUC_PRIO_FINI	0xfe
> > +	u8 guc_prio;
> > +
> >   	I915_SELFTEST_DECLARE(struct {
> >   		struct list_head link;
> >   		unsigned long delay;
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 8766a8643469..3fccae3672c1 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
> >   	/* Fifo and depth-first replacement ensure our deps execute before us */
> >   	sched_engine = lock_sched_engine(node, sched_engine, &cache);
> >   	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
> > +		struct i915_request *from = container_of(dep->signaler,
> > +							 struct i915_request,
> > +							 sched);
> >   		INIT_LIST_HEAD(&dep->dfs_link);
> >   		node = dep->signaler;
> > @@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
> >   		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
> >   			   sched_engine);
> > +		/* Must be called before changing the nodes priority */
> > +		if (sched_engine->bump_inflight_request_prio)
> > +			sched_engine->bump_inflight_request_prio(from, prio);
> > +
> >   		WRITE_ONCE(node->attr.priority, prio);
> >   		/*
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > index eaef233e9080..b0a1b58c7893 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > @@ -179,6 +179,18 @@ struct i915_sched_engine {
> >   	void	(*kick_backend)(const struct i915_request *rq,
> >   				int prio);
> > +	/**
> > +	 * @bump_inflight_request_prio: update priority of an inflight request
> > +	 */
> > +	void	(*bump_inflight_request_prio)(struct i915_request *rq,
> > +					      int prio);
> > +
> > +	/**
> > +	 * @retire_inflight_request_prio: indicate request is retired to
> > +	 * priority tracking
> > +	 */
> > +	void	(*retire_inflight_request_prio)(struct i915_request *rq);
> > +
> >   	/**
> >   	 * @schedule: adjust priority of request
> >   	 *
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 937d3706af9b..9d2cd14ed882 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
> >   			     __field(int, pin_count)
> >   			     __field(u32, sched_state)
> >   			     __field(u32, guc_sched_state_no_lock)
> > +			     __field(u8, guc_prio)
> >   			     ),
> >   	    TP_fast_assign(
> > @@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
> >   			   __entry->sched_state = ce->guc_state.sched_state;
> >   			   __entry->guc_sched_state_no_lock =
> >   			   atomic_read(&ce->guc_sched_state_no_lock);
> > +			   __entry->guc_prio = ce->guc_prio;
> >   			   ),
> > -	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
> > +	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
> >   		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
> > -		      __entry->guc_sched_state_no_lock)
> > +		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
> > +);
> > +
> > +DEFINE_EVENT(intel_context, intel_context_set_prio,
> > +	     TP_PROTO(struct intel_context *ce),
> > +	     TP_ARGS(ce)
> >   );
> >   DEFINE_EVENT(intel_context, intel_context_reset,
> > @@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
> >   {
> >   }
> > +static inline void
> > +trace_intel_context_set_prio(struct intel_context *ce)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_intel_context_reset(struct intel_context *ce)
> >   {
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index e54f9efaead0..cb0a5396e655 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
> >   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
> >   #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
> >   #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
> > +/*
> > + * Indicates the 2k user priority levels are statically mapped into 3 buckets as
> > + * follows:
> > + *
> > + * -1k to -1	Low priority
> > + * 0		Normal priority
> > + * 1 to 1k	Highest priority
> > + */
> > +#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
> >   #define I915_PARAM_HUC_STATUS		 42
> 

^ permalink raw reply	[flat|nested] 221+ messages in thread

* Re: [Intel-gfx] [PATCH 50/51] drm/i915/guc: Implement GuC priority management
@ 2021-07-22 21:38       ` Matthew Brost
  0 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 21:38 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, dri-devel

On Thu, Jul 22, 2021 at 01:26:30PM -0700, Daniele Ceraolo Spurio wrote:
> 
> 
> On 7/16/2021 1:17 PM, Matthew Brost wrote:
> > Implement a simple static mapping algorithm of the i915 priority levels
> > (int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
> > follows:
> > 
> > i915 level < 0          -> GuC low level     (3)
> > i915 level == 0         -> GuC normal level  (2)
> > i915 level < INT_MAX    -> GuC high level    (1)
> > i915 level == INT_MAX   -> GuC highest level (0)
> > 
> > We believe this mapping should cover the UMD use cases (3 distinct user
> > levels + 1 kernel level).
> > 
> > In addition to static mapping, a simple counter system is attached to
> > each context tracking the number of requests inflight on the context at
> > each level. This is needed as the GuC levels are per context while in
> > the i915 levels are per request.
> 
> As a general note on this patch, IMO we could do a better job of integrating
> context-level priority with request-level prio. However, I'm aware that this
> code is going to be overhauled again for drm scheduler so I'm not going to
> comment in that direction and I'll review the result again once the drm
> scheduler patches are available.
>

This will change a bit with DRM scheduler.

> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   |   3 +
> >   drivers/gpu/drm/i915/gt/intel_context_types.h |   9 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_user.c   |   4 +
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 207 +++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_request.c           |   5 +
> >   drivers/gpu/drm/i915/i915_request.h           |   8 +
> >   drivers/gpu/drm/i915/i915_scheduler.c         |   7 +
> >   drivers/gpu/drm/i915/i915_scheduler_types.h   |  12 +
> >   drivers/gpu/drm/i915/i915_trace.h             |  16 +-
> >   include/uapi/drm/i915_drm.h                   |   9 +
> >   10 files changed, 274 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index 2007dc6f6b99..209cf265bf74 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -245,6 +245,9 @@ static void signal_irq_work(struct irq_work *work)
> >   			llist_entry(signal, typeof(*rq), signal_node);
> >   		struct list_head cb_list;
> > +		if (rq->engine->sched_engine->retire_inflight_request_prio)
> > +			rq->engine->sched_engine->retire_inflight_request_prio(rq);
> > +
> >   		spin_lock(&rq->lock);
> >   		list_replace(&rq->fence.cb_list, &cb_list);
> >   		__dma_fence_signal__timestamp(&rq->fence, timestamp);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > index 005a64f2afa7..fe555551c2d2 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> > @@ -18,8 +18,9 @@
> >   #include "intel_engine_types.h"
> >   #include "intel_sseu.h"
> > -#define CONTEXT_REDZONE POISON_INUSE
> > +#include "uc/intel_guc_fwif.h"
> > +#define CONTEXT_REDZONE POISON_INUSE
> >   DECLARE_EWMA(runtime, 3, 8);
> >   struct i915_gem_context;
> > @@ -191,6 +192,12 @@ struct intel_context {
> >   	/* GuC context blocked fence */
> >   	struct i915_sw_fence guc_blocked;
> > +
> > +	/*
> > +	 * GuC priority management
> > +	 */
> > +	u8 guc_prio;
> > +	u32 guc_prio_count[GUC_CLIENT_PRIORITY_NUM];
> >   };
> >   #endif /* __INTEL_CONTEXT_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > index 84142127ebd8..8f8bea08e734 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> > @@ -11,6 +11,7 @@
> >   #include "intel_engine.h"
> >   #include "intel_engine_user.h"
> >   #include "intel_gt.h"
> > +#include "uc/intel_guc_submission.h"
> >   struct intel_engine_cs *
> >   intel_engine_lookup_user(struct drm_i915_private *i915, u8 class, u8 instance)
> > @@ -115,6 +116,9 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
> >   			disabled |= (I915_SCHEDULER_CAP_ENABLED |
> >   				     I915_SCHEDULER_CAP_PRIORITY);
> > +		if (intel_uc_uses_guc_submission(&i915->gt.uc))
> > +			enabled |= I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP;
> > +
> >   		for (i = 0; i < ARRAY_SIZE(map); i++) {
> >   			if (engine->flags & BIT(map[i].engine))
> >   				enabled |= BIT(map[i].sched);
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 536fdbc406c6..263ad6a9e4a9 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -81,7 +81,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count);
> >    */
> >   #define SCHED_STATE_NO_LOCK_ENABLED			BIT(0)
> >   #define SCHED_STATE_NO_LOCK_PENDING_ENABLE		BIT(1)
> > -#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		2
> > +#define SCHED_STATE_NO_LOCK_REGISTERED			BIT(2)
> > +#define SCHED_STATE_NO_LOCK_BLOCKED_SHIFT		3
> >   #define SCHED_STATE_NO_LOCK_BLOCKED \
> >   	BIT(SCHED_STATE_NO_LOCK_BLOCKED_SHIFT)
> >   #define SCHED_STATE_NO_LOCK_BLOCKED_MASK \
> > @@ -142,6 +143,24 @@ static inline void decr_context_blocked(struct intel_context *ce)
> >   		   &ce->guc_sched_state_no_lock);
> >   }
> > +static inline bool context_registered(struct intel_context *ce)
> > +{
> > +	return (atomic_read(&ce->guc_sched_state_no_lock) &
> > +		SCHED_STATE_NO_LOCK_REGISTERED);
> > +}
> > +
> > +static inline void set_context_registered(struct intel_context *ce)
> > +{
> > +	atomic_or(SCHED_STATE_NO_LOCK_REGISTERED,
> > +		  &ce->guc_sched_state_no_lock);
> > +}
> > +
> > +static inline void clr_context_registered(struct intel_context *ce)
> > +{
> > +	atomic_and((u32)~SCHED_STATE_NO_LOCK_REGISTERED,
> > +		   &ce->guc_sched_state_no_lock);
> > +}
> > +
> >   /*
> >    * Below is a set of functions which control the GuC scheduling state which
> >    * require a lock, aside from the special case where the functions are called
> > @@ -1080,6 +1099,7 @@ static int steal_guc_id(struct intel_guc *guc)
> >   		list_del_init(&ce->guc_id_link);
> >   		guc_id = ce->guc_id;
> > +		clr_context_registered(ce);
> >   		set_context_guc_id_invalid(ce);
> >   		return guc_id;
> >   	} else {
> > @@ -1184,10 +1204,13 @@ static int register_context(struct intel_context *ce, bool loop)
> >   	struct intel_guc *guc = ce_to_guc(ce);
> >   	u32 offset = intel_guc_ggtt_offset(guc, guc->lrc_desc_pool) +
> >   		ce->guc_id * sizeof(struct guc_lrc_desc);
> > +	int ret;
> >   	trace_intel_context_register(ce);
> > -	return __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > +	ret = __guc_action_register_context(guc, ce->guc_id, offset, loop);
> > +	set_context_registered(ce);
> 
> Should this setting be conditional to ret == 0 ?
> 

Yep.

> > +	return ret;
> >   }
> >   static int __guc_action_deregister_context(struct intel_guc *guc,
> > @@ -1243,6 +1266,8 @@ static void guc_context_policy_init(struct intel_engine_cs *engine,
> >   	desc->preemption_timeout = engine->props.preempt_timeout_ms * 1000;
> >   }
> > +static inline u8 map_i915_prio_to_guc_prio(int prio);
> > +
> >   static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   {
> >   	struct intel_runtime_pm *runtime_pm =
> > @@ -1251,6 +1276,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	struct intel_guc *guc = &engine->gt->uc.guc;
> >   	u32 desc_idx = ce->guc_id;
> >   	struct guc_lrc_desc *desc;
> > +	const struct i915_gem_context *ctx;
> > +	int prio = I915_CONTEXT_DEFAULT_PRIORITY;
> >   	bool context_registered;
> >   	intel_wakeref_t wakeref;
> >   	int ret = 0;
> > @@ -1266,6 +1293,12 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	context_registered = lrc_desc_registered(guc, desc_idx);
> > +	rcu_read_lock();
> > +	ctx = rcu_dereference(ce->gem_context);
> > +	if (ctx)
> > +		prio = ctx->sched.priority;
> > +	rcu_read_unlock();
> > +
> >   	reset_lrc_desc(guc, desc_idx);
> >   	set_lrc_desc_registered(guc, desc_idx, ce);
> > @@ -1274,7 +1307,8 @@ static int guc_lrc_desc_pin(struct intel_context *ce, bool loop)
> >   	desc->engine_submit_mask = adjust_engine_mask(engine->class,
> >   						      engine->mask);
> >   	desc->hw_context_desc = ce->lrc.lrca;
> > -	desc->priority = GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	ce->guc_prio = map_i915_prio_to_guc_prio(prio);
> > +	desc->priority = ce->guc_prio;
> >   	desc->context_flags = CONTEXT_REGISTRATION_FLAG_KMD;
> >   	guc_context_policy_init(engine, desc);
> >   	init_sched_state(ce);
> > @@ -1659,11 +1693,17 @@ static inline void guc_lrc_desc_unpin(struct intel_context *ce)
> >   	GEM_BUG_ON(ce != __get_context(guc, ce->guc_id));
> >   	GEM_BUG_ON(context_enabled(ce));
> > +	clr_context_registered(ce);
> >   	deregister_context(ce, ce->guc_id, true);
> >   }
> >   static void __guc_context_destroy(struct intel_context *ce)
> >   {
> > +	GEM_BUG_ON(ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_HIGH] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_HIGH] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_KMD_NORMAL] ||
> > +		   ce->guc_prio_count[GUC_CLIENT_PRIORITY_NORMAL]);
> > +
> >   	lrc_fini(ce);
> >   	intel_context_fini(ce);
> > @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
> >   	return lrc_alloc(ce, ce->engine);
> >   }
> > +static void guc_context_set_prio(struct intel_guc *guc,
> > +				 struct intel_context *ce,
> > +				 u8 prio)
> > +{
> > +	u32 action[] = {
> > +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
> > +		ce->guc_id,
> > +		prio,
> > +	};
> > +
> > +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
> > +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
> > +
> > +	if (ce->guc_prio == prio || submission_disabled(guc) ||
> > +	    !context_registered(ce))
> > +		return;
> > +
> > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > +
> > +	ce->guc_prio = prio;
> > +	trace_intel_context_set_prio(ce);
> > +}
> > +
> > +static inline u8 map_i915_prio_to_guc_prio(int prio)
> > +{
> > +	if (prio == I915_PRIORITY_NORMAL)
> > +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > +	else if (prio < I915_PRIORITY_NORMAL)
> > +		return GUC_CLIENT_PRIORITY_NORMAL;
> > +	else if (prio != I915_PRIORITY_BARRIER)
> 
> Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?
>

No, I915_PRIORITY_UNPREEMPTABLE is an execlists only concept.

> > +		return GUC_CLIENT_PRIORITY_HIGH;
> > +	else
> > +		return GUC_CLIENT_PRIORITY_KMD_HIGH;
> > +}
> > +
> > +static inline void add_context_inflight_prio(struct intel_context *ce,
> > +					     u8 guc_prio)
> > +{
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> > +
> > +	++ce->guc_prio_count[guc_prio];
> > +
> > +	/* Overflow protection */
> > +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> > +}
> > +
> > +static inline void sub_context_inflight_prio(struct intel_context *ce,
> > +					     u8 guc_prio)
> > +{
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +	GEM_BUG_ON(guc_prio >= ARRAY_SIZE(ce->guc_prio_count));
> > +
> > +	/* Underflow protection */
> > +	GEM_WARN_ON(!ce->guc_prio_count[guc_prio]);
> > +
> > +	--ce->guc_prio_count[guc_prio];
> > +}
> > +
> > +static inline void update_context_prio(struct intel_context *ce)
> > +{
> > +	struct intel_guc *guc = &ce->engine->gt->uc.guc;
> > +	int i;
> > +
> > +	lockdep_assert_held(&ce->guc_active.lock);
> > +
> > +	for (i = 0; i < ARRAY_SIZE(ce->guc_prio_count); ++i) {
> 
> This amd other loops/checks rely on the prios going from highest to lowest
> starting from zero. Maybe add a build check somewhere?
> 
> BUILD_BUG_ON(GUC_PRIO_KMD_HIGH != 0);
> BUILD_BUG_ON(GUC_PRIO_KMD_HIGH > GUC_PRIO_LOW);
>

Good idea.

> > +		if (ce->guc_prio_count[i]) {
> > +			guc_context_set_prio(guc, ce, i);
> > +			break;
> > +		}
> > +	}
> > +}
> > +
> > +static inline bool new_guc_prio_higher(u8 old_guc_prio, u8 new_guc_prio)
> > +{
> > +	/* Lower value is higher priority */
> > +	return new_guc_prio < old_guc_prio;
> > +}
> > +
> >   static void add_to_context(struct i915_request *rq)
> >   {
> >   	struct intel_context *ce = rq->context;
> > +	u8 new_guc_prio = map_i915_prio_to_guc_prio(rq_prio(rq));
> > +
> > +	GEM_BUG_ON(rq->guc_prio == GUC_PRIO_FINI);
> >   	spin_lock(&ce->guc_active.lock);
> >   	list_move_tail(&rq->sched.link, &ce->guc_active.requests);
> > +
> > +	if (rq->guc_prio == GUC_PRIO_INIT) {
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +	} else if (new_guc_prio_higher(rq->guc_prio, new_guc_prio)) {
> > +		sub_context_inflight_prio(ce, rq->guc_prio);
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +	}
> > +	update_context_prio(ce);
> > +
> >   	spin_unlock(&ce->guc_active.lock);
> >   }
> > +static void guc_prio_fini(struct i915_request *rq, struct intel_context *ce)
> > +{
> 
> assert_spin_lock_held(&ce->guc_active.lock) ?
>

Why not.

> > +	if (rq->guc_prio != GUC_PRIO_INIT &&
> > +	    rq->guc_prio != GUC_PRIO_FINI) {
> > +		sub_context_inflight_prio(ce, rq->guc_prio);
> > +		update_context_prio(ce);
> > +	}
> > +	rq->guc_prio = GUC_PRIO_FINI;
> > +}
> > +
> >   static void remove_from_context(struct i915_request *rq)
> >   {
> >   	struct intel_context *ce = rq->context;
> > @@ -1777,6 +1921,8 @@ static void remove_from_context(struct i915_request *rq)
> >   	/* Prevent further __await_execution() registering a cb, then flush */
> >   	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> > +	guc_prio_fini(rq, ce);
> > +
> >   	spin_unlock_irq(&ce->guc_active.lock);
> >   	atomic_dec(&ce->guc_id_ref);
> > @@ -2058,6 +2204,39 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> >   	}
> >   }
> > +static void guc_bump_inflight_request_prio(struct i915_request *rq,
> > +					   int prio)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +	u8 new_guc_prio = map_i915_prio_to_guc_prio(prio);
> > +
> > +	/* Short circuit function */
> > +	if (prio < I915_PRIORITY_NORMAL ||
> > +	    (rq->guc_prio == GUC_PRIO_FINI) ||
> > +	    (rq->guc_prio != GUC_PRIO_INIT &&
> > +	     !new_guc_prio_higher(rq->guc_prio, new_guc_prio)))
> > +		return;
> > +
> > +	spin_lock(&ce->guc_active.lock);
> > +	if (rq->guc_prio != GUC_PRIO_FINI) {
> > +		if (rq->guc_prio != GUC_PRIO_INIT)
> > +			sub_context_inflight_prio(ce, rq->guc_prio);
> > +		rq->guc_prio = new_guc_prio;
> > +		add_context_inflight_prio(ce, rq->guc_prio);
> > +		update_context_prio(ce);
> > +	}
> > +	spin_unlock(&ce->guc_active.lock);
> > +}
> > +
> > +static void guc_retire_inflight_request_prio(struct i915_request *rq)
> > +{
> > +	struct intel_context *ce = rq->context;
> > +
> > +	spin_lock(&ce->guc_active.lock);
> > +	guc_prio_fini(rq, ce);
> > +	spin_unlock(&ce->guc_active.lock);
> > +}
> > +
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> >   {
> >   	struct intel_timeline *tl;
> > @@ -2293,6 +2472,10 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   		guc->sched_engine->disabled = guc_sched_engine_disabled;
> >   		guc->sched_engine->private_data = guc;
> >   		guc->sched_engine->destroy = guc_sched_engine_destroy;
> > +		guc->sched_engine->bump_inflight_request_prio =
> > +			guc_bump_inflight_request_prio;
> > +		guc->sched_engine->retire_inflight_request_prio =
> > +			guc_retire_inflight_request_prio;
> >   		tasklet_setup(&guc->sched_engine->tasklet,
> >   			      guc_submission_tasklet);
> >   	}
> > @@ -2670,6 +2853,22 @@ void intel_guc_submission_print_info(struct intel_guc *guc,
> >   	drm_printf(p, "\n");
> >   }
> > +static inline void guc_log_context_priority(struct drm_printer *p,
> > +					    struct intel_context *ce)
> > +{
> > +	int i;
> > +
> > +	drm_printf(p, "\t\tPriority: %d\n",
> > +		   ce->guc_prio);
> > +	drm_printf(p, "\t\tNumber Requests (lower index == higher priority)\n");
> > +	for (i = GUC_CLIENT_PRIORITY_KMD_HIGH;
> > +	     i < GUC_CLIENT_PRIORITY_NUM; ++i) {
> > +		drm_printf(p, "\t\tNumber requests in priority band[%d]: %d\n",
> > +			   i, ce->guc_prio_count[i]);
> > +	}
> > +	drm_printf(p, "\n");
> > +}
> > +
> >   void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   					     struct drm_printer *p)
> >   {
> > @@ -2692,6 +2891,8 @@ void intel_guc_submission_print_context_info(struct intel_guc *guc,
> >   		drm_printf(p, "\t\tSchedule State: 0x%x, 0x%x\n\n",
> >   			   ce->guc_state.sched_state,
> >   			   atomic_read(&ce->guc_sched_state_no_lock));
> > +
> > +		guc_log_context_priority(p, ce);
> >   	}
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index f3552642b8a1..3fdfada99ba0 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -114,6 +114,9 @@ static void i915_fence_release(struct dma_fence *fence)
> >   {
> >   	struct i915_request *rq = to_request(fence);
> > +	GEM_BUG_ON(rq->guc_prio != GUC_PRIO_INIT &&
> > +		   rq->guc_prio != GUC_PRIO_FINI);
> > +
> >   	/*
> >   	 * The request is put onto a RCU freelist (i.e. the address
> >   	 * is immediately reused), mark the fences as being freed now.
> > @@ -922,6 +925,8 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp)
> >   	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
> > +	rq->guc_prio = GUC_PRIO_INIT;
> > +
> >   	/* We bump the ref for the fence chain */
> >   	i915_sw_fence_reinit(&i915_request_get(rq)->submit);
> >   	i915_sw_fence_reinit(&i915_request_get(rq)->semaphore);
> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index a3d4728ea06c..f0463d19c712 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -293,6 +293,14 @@ struct i915_request {
> >   	 */
> >   	struct list_head guc_fence_link;
> > +	/**
> > +	 * Priority level while the request is inflight. Differs slightly than
> > +	 * i915 scheduler priority.
> 
> I'd say it differs quite a bit. Maybe refer to the comment above
> I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP?
>

Sure.

Matt

> Daniele
> 
> > +	 */
> > +#define	GUC_PRIO_INIT	0xff
> > +#define	GUC_PRIO_FINI	0xfe
> > +	u8 guc_prio;
> > +
> >   	I915_SELFTEST_DECLARE(struct {
> >   		struct list_head link;
> >   		unsigned long delay;
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 8766a8643469..3fccae3672c1 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -241,6 +241,9 @@ static void __i915_schedule(struct i915_sched_node *node,
> >   	/* Fifo and depth-first replacement ensure our deps execute before us */
> >   	sched_engine = lock_sched_engine(node, sched_engine, &cache);
> >   	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
> > +		struct i915_request *from = container_of(dep->signaler,
> > +							 struct i915_request,
> > +							 sched);
> >   		INIT_LIST_HEAD(&dep->dfs_link);
> >   		node = dep->signaler;
> > @@ -254,6 +257,10 @@ static void __i915_schedule(struct i915_sched_node *node,
> >   		GEM_BUG_ON(node_to_request(node)->engine->sched_engine !=
> >   			   sched_engine);
> > +		/* Must be called before changing the nodes priority */
> > +		if (sched_engine->bump_inflight_request_prio)
> > +			sched_engine->bump_inflight_request_prio(from, prio);
> > +
> >   		WRITE_ONCE(node->attr.priority, prio);
> >   		/*
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > index eaef233e9080..b0a1b58c7893 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> > +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> > @@ -179,6 +179,18 @@ struct i915_sched_engine {
> >   	void	(*kick_backend)(const struct i915_request *rq,
> >   				int prio);
> > +	/**
> > +	 * @bump_inflight_request_prio: update priority of an inflight request
> > +	 */
> > +	void	(*bump_inflight_request_prio)(struct i915_request *rq,
> > +					      int prio);
> > +
> > +	/**
> > +	 * @retire_inflight_request_prio: indicate request is retired to
> > +	 * priority tracking
> > +	 */
> > +	void	(*retire_inflight_request_prio)(struct i915_request *rq);
> > +
> >   	/**
> >   	 * @schedule: adjust priority of request
> >   	 *
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 937d3706af9b..9d2cd14ed882 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -914,6 +914,7 @@ DECLARE_EVENT_CLASS(intel_context,
> >   			     __field(int, pin_count)
> >   			     __field(u32, sched_state)
> >   			     __field(u32, guc_sched_state_no_lock)
> > +			     __field(u8, guc_prio)
> >   			     ),
> >   	    TP_fast_assign(
> > @@ -922,11 +923,17 @@ DECLARE_EVENT_CLASS(intel_context,
> >   			   __entry->sched_state = ce->guc_state.sched_state;
> >   			   __entry->guc_sched_state_no_lock =
> >   			   atomic_read(&ce->guc_sched_state_no_lock);
> > +			   __entry->guc_prio = ce->guc_prio;
> >   			   ),
> > -	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x",
> > +	    TP_printk("guc_id=%d, pin_count=%d sched_state=0x%x,0x%x, guc_prio=%u",
> >   		      __entry->guc_id, __entry->pin_count, __entry->sched_state,
> > -		      __entry->guc_sched_state_no_lock)
> > +		      __entry->guc_sched_state_no_lock, __entry->guc_prio)
> > +);
> > +
> > +DEFINE_EVENT(intel_context, intel_context_set_prio,
> > +	     TP_PROTO(struct intel_context *ce),
> > +	     TP_ARGS(ce)
> >   );
> >   DEFINE_EVENT(intel_context, intel_context_reset,
> > @@ -1036,6 +1043,11 @@ trace_i915_request_out(struct i915_request *rq)
> >   {
> >   }
> > +static inline void
> > +trace_intel_context_set_prio(struct intel_context *ce)
> > +{
> > +}
> > +
> >   static inline void
> >   trace_intel_context_reset(struct intel_context *ce)
> >   {
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index e54f9efaead0..cb0a5396e655 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -572,6 +572,15 @@ typedef struct drm_i915_irq_wait {
> >   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
> >   #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
> >   #define   I915_SCHEDULER_CAP_ENGINE_BUSY_STATS	(1ul << 4)
> > +/*
> > + * Indicates the 2k user priority levels are statically mapped into 3 buckets as
> > + * follows:
> > + *
> > + * -1k to -1	Low priority
> > + * 0		Normal priority
> > + * 1 to 1k	Highest priority
> > + */
> > +#define   I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP	(1ul << 5)
> >   #define I915_PARAM_HUC_STATUS		 42
> 
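
FWIW, userspace can detect the new bucketing with the usual GETPARAM dance,
e.g. (illustrative sketch only):

	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	static int has_static_priority_map(int fd)
	{
		int caps = 0;
		struct drm_i915_getparam gp = {
			.param = I915_PARAM_HAS_SCHEDULER,
			.value = &caps,
		};

		/* On success, caps holds the I915_SCHEDULER_CAP_* bits. */
		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return 0;

		return !!(caps & I915_SCHEDULER_CAP_STATIC_PRIORITY_MAP);
	}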

* Re: [PATCH 50/51] drm/i915/guc: Implement GuC priority management
  2021-07-22 21:38       ` [Intel-gfx] " Matthew Brost
@ 2021-07-22 21:50         ` Daniele Ceraolo Spurio
  -1 siblings, 0 replies; 221+ messages in thread
From: Daniele Ceraolo Spurio @ 2021-07-22 21:50 UTC (permalink / raw)
  To: Matthew Brost; +Cc: intel-gfx, john.c.harrison, dri-devel


<snip>

>>> @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
>>>    	return lrc_alloc(ce, ce->engine);
>>>    }
>>> +static void guc_context_set_prio(struct intel_guc *guc,
>>> +				 struct intel_context *ce,
>>> +				 u8 prio)
>>> +{
>>> +	u32 action[] = {
>>> +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
>>> +		ce->guc_id,
>>> +		prio,
>>> +	};
>>> +
>>> +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
>>> +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
>>> +
>>> +	if (ce->guc_prio == prio || submission_disabled(guc) ||
>>> +	    !context_registered(ce))
>>> +		return;
>>> +
>>> +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
>>> +
>>> +	ce->guc_prio = prio;
>>> +	trace_intel_context_set_prio(ce);
>>> +}
>>> +
>>> +static inline u8 map_i915_prio_to_guc_prio(int prio)
>>> +{
>>> +	if (prio == I915_PRIORITY_NORMAL)
>>> +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
>>> +	else if (prio < I915_PRIORITY_NORMAL)
>>> +		return GUC_CLIENT_PRIORITY_NORMAL;
>>> +	else if (prio != I915_PRIORITY_BARRIER)
>> Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?
>>
> No, I915_PRIORITY_UNPREEMPTABLE is an execlists only concept.

then we need a

/* we don't expect unpreemptable submissions with the GuC */
GEM_BUG_ON(prio == I915_PRIORITY_UNPREEMPTABLE);

or something, because that prio level would be assigned incorrectly 
otherwise.

Daniele


* Re: [PATCH 50/51] drm/i915/guc: Implement GuC priority management
  2021-07-22 21:50         ` [Intel-gfx] " Daniele Ceraolo Spurio
@ 2021-07-22 21:55           ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-22 21:55 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio; +Cc: intel-gfx, john.c.harrison, dri-devel

On Thu, Jul 22, 2021 at 02:50:24PM -0700, Daniele Ceraolo Spurio wrote:
> 
> <snip>
> 
> > > > @@ -1756,15 +1796,119 @@ static int guc_context_alloc(struct intel_context *ce)
> > > >    	return lrc_alloc(ce, ce->engine);
> > > >    }
> > > > +static void guc_context_set_prio(struct intel_guc *guc,
> > > > +				 struct intel_context *ce,
> > > > +				 u8 prio)
> > > > +{
> > > > +	u32 action[] = {
> > > > +		INTEL_GUC_ACTION_SET_CONTEXT_PRIORITY,
> > > > +		ce->guc_id,
> > > > +		prio,
> > > > +	};
> > > > +
> > > > +	GEM_BUG_ON(prio < GUC_CLIENT_PRIORITY_KMD_HIGH ||
> > > > +		   prio > GUC_CLIENT_PRIORITY_NORMAL);
> > > > +
> > > > +	if (ce->guc_prio == prio || submission_disabled(guc) ||
> > > > +	    !context_registered(ce))
> > > > +		return;
> > > > +
> > > > +	guc_submission_send_busy_loop(guc, action, ARRAY_SIZE(action), 0, true);
> > > > +
> > > > +	ce->guc_prio = prio;
> > > > +	trace_intel_context_set_prio(ce);
> > > > +}
> > > > +
> > > > +static inline u8 map_i915_prio_to_guc_prio(int prio)
> > > > +{
> > > > +	if (prio == I915_PRIORITY_NORMAL)
> > > > +		return GUC_CLIENT_PRIORITY_KMD_NORMAL;
> > > > +	else if (prio < I915_PRIORITY_NORMAL)
> > > > +		return GUC_CLIENT_PRIORITY_NORMAL;
> > > > +	else if (prio != I915_PRIORITY_BARRIER)
> > > Shouldn't this be I915_PRIORITY_UNPREEMPTABLE?
> > > 
> > No, I915_PRIORITY_UNPREEMPTABLE is an execlists only concept.
> 
> then we need a
> 
> /* we don't expect unpreemptable submissions with the GuC */
> GEM_BUG_ON(prio == I915_PRIORITY_UNPREEMPTABLE);
> 

Actually this probably should be prio < I915_PRIORITY_DISPLAY, as we really
want pageflips to be higher priority than any user context.
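
i.e. something along these lines (sketch only, untested; GUC_CLIENT_PRIORITY_HIGH
is assumed for the in-between bucket since the hunk quoted above is truncated):

	static inline u8 map_i915_prio_to_guc_prio(int prio)
	{
		if (prio == I915_PRIORITY_NORMAL)
			return GUC_CLIENT_PRIORITY_KMD_NORMAL;
		else if (prio < I915_PRIORITY_NORMAL)
			return GUC_CLIENT_PRIORITY_NORMAL;
		else if (prio < I915_PRIORITY_DISPLAY)
			return GUC_CLIENT_PRIORITY_HIGH;
		else
			/* display, barrier, etc. all land in KMD high */
			return GUC_CLIENT_PRIORITY_KMD_HIGH;
	}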

Matt

> or something, because that prio level would be assigned incorrectly
> otherwise.
> 
> Daniele
> 


* Re: [Intel-gfx] [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
  2021-07-22 12:46     ` Tvrtko Ursulin
@ 2021-07-26 22:25       ` Matthew Brost
  -1 siblings, 0 replies; 221+ messages in thread
From: Matthew Brost @ 2021-07-26 22:25 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, dri-devel

On Thu, Jul 22, 2021 at 01:46:08PM +0100, Tvrtko Ursulin wrote:
> 
> On 16/07/2021 21:16, Matthew Brost wrote:
> > With GuC virtual engines the physical engine which a request executes
> > and completes on isn't known to the i915. Therefore we can't attach a
> > request to a physical engines breadcrumbs. To work around this we create
> > a single breadcrumbs per engine class when using GuC submission and
> > direct all physical engine interrupts to this breadcrumbs.
> > 
> > v2:
> >   (John H)
> >    - Rework header file structure so intel_engine_mask_t can be in
> >      intel_engine_types.h
> > 
> > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > CC: John Harrison <John.C.Harrison@Intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 41 +++++-------
> >   drivers/gpu/drm/i915/gt/intel_breadcrumbs.h   | 16 ++++-
> >   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  7 ++
> >   drivers/gpu/drm/i915/gt/intel_engine.h        |  3 +
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 28 +++++++-
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 +-
> >   .../drm/i915/gt/intel_execlists_submission.c  |  2 +-
> >   drivers/gpu/drm/i915/gt/mock_engine.c         |  4 +-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 67 +++++++++++++++++--
> >   9 files changed, 133 insertions(+), 37 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > index 38cc42783dfb..2007dc6f6b99 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> > @@ -15,28 +15,14 @@
> >   #include "intel_gt_pm.h"
> >   #include "intel_gt_requests.h"
> > -static bool irq_enable(struct intel_engine_cs *engine)
> > +static bool irq_enable(struct intel_breadcrumbs *b)
> >   {
> > -	if (!engine->irq_enable)
> > -		return false;
> > -
> > -	/* Caller disables interrupts */
> > -	spin_lock(&engine->gt->irq_lock);
> > -	engine->irq_enable(engine);
> > -	spin_unlock(&engine->gt->irq_lock);
> > -
> > -	return true;
> > +	return intel_engine_irq_enable(b->irq_engine);
> >   }
> > -static void irq_disable(struct intel_engine_cs *engine)
> > +static void irq_disable(struct intel_breadcrumbs *b)
> >   {
> > -	if (!engine->irq_disable)
> > -		return;
> > -
> > -	/* Caller disables interrupts */
> > -	spin_lock(&engine->gt->irq_lock);
> > -	engine->irq_disable(engine);
> > -	spin_unlock(&engine->gt->irq_lock);
> > +	intel_engine_irq_disable(b->irq_engine);
> >   }
> >   static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> > @@ -57,7 +43,7 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
> >   	WRITE_ONCE(b->irq_armed, true);
> >   	/* Requests may have completed before we could enable the interrupt. */
> > -	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
> > +	if (!b->irq_enabled++ && b->irq_enable(b))
> >   		irq_work_queue(&b->irq_work);
> >   }
> > @@ -76,7 +62,7 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
> >   {
> >   	GEM_BUG_ON(!b->irq_enabled);
> >   	if (!--b->irq_enabled)
> > -		irq_disable(b->irq_engine);
> > +		b->irq_disable(b);
> >   	WRITE_ONCE(b->irq_armed, false);
> >   	intel_gt_pm_put_async(b->irq_engine->gt);
> > @@ -281,7 +267,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
> >   	if (!b)
> >   		return NULL;
> > -	b->irq_engine = irq_engine;
> > +	kref_init(&b->ref);
> >   	spin_lock_init(&b->signalers_lock);
> >   	INIT_LIST_HEAD(&b->signalers);
> > @@ -290,6 +276,10 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
> >   	spin_lock_init(&b->irq_lock);
> >   	init_irq_work(&b->irq_work, signal_irq_work);
> > +	b->irq_engine = irq_engine;
> > +	b->irq_enable = irq_enable;
> > +	b->irq_disable = irq_disable;
> > +
> >   	return b;
> >   }
> > @@ -303,9 +293,9 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
> >   	spin_lock_irqsave(&b->irq_lock, flags);
> >   	if (b->irq_enabled)
> > -		irq_enable(b->irq_engine);
> > +		b->irq_enable(b);
> >   	else
> > -		irq_disable(b->irq_engine);
> > +		b->irq_disable(b);
> >   	spin_unlock_irqrestore(&b->irq_lock, flags);
> >   }
> > @@ -325,11 +315,14 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
> >   	}
> >   }
> > -void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
> > +void intel_breadcrumbs_free(struct kref *kref)
> >   {
> > +	struct intel_breadcrumbs *b = container_of(kref, typeof(*b), ref);
> > +
> >   	irq_work_sync(&b->irq_work);
> >   	GEM_BUG_ON(!list_empty(&b->signalers));
> >   	GEM_BUG_ON(b->irq_armed);
> > +
> >   	kfree(b);
> >   }
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> > index 3ce5ce270b04..be0d4f379a85 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.h
> > @@ -9,7 +9,7 @@
> >   #include <linux/atomic.h>
> >   #include <linux/irq_work.h>
> > -#include "intel_engine_types.h"
> > +#include "intel_breadcrumbs_types.h"
> >   struct drm_printer;
> >   struct i915_request;
> > @@ -17,7 +17,7 @@ struct intel_breadcrumbs;
> >   struct intel_breadcrumbs *
> >   intel_breadcrumbs_create(struct intel_engine_cs *irq_engine);
> > -void intel_breadcrumbs_free(struct intel_breadcrumbs *b);
> > +void intel_breadcrumbs_free(struct kref *kref);
> >   void intel_breadcrumbs_reset(struct intel_breadcrumbs *b);
> >   void __intel_breadcrumbs_park(struct intel_breadcrumbs *b);
> > @@ -48,4 +48,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *request);
> >   void intel_context_remove_breadcrumbs(struct intel_context *ce,
> >   				      struct intel_breadcrumbs *b);
> > +static inline struct intel_breadcrumbs *
> > +intel_breadcrumbs_get(struct intel_breadcrumbs *b)
> > +{
> > +	kref_get(&b->ref);
> > +	return b;
> > +}
> > +
> > +static inline void intel_breadcrumbs_put(struct intel_breadcrumbs *b)
> > +{
> > +	kref_put(&b->ref, intel_breadcrumbs_free);
> > +}
> > +
> >   #endif /* __INTEL_BREADCRUMBS__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > index 3a084ce8ff5e..72dfd3748c4c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> > @@ -7,10 +7,13 @@
> >   #define __INTEL_BREADCRUMBS_TYPES__
> >   #include <linux/irq_work.h>
> > +#include <linux/kref.h>
> >   #include <linux/list.h>
> >   #include <linux/spinlock.h>
> >   #include <linux/types.h>
> > +#include "intel_engine_types.h"
> > +
> >   /*
> >    * Rather than have every client wait upon all user interrupts,
> >    * with the herd waking after every interrupt and each doing the
> > @@ -29,6 +32,7 @@
> >    * the overhead of waking that client is much preferred.
> >    */
> >   struct intel_breadcrumbs {
> > +	struct kref ref;
> >   	atomic_t active;
> >   	spinlock_t signalers_lock; /* protects the list of signalers */
> > @@ -42,7 +46,10 @@ struct intel_breadcrumbs {
> >   	bool irq_armed;
> >   	/* Not all breadcrumbs are attached to physical HW */
> > +	intel_engine_mask_t	engine_mask;
> >   	struct intel_engine_cs *irq_engine;
> > +	bool	(*irq_enable)(struct intel_breadcrumbs *b);
> > +	void	(*irq_disable)(struct intel_breadcrumbs *b);
> >   };
> >   #endif /* __INTEL_BREADCRUMBS_TYPES__ */
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h
> > index 9fec0aca5f4b..edbde6171bca 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine.h
> > @@ -212,6 +212,9 @@ void intel_engine_get_instdone(const struct intel_engine_cs *engine,
> >   void intel_engine_init_execlists(struct intel_engine_cs *engine);
> > +bool intel_engine_irq_enable(struct intel_engine_cs *engine);
> > +void intel_engine_irq_disable(struct intel_engine_cs *engine);
> > +
> >   static inline void __intel_engine_reset(struct intel_engine_cs *engine,
> >   					bool stalled)
> >   {
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index b7292d5cb7da..d95d666407f5 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -739,7 +739,7 @@ static int engine_setup_common(struct intel_engine_cs *engine)
> >   err_cmd_parser:
> >   	i915_sched_engine_put(engine->sched_engine);
> >   err_sched_engine:
> > -	intel_breadcrumbs_free(engine->breadcrumbs);
> > +	intel_breadcrumbs_put(engine->breadcrumbs);
> >   err_status:
> >   	cleanup_status_page(engine);
> >   	return err;
> > @@ -948,7 +948,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
> >   	GEM_BUG_ON(!list_empty(&engine->sched_engine->requests));
> >   	i915_sched_engine_put(engine->sched_engine);
> > -	intel_breadcrumbs_free(engine->breadcrumbs);
> > +	intel_breadcrumbs_put(engine->breadcrumbs);
> >   	intel_engine_fini_retire(engine);
> >   	intel_engine_cleanup_cmd_parser(engine);
> > @@ -1265,6 +1265,30 @@ bool intel_engines_are_idle(struct intel_gt *gt)
> >   	return true;
> >   }
> > +bool intel_engine_irq_enable(struct intel_engine_cs *engine)
> > +{
> > +	if (!engine->irq_enable)
> > +		return false;
> > +
> > +	/* Caller disables interrupts */
> > +	spin_lock(&engine->gt->irq_lock);
> > +	engine->irq_enable(engine);
> > +	spin_unlock(&engine->gt->irq_lock);
> > +
> > +	return true;
> > +}
> > +
> > +void intel_engine_irq_disable(struct intel_engine_cs *engine)
> > +{
> > +	if (!engine->irq_disable)
> > +		return;
> > +
> > +	/* Caller disables interrupts */
> > +	spin_lock(&engine->gt->irq_lock);
> > +	engine->irq_disable(engine);
> > +	spin_unlock(&engine->gt->irq_lock);
> > +}
> > +
> >   void intel_engines_reset_default_submission(struct intel_gt *gt)
> >   {
> >   	struct intel_engine_cs *engine;
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 8ad304b2f2e4..03a81e8d87f4 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -21,7 +21,6 @@
> >   #include "i915_pmu.h"
> >   #include "i915_priolist_types.h"
> >   #include "i915_selftest.h"
> > -#include "intel_breadcrumbs_types.h"
> >   #include "intel_sseu.h"
> >   #include "intel_timeline_types.h"
> >   #include "intel_uncore.h"
> > @@ -63,6 +62,7 @@ struct i915_sched_engine;
> >   struct intel_gt;
> >   struct intel_ring;
> >   struct intel_uncore;
> > +struct intel_breadcrumbs;
> >   typedef u8 intel_engine_mask_t;
> >   #define ALL_ENGINES ((intel_engine_mask_t)~0ul)
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index 920707e22eb0..abe48421fd7a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -3407,7 +3407,7 @@ static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >   	intel_context_fini(&ve->context);
> >   	if (ve->base.breadcrumbs)
> > -		intel_breadcrumbs_free(ve->base.breadcrumbs);
> > +		intel_breadcrumbs_put(ve->base.breadcrumbs);
> >   	if (ve->base.sched_engine)
> >   		i915_sched_engine_put(ve->base.sched_engine);
> >   	intel_engine_free_request_pool(&ve->base);
> > diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c
> > index 9203c766db80..fc5a65ab1937 100644
> > --- a/drivers/gpu/drm/i915/gt/mock_engine.c
> > +++ b/drivers/gpu/drm/i915/gt/mock_engine.c
> > @@ -284,7 +284,7 @@ static void mock_engine_release(struct intel_engine_cs *engine)
> >   	GEM_BUG_ON(timer_pending(&mock->hw_delay));
> >   	i915_sched_engine_put(engine->sched_engine);
> > -	intel_breadcrumbs_free(engine->breadcrumbs);
> > +	intel_breadcrumbs_put(engine->breadcrumbs);
> >   	intel_context_unpin(engine->kernel_context);
> >   	intel_context_put(engine->kernel_context);
> > @@ -376,7 +376,7 @@ int mock_engine_init(struct intel_engine_cs *engine)
> >   	return 0;
> >   err_breadcrumbs:
> > -	intel_breadcrumbs_free(engine->breadcrumbs);
> > +	intel_breadcrumbs_put(engine->breadcrumbs);
> >   err_schedule:
> >   	i915_sched_engine_put(engine->sched_engine);
> >   	return -ENOMEM;
> > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > index 372e0dc7617a..9f28899ff17f 100644
> > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > @@ -1076,6 +1076,9 @@ static void __guc_context_destroy(struct intel_context *ce)
> >   		struct guc_virtual_engine *ve =
> >   			container_of(ce, typeof(*ve), context);
> > +		if (ve->base.breadcrumbs)
> > +			intel_breadcrumbs_put(ve->base.breadcrumbs);
> > +
> >   		kfree(ve);
> >   	} else {
> >   		intel_context_free(ce);
> > @@ -1366,6 +1369,62 @@ static const struct intel_context_ops virtual_guc_context_ops = {
> >   	.get_sibling = guc_virtual_get_sibling,
> >   };
> > +static bool
> > +guc_irq_enable_breadcrumbs(struct intel_breadcrumbs *b)
> > +{
> > +	struct intel_engine_cs *sibling;
> > +	intel_engine_mask_t tmp, mask = b->engine_mask;
> > +	bool result = false;
> > +
> > +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> > +		result |= intel_engine_irq_enable(sibling);
> > +
> > +	return result;
> > +}
> > +
> > +static void
> > +guc_irq_disable_breadcrumbs(struct intel_breadcrumbs *b)
> > +{
> > +	struct intel_engine_cs *sibling;
> > +	intel_engine_mask_t tmp, mask = b->engine_mask;
> > +
> > +	for_each_engine_masked(sibling, b->irq_engine->gt, mask, tmp)
> > +		intel_engine_irq_disable(sibling);
> > +}
> > +
> > +static void guc_init_breadcrumbs(struct intel_engine_cs *engine)
> > +{
> > +	int i;
> > +
> > +	/*
> > +	 * In GuC submission mode we do not know which physical engine a
> > +	 * request will be scheduled on; this creates a problem because the
> > +	 * breadcrumb interrupt is per physical engine. To work around this we
> > +	 * attach requests and direct all breadcrumb interrupts to the first
> > +	 * instance of an engine per class. In addition, all breadcrumb
> > +	 * interrupts are enabled / disabled across an engine class in unison.
> > +	 */
> > +	for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
> > +		struct intel_engine_cs *sibling =
> > +			engine->gt->engine_class[engine->class][i];
> > +
> > +		if (sibling) {
> > +			if (engine->breadcrumbs != sibling->breadcrumbs) {
> > +				intel_breadcrumbs_put(engine->breadcrumbs);
> 
> So it's still only me that is bothered by the laziness of the proposed init
> path?
> 

IMO this really isn't all that bad, nor is it the first place I've seen
something like this done. In a quick two-minute search of the i915 code I
found that we do basically the exact same thing in
intel_engine_create_pinned_context for the ce->vm, and I doubt that is the
only place in the driver doing something similar.

> intel_engines_init
>   for_each_engine
>     engine_setup_common
>       engine->breadcrumbs = intel_breadcrumbs_create(engine);
>     intel_guc_submission_setup
>       intel_breadcrumbs_put(engine->breadcrumbs);
>       ...
> 
> In other words it leaves the common code allocating N objects and then
> frees all but one of them, overriding vfuncs in the remaining instance.
> 
> Why not pull the setup out of common, when it is obviously not common
> any more?
> 

We can write down your concerns and revisit after this series merges. That
will give a clear side-by-side of what we have and what you propose. It
should be easy at that point to make a decision and move on.
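
For reference, I think the alternative you're describing would look roughly
like the below (sketch only, untested, function name made up):

	/*
	 * Hypothetical: intel_guc_submission_setup() allocates or shares the
	 * breadcrumbs itself, rather than engine_setup_common() creating one
	 * per engine and GuC setup freeing all but the first per class.
	 */
	static int guc_setup_breadcrumbs(struct intel_engine_cs *engine)
	{
		int i;

		for (i = 0; i < MAX_ENGINE_INSTANCE; ++i) {
			struct intel_engine_cs *sibling =
				engine->gt->engine_class[engine->class][i];

			if (!sibling)
				continue;

			if (sibling->breadcrumbs) {
				/* Share the first sibling's breadcrumbs. */
				engine->breadcrumbs =
					intel_breadcrumbs_get(sibling->breadcrumbs);
			} else {
				/* First engine in the class allocates. */
				engine->breadcrumbs =
					intel_breadcrumbs_create(engine);
				if (!engine->breadcrumbs)
					return -ENOMEM;
			}
			break;
		}

		engine->breadcrumbs->engine_mask |= engine->mask;
		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;

		return 0;
	}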

Matt

> Regards,
> 
> Tvrtko
> 
> > +				engine->breadcrumbs =
> > +					intel_breadcrumbs_get(sibling->breadcrumbs);
> > +			}
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (engine->breadcrumbs) {
> > +		engine->breadcrumbs->engine_mask |= engine->mask;
> > +		engine->breadcrumbs->irq_enable = guc_irq_enable_breadcrumbs;
> > +		engine->breadcrumbs->irq_disable = guc_irq_disable_breadcrumbs;
> > +	}
> > +}
> > +
> >   static void sanitize_hwsp(struct intel_engine_cs *engine)
> >   {
> >   	struct intel_timeline *tl;
> > @@ -1589,6 +1648,7 @@ int intel_guc_submission_setup(struct intel_engine_cs *engine)
> >   	guc_default_vfuncs(engine);
> >   	guc_default_irqs(engine);
> > +	guc_init_breadcrumbs(engine);
> >   	if (engine->class == RENDER_CLASS)
> >   		rcs_submission_override(engine);
> > @@ -1831,11 +1891,6 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> >   	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> >   	ve->base.saturated = ALL_ENGINES;
> > -	ve->base.breadcrumbs = intel_breadcrumbs_create(&ve->base);
> > -	if (!ve->base.breadcrumbs) {
> > -		kfree(ve);
> > -		return ERR_PTR(-ENOMEM);
> > -	}
> >   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> > @@ -1884,6 +1939,8 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count)
> >   				sibling->emit_fini_breadcrumb;
> >   			ve->base.emit_fini_breadcrumb_dw =
> >   				sibling->emit_fini_breadcrumb_dw;
> > +			ve->base.breadcrumbs =
> > +				intel_breadcrumbs_get(sibling->breadcrumbs);
> >   			ve->base.flags |= sibling->flags;
> > 


end of thread, other threads:[~2021-07-26 22:25 UTC | newest]

Thread overview: 221+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-16 20:16 [PATCH 00/51] GuC submission support Matthew Brost
2021-07-16 20:16 ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 01/51] drm/i915/guc: Add new GuC interface defines and structures Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 02/51] drm/i915/guc: Remove GuC stage descriptor, add LRC descriptor Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 03/51] drm/i915/guc: Add LRC descriptor context lookup array Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 04/51] drm/i915/guc: Implement GuC submission tasklet Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-19 23:01   ` John Harrison
2021-07-19 23:01     ` [Intel-gfx] " John Harrison
2021-07-19 22:55     ` Matthew Brost
2021-07-19 22:55       ` [Intel-gfx] " Matthew Brost
2021-07-20  0:26       ` John Harrison
2021-07-20  0:26         ` [Intel-gfx] " John Harrison
2021-07-16 20:16 ` [PATCH 05/51] drm/i915/guc: Add bypass tasklet submission path to GuC Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 06/51] drm/i915/guc: Implement GuC context operations for new inteface Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  0:23   ` John Harrison
2021-07-20  0:23     ` [Intel-gfx] " John Harrison
2021-07-20  2:45     ` Matthew Brost
2021-07-20  2:45       ` [Intel-gfx] " Matthew Brost
2021-07-20  0:51   ` Daniele Ceraolo Spurio
2021-07-20  0:51     ` [Intel-gfx] " Daniele Ceraolo Spurio
2021-07-20  4:04     ` Matthew Brost
2021-07-20  4:04       ` [Intel-gfx] " Matthew Brost
2021-07-21 23:51       ` Daniele Ceraolo Spurio
2021-07-21 23:51         ` [Intel-gfx] " Daniele Ceraolo Spurio
2021-07-22  7:57         ` Michal Wajdeczko
2021-07-22  7:57           ` Michal Wajdeczko
2021-07-22 15:48           ` Matthew Brost
2021-07-22 15:48             ` Matthew Brost
2021-07-16 20:16 ` [PATCH 07/51] drm/i915/guc: Insert fence on context when deregistering Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 08/51] drm/i915/guc: Defer context unpin until scheduling is disabled Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 09/51] drm/i915/guc: Disable engine barriers with GuC during unpin Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 10/51] drm/i915/guc: Extend deregistration fence to schedule disable Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 11/51] drm/i915: Disable preempt busywait when using GuC scheduling Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 12/51] drm/i915/guc: Ensure request ordering via completion fences Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-19 23:46   ` Daniele Ceraolo Spurio
2021-07-19 23:46     ` [Intel-gfx] " Daniele Ceraolo Spurio
2021-07-20  2:48     ` Matthew Brost
2021-07-20  2:48       ` [Intel-gfx] " Matthew Brost
2021-07-20  2:50       ` Matthew Brost
2021-07-20  2:50         ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 13/51] drm/i915/guc: Disable semaphores when using GuC scheduling Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  0:33   ` John Harrison
2021-07-20  0:33     ` [Intel-gfx] " John Harrison
2021-07-16 20:16 ` [PATCH 14/51] drm/i915/guc: Ensure G2H response has space in buffer Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 15/51] drm/i915/guc: Update intel_gt_wait_for_idle to work with GuC Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  1:03   ` John Harrison
2021-07-20  1:03     ` [Intel-gfx] " John Harrison
2021-07-20  1:53     ` Matthew Brost
2021-07-20  1:53       ` [Intel-gfx] " Matthew Brost
2021-07-20 19:49       ` John Harrison
2021-07-20 19:49         ` [Intel-gfx] " John Harrison
2021-07-16 20:16 ` [PATCH 16/51] drm/i915/guc: Update GuC debugfs to support new GuC Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  1:13   ` John Harrison
2021-07-20  1:13     ` [Intel-gfx] " John Harrison
2021-07-16 20:16 ` [PATCH 17/51] drm/i915/guc: Add several request trace points Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  1:27   ` John Harrison
2021-07-20  1:27     ` [Intel-gfx] " John Harrison
2021-07-20  2:10     ` Matthew Brost
2021-07-20  2:10       ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 18/51] drm/i915: Add intel_context tracing Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 19/51] drm/i915/guc: GuC virtual engines Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-19 23:33   ` Daniele Ceraolo Spurio
2021-07-19 23:33     ` [Intel-gfx] " Daniele Ceraolo Spurio
2021-07-19 23:27     ` Matthew Brost
2021-07-19 23:27       ` [Intel-gfx] " Matthew Brost
2021-07-19 23:42       ` Daniele Ceraolo Spurio
2021-07-19 23:42         ` [Intel-gfx] " Daniele Ceraolo Spurio
2021-07-19 23:32         ` Matthew Brost
2021-07-19 23:32           ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 20/51] drm/i915: Track 'serial' counts for " Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20  1:28   ` John Harrison
2021-07-20  1:28     ` [Intel-gfx] " John Harrison
2021-07-20  1:54     ` Matthew Brost
2021-07-20  1:54       ` [Intel-gfx] " Matthew Brost
2021-07-20 16:47       ` Matthew Brost
2021-07-20 16:47         ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 21/51] drm/i915: Hold reference to intel_context over life of i915_request Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 22/51] drm/i915/guc: Disable bonding extension with GuC submission Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 23/51] drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20 19:45   ` John Harrison
2021-07-20 19:45     ` [Intel-gfx] " John Harrison
2021-07-22 12:46   ` Tvrtko Ursulin
2021-07-22 12:46     ` Tvrtko Ursulin
2021-07-26 22:25     ` Matthew Brost
2021-07-26 22:25       ` Matthew Brost
2021-07-16 20:16 ` [PATCH 24/51] drm/i915: Add i915_sched_engine destroy vfunc Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20 19:55   ` John Harrison
2021-07-20 19:55     ` [Intel-gfx] " John Harrison
2021-07-20 19:53     ` Matthew Brost
2021-07-20 19:53       ` [Intel-gfx] " Matthew Brost
2021-07-16 20:16 ` [PATCH 25/51] drm/i915: Move active request tracking to a vfunc Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20 20:05   ` John Harrison
2021-07-20 20:05     ` [Intel-gfx] " John Harrison
2021-07-16 20:16 ` [PATCH 26/51] drm/i915/guc: Reset implementation for new GuC interface Matthew Brost
2021-07-16 20:16   ` [Intel-gfx] " Matthew Brost
2021-07-20 20:19   ` John Harrison
2021-07-20 20:19     ` [Intel-gfx] " John Harrison
2021-07-20 20:59     ` Matthew Brost
2021-07-20 20:59       ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 27/51] drm/i915: Reset GPU immediately if submission is disabled Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 28/51] drm/i915/guc: Add disable interrupts to guc sanitize Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 29/51] drm/i915/guc: Suspend/resume implementation for new interface Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 30/51] drm/i915/guc: Handle context reset notification Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-20 20:29   ` John Harrison
2021-07-20 20:29     ` [Intel-gfx] " John Harrison
2021-07-20 20:38     ` Matthew Brost
2021-07-20 20:38       ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 31/51] drm/i915/guc: Handle engine reset failure notification Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 32/51] drm/i915/guc: Enable the timer expired interrupt for GuC Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 33/51] drm/i915/guc: Provide mmio list to be saved/restored on engine reset Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-22  4:47   ` Matthew Brost
2021-07-22  4:47     ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 34/51] drm/i915/guc: Don't complain about reset races Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 35/51] drm/i915/guc: Enable GuC engine reset Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 36/51] drm/i915/guc: Capture error state on context reset Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 37/51] drm/i915/guc: Fix for error capture after full GPU reset with GuC Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 38/51] drm/i915/guc: Hook GuC scheduling policies up Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 39/51] drm/i915/guc: Connect reset modparam updates to GuC policy flags Matthew Brost
2021-07-16 20:17   ` [Intel-gfx] " Matthew Brost
2021-07-16 20:04   ` Matthew Brost
2021-07-16 20:04     ` [Intel-gfx] " Matthew Brost
2021-07-16 20:17 ` [PATCH 40/51] drm/i915/guc: Include scheduling policies in the debugfs state dump Matthew Brost
2021-07-16 20:17 ` [PATCH 41/51] drm/i915/guc: Add golden context to GuC ADS Matthew Brost
2021-07-19 17:24   ` Matthew Brost
2021-07-19 18:25     ` John Harrison
2021-07-19 18:30       ` Matthew Brost
2021-07-16 20:17 ` [PATCH 42/51] drm/i915/guc: Implement banned contexts for GuC submission Matthew Brost
2021-07-20 21:41   ` John Harrison
2021-07-16 20:17 ` [PATCH 43/51] drm/i915/guc: Support request cancellation Matthew Brost
2021-07-22 19:56   ` Daniele Ceraolo Spurio
2021-07-22 20:13     ` Matthew Brost
2021-07-16 20:17 ` [PATCH 44/51] drm/i915/selftest: Better error reporting from hangcheck selftest Matthew Brost
2021-07-16 20:13   ` Matthew Brost
2021-07-16 20:17 ` [PATCH 45/51] drm/i915/selftest: Fix workarounds selftest for GuC submission Matthew Brost
2021-07-20 17:14   ` Matthew Brost
2021-07-16 20:17 ` [PATCH 46/51] drm/i915/selftest: Fix MOCS " Matthew Brost
2021-07-16 23:57   ` Matthew Brost
2021-07-16 20:17 ` [PATCH 47/51] drm/i915/selftest: Increase some timeouts in live_requests Matthew Brost
2021-07-20 21:46   ` John Harrison
2021-07-22  8:13   ` Tvrtko Ursulin
2021-07-16 20:17 ` [PATCH 48/51] drm/i915/selftest: Fix hangcheck self test for GuC submission Matthew Brost
2021-07-16 23:43   ` Matthew Brost
2021-07-16 20:17 ` [PATCH 49/51] drm/i915/selftest: Bump selftest timeouts for hangcheck Matthew Brost
2021-07-16 22:23   ` Matthew Brost
2021-07-22  8:17   ` Tvrtko Ursulin
2021-07-16 20:17 ` [PATCH 50/51] drm/i915/guc: Implement GuC priority management Matthew Brost
2021-07-22 20:26   ` Daniele Ceraolo Spurio
2021-07-22 21:38     ` Matthew Brost
2021-07-22 21:50       ` Daniele Ceraolo Spurio
2021-07-22 21:55         ` Matthew Brost
2021-07-16 20:17 ` [PATCH 51/51] drm/i915/guc: Unblock GuC submission on Gen11+ Matthew Brost
2021-07-17  1:10 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for GuC submission support (rev3) Patchwork
2021-07-19  9:06 ` [Intel-gfx] [PATCH 00/51] GuC submission support Tvrtko Ursulin
