From: Daniel Vetter <daniel@ffwll.ch>
To: Matthew Brost <matthew.brost@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>,
	intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids
Date: Tue, 10 Aug 2021 11:12:54 +0200	[thread overview]
Message-ID: <YRJDFntWKZ9sjrPz@phenom.ffwll.local> (raw)
In-Reply-To: <20210809190312.GA123783@DUT151-ICLU.fm.intel.com>

On Mon, Aug 09, 2021 at 07:03:12PM +0000, Matthew Brost wrote:
> On Mon, Aug 09, 2021 at 05:31:38PM +0200, Daniel Vetter wrote:
> > On Tue, Aug 03, 2021 at 03:29:16PM -0700, Matthew Brost wrote:
> > > Assign contexts in parent-child relationship consecutive guc_ids. This
> > > is accomplished by partitioning guc_id space between ones that need to
> > > be consecutive (1/16 available guc_ids) and ones that do not (15/16 of
> > > available guc_ids). The consecutive search is implemented via the bitmap
> > > API.
> > > 
> > > This is a precursor to the full GuC multi-lrc implementation but aligns
> > > to how the GuC multi-lrc interface is defined - guc_ids must be consecutive
> > > when using the GuC multi-lrc interface.
> > > 
> > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/gt/intel_context.h       |   6 +
> > >  drivers/gpu/drm/i915/gt/intel_reset.c         |   3 +-
> > >  drivers/gpu/drm/i915/gt/uc/intel_guc.h        |   7 +-
> > >  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 222 ++++++++++++------
> > >  .../i915/gt/uc/intel_guc_submission_types.h   |  10 +
> > >  5 files changed, 179 insertions(+), 69 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index c208691fc87d..7ce3b3d2edb7 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -54,6 +54,12 @@ static inline bool intel_context_is_parent(struct intel_context *ce)
> > >  	return !!ce->guc_number_children;
> > >  }
> > >  
> > > +static inline struct intel_context *
> > > +intel_context_to_parent(struct intel_context *ce)
> > > +{
> > > +	return intel_context_is_child(ce) ? ce->parent : ce;
> > > +}
> > > +
> > >  void intel_context_bind_parent_child(struct intel_context *parent,
> > >  				     struct intel_context *child);
> > >  
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > index ea763138197f..c3d4baa1b2b8 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > > @@ -849,6 +849,7 @@ static void reset_finish(struct intel_gt *gt, intel_engine_mask_t awake)
> > >  
> > >  static void nop_submit_request(struct i915_request *request)
> > >  {
> > > +	struct intel_context *ce = intel_context_to_parent(request->context);
> > >  	RQ_TRACE(request, "-EIO\n");
> > >  
> > >  	/*
> > > @@ -857,7 +858,7 @@ static void nop_submit_request(struct i915_request *request)
> > >  	 * this for now.
> > >  	 */
> > >  	if (intel_engine_uses_guc(request->engine))
> > > -		intel_guc_decr_num_rq_not_ready(request->context);
> > > +		intel_guc_decr_num_rq_not_ready(ce);
> > >  
> > >  	request = i915_request_mark_eio(request);
> > >  	if (request) {
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > index c0c60ccabfa4..30a0f364db8f 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h
> > > @@ -24,6 +24,7 @@ struct __guc_ads_blob;
> > >  
> > >  enum {
> > >  	GUC_SUBMIT_ENGINE_SINGLE_LRC,
> > > +	GUC_SUBMIT_ENGINE_MULTI_LRC,
> > >  	GUC_SUBMIT_ENGINE_MAX
> > >  };
> > >  
> > > @@ -59,8 +60,10 @@ struct intel_guc {
> > >  	struct ida guc_ids;
> > >  	u32 num_guc_ids;
> > >  	u32 max_guc_ids;
> > > -	struct list_head guc_id_list_no_ref;
> > > -	struct list_head guc_id_list_unpinned;
> > > +	unsigned long *guc_ids_bitmap;
> > > +#define MAX_GUC_ID_ORDER	(order_base_2(MAX_ENGINE_INSTANCE + 1))
> > > +	struct list_head guc_id_list_no_ref[MAX_GUC_ID_ORDER + 1];
> > > +	struct list_head guc_id_list_unpinned[MAX_GUC_ID_ORDER + 1];
> > 
> > Random new global lists definitely need kerneldoc about what is on them,
> > how they're linked, what their lifetime rules are and what locks we're
> > holding.
> > 
> > Leaving this all to reviews to figure out, and worse, future readers of
> > your code, is not kind.
> >
> 
> Got it.

I forgot the usual disclaimer: I know that the current code isn't
following this at all. But we have to start somewhere :-/

> > >  	spinlock_t destroy_lock;	/* protects list / worker */
> > >  	struct list_head destroyed_contexts;
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > index f23dd716723f..afb9b4bb8971 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
> > > @@ -169,6 +169,15 @@ static void clr_guc_ids_exhausted(struct guc_submit_engine *gse)
> > >  	clear_bit(GSE_STATE_GUC_IDS_EXHAUSTED, &gse->flags);
> > >  }
> > >  
> > > +/*
> > > + * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous
> > 
> > I think it'd be good to put down the reason here for why. Is this a
> > requirement of the guc interface, or just an artifact of our current
> > implementation? In the latter case also explain what exactly the
> > constraint is (but honestly I can't think of many reasons for that)
> 
> Multi-lrc guc_ids need to be sequential between the parent and children
> - this is a requirement of the GuC submission interface. Can explicitly
> state that here.

Ah yes, that makes sense to document. It also gives us a very clear hint
at what the first step to fix any multi-lrc guc_id exhaustion issue would
be: scan the entire guc_id space for consecutively free spots. Not sure
xarray has support for that, but I know the guy who wrote it, so the
answer is at most a mail away :-)
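[For readers following along: the consecutive search the patch relies on can be illustrated with a small userspace sketch of the kernel's bitmap_find_free_region() semantics. This is an illustration only, not the kernel implementation; the helper names (find_free_region, is_set, set_bit_) are made up for the sketch.]

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

static int is_set(const unsigned long *bm, int bit)
{
	return !!(bm[bit / BITS_PER_LONG] & (1UL << (bit % BITS_PER_LONG)));
}

static void set_bit_(unsigned long *bm, int bit)
{
	bm[bit / BITS_PER_LONG] |= 1UL << (bit % BITS_PER_LONG);
}

/* Userspace sketch of bitmap_find_free_region() semantics: find a
 * naturally aligned run of (1 << order) clear bits, mark the run as
 * allocated, and return its start index, or -1 on exhaustion. This is
 * how the patch hands out consecutive guc_ids for a parent and its
 * children, rounded up to a power of two. */
static int find_free_region(unsigned long *bm, int nbits, int order)
{
	int len = 1 << order, pos, i;

	for (pos = 0; pos + len <= nbits; pos += len) { /* natural alignment */
		for (i = pos; i < pos + len; i++)
			if (is_set(bm, i))
				break;
		if (i == pos + len) {		/* whole run is free: claim it */
			for (i = pos; i < pos + len; i++)
				set_bit_(bm, i);
			return pos;
		}
	}
	return -1;	/* no consecutive block of this order is available */
}
```

[The natural alignment is also why steal_guc_id() below can move between free lists only in power-of-two steps.]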

This also means I'm a lot less worried about potentially walling ourselves
into a corner with multi-lrc guc_id exhaustion. Might be good to note that
in the commit message too.
-Daniel
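[Editor's note: the 1/16 partition being discussed works out as below, a hypothetical userspace restatement of the patch's NUMBER_MULTI_LRC_GUC_ID() macro; the function names are made up for the sketch.]

```c
/* Restatement of NUMBER_MULTI_LRC_GUC_ID(): multi-lrc contexts get 1/16
 * of the guc_id space, with a floor of 32 ids, allocated from a bitmap;
 * single-lrc submission gets the remainder via the ida allocator. */
static unsigned int number_multi_lrc_guc_ids(unsigned int num_guc_ids)
{
	unsigned int share = num_guc_ids / 16;

	return share > 32 ? share : 32;
}

static unsigned int number_single_lrc_guc_ids(unsigned int num_guc_ids)
{
	return num_guc_ids - number_multi_lrc_guc_ids(num_guc_ids);
}
```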

> 
> Matt
> 
> > -Daniel
> > 
> > > + * and a different allocation algorithm is used (bitmap vs. ida). We believe the
> > > + * number of multi-lrc contexts in use should be low and 1/16 should be
> > > + * sufficient. Minimum of 32 ids for multi-lrc.
> > > + */
> > > +#define NUMBER_MULTI_LRC_GUC_ID(guc) \
> > > +	((guc)->num_guc_ids / 16 > 32 ? (guc)->num_guc_ids / 16 : 32)
> > > +
> > >  /*
> > >   * Below is a set of functions which control the GuC scheduling state which do
> > >   * not require a lock as all state transitions are mutually exclusive. i.e. It
> > > @@ -405,16 +414,10 @@ static inline void decr_context_blocked(struct intel_context *ce)
> > >  	ce->guc_state.sched_state -= SCHED_STATE_BLOCKED;
> > >  }
> > >  
> > > -static inline struct intel_context *
> > > -to_parent(struct intel_context *ce)
> > > -{
> > > -	return intel_context_is_child(ce) ? ce->parent : ce;
> > > -}
> > > -
> > >  static inline struct intel_context *
> > >  request_to_scheduling_context(struct i915_request *rq)
> > >  {
> > > -	return to_parent(rq->context);
> > > +	return intel_context_to_parent(rq->context);
> > >  }
> > >  
> > >  static inline bool context_guc_id_invalid(struct intel_context *ce)
> > > @@ -1436,7 +1439,7 @@ static void destroy_worker_func(struct work_struct *w);
> > >   */
> > >  int intel_guc_submission_init(struct intel_guc *guc)
> > >  {
> > > -	int ret;
> > > +	int ret, i;
> > >  
> > >  	if (guc_submission_initialized(guc))
> > >  		return 0;
> > > @@ -1448,9 +1451,13 @@ int intel_guc_submission_init(struct intel_guc *guc)
> > >  	xa_init_flags(&guc->context_lookup, XA_FLAGS_LOCK_IRQ);
> > >  
> > >  	spin_lock_init(&guc->contexts_lock);
> > > -	INIT_LIST_HEAD(&guc->guc_id_list_no_ref);
> > > -	INIT_LIST_HEAD(&guc->guc_id_list_unpinned);
> > > +	for (i = 0; i < MAX_GUC_ID_ORDER + 1; ++i) {
> > > +		INIT_LIST_HEAD(&guc->guc_id_list_no_ref[i]);
> > > +		INIT_LIST_HEAD(&guc->guc_id_list_unpinned[i]);
> > > +	}
> > >  	ida_init(&guc->guc_ids);
> > > +	guc->guc_ids_bitmap =
> > > +		bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID(guc), GFP_KERNEL);
> > >  
> > >  	spin_lock_init(&guc->destroy_lock);
> > >  
> > > @@ -1476,6 +1483,8 @@ void intel_guc_submission_fini(struct intel_guc *guc)
> > >  
> > >  		i915_sched_engine_put(sched_engine);
> > >  	}
> > > +
> > > +	bitmap_free(guc->guc_ids_bitmap);
> > >  }
> > >  
> > >  static inline void queue_request(struct i915_sched_engine *sched_engine,
> > > @@ -1499,11 +1508,13 @@ static inline void queue_request(struct i915_sched_engine *sched_engine,
> > >  static bool too_many_guc_ids_not_ready(struct guc_submit_engine *gse,
> > >  				       struct intel_context *ce)
> > >  {
> > > -	u32 available_guc_ids, guc_ids_consumed;
> > >  	struct intel_guc *guc = gse->sched_engine.private_data;
> > > +	u32 available_guc_ids = intel_context_is_parent(ce) ?
> > > +		NUMBER_MULTI_LRC_GUC_ID(guc) :
> > > +		guc->num_guc_ids - NUMBER_MULTI_LRC_GUC_ID(guc);
> > > +	u32 guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
> > >  
> > > -	available_guc_ids = guc->num_guc_ids;
> > > -	guc_ids_consumed = atomic_read(&gse->num_guc_ids_not_ready);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > >  	if (TOO_MANY_GUC_IDS_NOT_READY(available_guc_ids, guc_ids_consumed)) {
> > >  		set_and_update_guc_ids_exhausted(gse);
> > > @@ -1517,17 +1528,26 @@ static void incr_num_rq_not_ready(struct intel_context *ce)
> > >  {
> > >  	struct guc_submit_engine *gse = ce_to_gse(ce);
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > +	GEM_BUG_ON(!intel_context_is_parent(ce) &&
> > > +		   ce->guc_number_children);
> > > +
> > >  	if (!atomic_fetch_add(1, &ce->guc_num_rq_not_ready))
> > > -		atomic_inc(&gse->num_guc_ids_not_ready);
> > > +		atomic_add(ce->guc_number_children + 1,
> > > +			   &gse->num_guc_ids_not_ready);
> > >  }
> > >  
> > >  void intel_guc_decr_num_rq_not_ready(struct intel_context *ce)
> > >  {
> > >  	struct guc_submit_engine *gse = ce_to_gse(ce);
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > +
> > >  	if (atomic_fetch_add(-1, &ce->guc_num_rq_not_ready) == 1) {
> > >  		GEM_BUG_ON(!atomic_read(&gse->num_guc_ids_not_ready));
> > > -		atomic_dec(&gse->num_guc_ids_not_ready);
> > > +
> > > +		atomic_sub(ce->guc_number_children + 1,
> > > +			   &gse->num_guc_ids_not_ready);
> > >  	}
> > >  }
> > >  
> > > @@ -1579,20 +1599,42 @@ static void guc_submit_request(struct i915_request *rq)
> > >  
> > >  	spin_unlock_irqrestore(&sched_engine->lock, flags);
> > >  
> > > -	intel_guc_decr_num_rq_not_ready(rq->context);
> > > +	intel_guc_decr_num_rq_not_ready(request_to_scheduling_context(rq));
> > >  }
> > >  
> > > -static int new_guc_id(struct intel_guc *guc)
> > > +static int new_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >  {
> > > -	return ida_simple_get(&guc->guc_ids, 0,
> > > -			      guc->num_guc_ids, GFP_KERNEL |
> > > -			      __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > > +	int ret;
> > > +
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > +
> > > +	if (intel_context_is_parent(ce))
> > > +		ret = bitmap_find_free_region(guc->guc_ids_bitmap,
> > > +					      NUMBER_MULTI_LRC_GUC_ID(guc),
> > > +					      order_base_2(ce->guc_number_children
> > > +							   + 1));
> > > +	else
> > > +		ret = ida_simple_get(&guc->guc_ids,
> > > +				     NUMBER_MULTI_LRC_GUC_ID(guc),
> > > +				     guc->num_guc_ids, GFP_KERNEL |
> > > +				     __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
> > > +	if (unlikely(ret < 0))
> > > +		return ret;
> > > +
> > > +	ce->guc_id = ret;
> > > +	return 0;
> > >  }
> > >  
> > >  static void __release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >  {
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  	if (!context_guc_id_invalid(ce)) {
> > > -		ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > > +		if (intel_context_is_parent(ce))
> > > +			bitmap_release_region(guc->guc_ids_bitmap, ce->guc_id,
> > > +					      order_base_2(ce->guc_number_children
> > > +							   + 1));
> > > +		else
> > > +			ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > >  		clr_lrc_desc_registered(guc, ce->guc_id);
> > >  		set_context_guc_id_invalid(ce);
> > >  	}
> > > @@ -1604,6 +1646,8 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >  {
> > >  	unsigned long flags;
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > +
> > >  	spin_lock_irqsave(&guc->contexts_lock, flags);
> > >  	__release_guc_id(guc, ce);
> > >  	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > > @@ -1618,54 +1662,93 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >   * schedule disable H2G + a deregister H2G.
> > >   */
> > >  static struct list_head *get_guc_id_list(struct intel_guc *guc,
> > > +					 u8 number_children,
> > >  					 bool unpinned)
> > >  {
> > > +	GEM_BUG_ON(order_base_2(number_children + 1) > MAX_GUC_ID_ORDER);
> > > +
> > >  	if (unpinned)
> > > -		return &guc->guc_id_list_unpinned;
> > > +		return &guc->guc_id_list_unpinned[order_base_2(number_children + 1)];
> > >  	else
> > > -		return &guc->guc_id_list_no_ref;
> > > +		return &guc->guc_id_list_no_ref[order_base_2(number_children + 1)];
> > >  }
> > >  
> > > -static int steal_guc_id(struct intel_guc *guc, bool unpinned)
> > > +static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > > +			bool unpinned)
> > >  {
> > > -	struct intel_context *ce;
> > > -	int guc_id;
> > > -	struct list_head *guc_id_list = get_guc_id_list(guc, unpinned);
> > > +	struct intel_context *cn;
> > > +	u8 number_children = ce->guc_number_children;
> > >  
> > >  	lockdep_assert_held(&guc->contexts_lock);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > > -	if (!list_empty(guc_id_list)) {
> > > -		ce = list_first_entry(guc_id_list,
> > > -				      struct intel_context,
> > > -				      guc_id_link);
> > > +	do {
> > > +		struct list_head *guc_id_list =
> > > +			get_guc_id_list(guc, number_children, unpinned);
> > >  
> > > -		/* Ensure context getting stolen in expected state */
> > > -		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > > -		GEM_BUG_ON(context_guc_id_invalid(ce));
> > > -		GEM_BUG_ON(context_guc_id_stolen(ce));
> > > +		if (!list_empty(guc_id_list)) {
> > > +			u8 cn_o2, ce_o2 =
> > > +				order_base_2(ce->guc_number_children + 1);
> > >  
> > > -		list_del_init(&ce->guc_id_link);
> > > -		guc_id = ce->guc_id;
> > > -		clr_context_registered(ce);
> > > +			cn = list_first_entry(guc_id_list,
> > > +					      struct intel_context,
> > > +					      guc_id_link);
> > > +			cn_o2 = order_base_2(cn->guc_number_children + 1);
> > > +
> > > +			/*
> > > +			 * Corner case where a multi-lrc context steals a guc_id
> > > +			 * from another context that has more guc_ids than itself.
> > > +			 */
> > > +			if (cn_o2 != ce_o2) {
> > > +				bitmap_release_region(guc->guc_ids_bitmap,
> > > +						      cn->guc_id,
> > > +						      cn_o2);
> > > +				bitmap_allocate_region(guc->guc_ids_bitmap,
> > > +						       ce->guc_id,
> > > +						       ce_o2);
> > > +			}
> > > +
> > > +			/* Ensure context getting stolen in expected state */
> > > +			GEM_BUG_ON(atomic_read(&cn->guc_id_ref));
> > > +			GEM_BUG_ON(context_guc_id_invalid(cn));
> > > +			GEM_BUG_ON(context_guc_id_stolen(cn));
> > > +			GEM_BUG_ON(ce_to_gse(ce) != ce_to_gse(cn));
> > > +
> > > +			list_del_init(&cn->guc_id_link);
> > > +			ce->guc_id = cn->guc_id;
> > > +
> > > +			/*
> > > +			 * If stealing from the pinned list, defer invalidating
> > > +			 * the guc_id until the retire workqueue processes this
> > > +			 * context.
> > > +			 */
> > > +			clr_context_registered(cn);
> > > +			if (!unpinned) {
> > > +				GEM_BUG_ON(ce_to_gse(cn)->stalled_context);
> > > +				ce_to_gse(cn)->stalled_context =
> > > +					intel_context_get(cn);
> > > +				set_context_guc_id_stolen(cn);
> > > +			} else {
> > > +				set_context_guc_id_invalid(cn);
> > > +			}
> > > +
> > > +			return 0;
> > > +		}
> > >  
> > >  		/*
> > > -		 * If stealing from the pinned list, defer invalidating
> > > -		 * the guc_id until the retire workqueue processes this
> > > -		 * context.
> > > +		 * When using multi-lrc we search the guc_id_lists requiring
> > > +		 * the fewest guc_ids first but will consume a larger block of
> > > +		 * guc_ids if necessary. Doubling the number of children always
> > > +		 * moves you to the next list.
> > >  		 */
> > > -		if (!unpinned) {
> > > -			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
> > > +		if (!number_children ||
> > > +		    order_base_2(number_children + 1) == MAX_GUC_ID_ORDER)
> > > +			break;
> > >  
> > > -			ce_to_gse(ce)->stalled_context = intel_context_get(ce);
> > > -			set_context_guc_id_stolen(ce);
> > > -		} else {
> > > -			set_context_guc_id_invalid(ce);
> > > -		}
> > > +		number_children *= 2;
> > > +	} while (true);
> > >  
> > > -		return guc_id;
> > > -	} else {
> > > -		return -EAGAIN;
> > > -	}
> > > +	return -EAGAIN;
> > >  }
> > >  
> > >  enum {	/* Return values for pin_guc_id / assign_guc_id */
> > > @@ -1674,17 +1757,18 @@ enum {	/* Return values for pin_guc_id / assign_guc_id */
> > >  	NEW_GUC_ID_ENABLED	= 2,
> > >  };
> > >  
> > > -static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
> > > +static int assign_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > > +			 bool tasklet)
> > >  {
> > >  	int ret;
> > >  
> > >  	lockdep_assert_held(&guc->contexts_lock);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > > -	ret = new_guc_id(guc);
> > > +	ret = new_guc_id(guc, ce);
> > >  	if (unlikely(ret < 0)) {
> > > -		ret = steal_guc_id(guc, true);
> > > -		if (ret >= 0) {
> > > -			*out = ret;
> > > +		ret = steal_guc_id(guc, ce, true);
> > > +		if (!ret) {
> > >  			ret = NEW_GUC_ID_DISABLED;
> > >  		} else if (ret < 0 && tasklet) {
> > >  			/*
> > > @@ -1692,15 +1776,18 @@ static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
> > >  			 * enabled if guc_ids are exhausted and we are submitting
> > >  			 * from the tasklet.
> > >  			 */
> > > -			ret = steal_guc_id(guc, false);
> > > -			if (ret >= 0) {
> > > -				*out = ret;
> > > +			ret = steal_guc_id(guc, ce, false);
> > > +			if (!ret)
> > >  				ret = NEW_GUC_ID_ENABLED;
> > > -			}
> > >  		}
> > > -	} else {
> > > -		*out = ret;
> > > -		ret = SAME_GUC_ID;
> > > +	}
> > > +
> > > +	if (!(ret < 0) && intel_context_is_parent(ce)) {
> > > +		struct intel_context *child;
> > > +		int i = 1;
> > > +
> > > +		for_each_child(ce, child)
> > > +			child->guc_id = ce->guc_id + i++;
> > >  	}
> > >  
> > >  	return ret;
> > > @@ -1713,6 +1800,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > >  	int ret = 0;
> > >  	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > >  
> > >  try_again:
> > > @@ -1724,7 +1812,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > >  	}
> > >  
> > >  	if (context_guc_id_invalid(ce)) {
> > > -		ret = assign_guc_id(guc, &ce->guc_id, tasklet);
> > > +		ret = assign_guc_id(guc, ce, tasklet);
> > >  		if (unlikely(ret < 0))
> > >  			goto out_unlock;
> > >  	}
> > > @@ -1770,6 +1858,7 @@ static void unpin_guc_id(struct intel_guc *guc,
> > >  	unsigned long flags;
> > >  
> > >  	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > >  	if (unlikely(context_guc_id_invalid(ce)))
> > >  		return;
> > > @@ -1781,7 +1870,8 @@ static void unpin_guc_id(struct intel_guc *guc,
> > >  
> > >  	if (!context_guc_id_invalid(ce) && !context_guc_id_stolen(ce) &&
> > >  	    !atomic_read(&ce->guc_id_ref)) {
> > > -		struct list_head *head = get_guc_id_list(guc, unpinned);
> > > +		struct list_head *head =
> > > +			get_guc_id_list(guc, ce->guc_number_children, unpinned);
> > >  
> > >  		list_add_tail(&ce->guc_id_link, head);
> > >  	}
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > index 7069b7248f55..a5933e07bdd2 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > @@ -22,6 +22,16 @@ struct guc_virtual_engine {
> > >  /*
> > >   * Object which encapsulates the globally operated on i915_sched_engine +
> > >   * the GuC submission state machine described in intel_guc_submission.c.
> > > + *
> > > + * Currently we have two instances of these per GuC. One for single-lrc and one
> > > + * for multi-lrc submission. We split these into two submission engines as they
> > > + * can operate in parallel allowing a blocking condition on one not to affect
> > > + * the other. i.e. guc_ids are statically allocated between these two submission
> > > + * modes. One mode may have guc_ids exhausted which requires blocking while the
> > > + * other has plenty of guc_ids and can make forward progress.
> > > + *
> > > + * In the future if different submission use cases arise we can simply
> > > + * instantiate another of these objects and assign it to the context.
> > >   */
> > >  struct guc_submit_engine {
> > >  	struct i915_sched_engine sched_engine;
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

> > > +			bitmap_release_region(guc->guc_ids_bitmap, ce->guc_id,
> > > +					      order_base_2(ce->guc_number_children
> > > +							   + 1));
> > > +		else
> > > +			ida_simple_remove(&guc->guc_ids, ce->guc_id);
> > >  		clr_lrc_desc_registered(guc, ce->guc_id);
> > >  		set_context_guc_id_invalid(ce);
> > >  	}
> > > @@ -1604,6 +1646,8 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >  {
> > >  	unsigned long flags;
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > > +
> > >  	spin_lock_irqsave(&guc->contexts_lock, flags);
> > >  	__release_guc_id(guc, ce);
> > >  	spin_unlock_irqrestore(&guc->contexts_lock, flags);
> > > @@ -1618,54 +1662,93 @@ static void release_guc_id(struct intel_guc *guc, struct intel_context *ce)
> > >   * schedule disable H2G + a deregister H2G.
> > >   */
> > >  static struct list_head *get_guc_id_list(struct intel_guc *guc,
> > > +					 u8 number_children,
> > >  					 bool unpinned)
> > >  {
> > > +	GEM_BUG_ON(order_base_2(number_children + 1) > MAX_GUC_ID_ORDER);
> > > +
> > >  	if (unpinned)
> > > -		return &guc->guc_id_list_unpinned;
> > > +		return &guc->guc_id_list_unpinned[order_base_2(number_children + 1)];
> > >  	else
> > > -		return &guc->guc_id_list_no_ref;
> > > +		return &guc->guc_id_list_no_ref[order_base_2(number_children + 1)];
> > >  }
> > >  
> > > -static int steal_guc_id(struct intel_guc *guc, bool unpinned)
> > > +static int steal_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > > +			bool unpinned)
> > >  {
> > > -	struct intel_context *ce;
> > > -	int guc_id;
> > > -	struct list_head *guc_id_list = get_guc_id_list(guc, unpinned);
> > > +	struct intel_context *cn;
> > > +	u8 number_children = ce->guc_number_children;
> > >  
> > >  	lockdep_assert_held(&guc->contexts_lock);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > > -	if (!list_empty(guc_id_list)) {
> > > -		ce = list_first_entry(guc_id_list,
> > > -				      struct intel_context,
> > > -				      guc_id_link);
> > > +	do {
> > > +		struct list_head *guc_id_list =
> > > +			get_guc_id_list(guc, number_children, unpinned);
> > >  
> > > -		/* Ensure context getting stolen in expected state */
> > > -		GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > > -		GEM_BUG_ON(context_guc_id_invalid(ce));
> > > -		GEM_BUG_ON(context_guc_id_stolen(ce));
> > > +		if (!list_empty(guc_id_list)) {
> > > +			u8 cn_o2, ce_o2 =
> > > +				order_base_2(ce->guc_number_children + 1);
> > >  
> > > -		list_del_init(&ce->guc_id_link);
> > > -		guc_id = ce->guc_id;
> > > -		clr_context_registered(ce);
> > > +			cn = list_first_entry(guc_id_list,
> > > +					      struct intel_context,
> > > +					      guc_id_link);
> > > +			cn_o2 = order_base_2(cn->guc_number_children + 1);
> > > +
> > > +			/*
> > > +			 * Corner case where a multi-lrc context steals a guc_id
> > > +			 * from another context that has more guc_ids than itself.
> > > +			 */
> > > +			if (cn_o2 != ce_o2) {
> > > +				bitmap_release_region(guc->guc_ids_bitmap,
> > > +						      cn->guc_id,
> > > +						      cn_o2);
> > > +				bitmap_allocate_region(guc->guc_ids_bitmap,
> > > +						       ce->guc_id,
> > > +						       ce_o2);
> > > +			}
> > > +
> > > +			/* Ensure context getting stolen in expected state */
> > > +			GEM_BUG_ON(atomic_read(&cn->guc_id_ref));
> > > +			GEM_BUG_ON(context_guc_id_invalid(cn));
> > > +			GEM_BUG_ON(context_guc_id_stolen(cn));
> > > +			GEM_BUG_ON(ce_to_gse(ce) != ce_to_gse(cn));
> > > +
> > > +			list_del_init(&cn->guc_id_link);
> > > +			ce->guc_id = cn->guc_id;
> > > +
> > > +			/*
> > > +			 * If stealing from the pinned list, defer invalidating
> > > +			 * the guc_id until the retire workqueue processes this
> > > +			 * context.
> > > +			 */
> > > +			clr_context_registered(cn);
> > > +			if (!unpinned) {
> > > +				GEM_BUG_ON(ce_to_gse(cn)->stalled_context);
> > > +				ce_to_gse(cn)->stalled_context =
> > > +					intel_context_get(cn);
> > > +				set_context_guc_id_stolen(cn);
> > > +			} else {
> > > +				set_context_guc_id_invalid(cn);
> > > +			}
> > > +
> > > +			return 0;
> > > +		}
> > >  
> > >  		/*
> > > -		 * If stealing from the pinned list, defer invalidating
> > > -		 * the guc_id until the retire workqueue processes this
> > > -		 * context.
> > > +		 * When using multi-lrc we search the guc_id_lists with the
> > > +		 * smallest number of guc_ids required first, but will consume
> > > +		 * a larger block of guc_ids if necessary. Doubling the number
> > > +		 * of children always moves to the next list.
> > >  		 */
> > > -		if (!unpinned) {
> > > -			GEM_BUG_ON(ce_to_gse(ce)->stalled_context);
> > > +		if (!number_children ||
> > > +		    order_base_2(number_children + 1) == MAX_GUC_ID_ORDER)
> > > +			break;
> > >  
> > > -			ce_to_gse(ce)->stalled_context = intel_context_get(ce);
> > > -			set_context_guc_id_stolen(ce);
> > > -		} else {
> > > -			set_context_guc_id_invalid(ce);
> > > -		}
> > > +		number_children *= 2;
> > > +	} while (true);
> > >  
> > > -		return guc_id;
> > > -	} else {
> > > -		return -EAGAIN;
> > > -	}
> > > +	return -EAGAIN;
> > >  }
> > >  
> > >  enum {	/* Return values for pin_guc_id / assign_guc_id */
> > > @@ -1674,17 +1757,18 @@ enum {	/* Return values for pin_guc_id / assign_guc_id */
> > >  	NEW_GUC_ID_ENABLED	= 2,
> > >  };
> > >  
> > > -static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
> > > +static int assign_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > > +			 bool tasklet)
> > >  {
> > >  	int ret;
> > >  
> > >  	lockdep_assert_held(&guc->contexts_lock);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > > -	ret = new_guc_id(guc);
> > > +	ret = new_guc_id(guc, ce);
> > >  	if (unlikely(ret < 0)) {
> > > -		ret = steal_guc_id(guc, true);
> > > -		if (ret >= 0) {
> > > -			*out = ret;
> > > +		ret = steal_guc_id(guc, ce, true);
> > > +		if (!ret) {
> > >  			ret = NEW_GUC_ID_DISABLED;
> > >  		} else if (ret < 0 && tasklet) {
> > >  			/*
> > > @@ -1692,15 +1776,18 @@ static int assign_guc_id(struct intel_guc *guc, u16 *out, bool tasklet)
> > >  			 * enabled if guc_ids are exhausted and we are submitting
> > >  			 * from the tasklet.
> > >  			 */
> > > -			ret = steal_guc_id(guc, false);
> > > -			if (ret >= 0) {
> > > -				*out = ret;
> > > +			ret = steal_guc_id(guc, ce, false);
> > > +			if (!ret)
> > >  				ret = NEW_GUC_ID_ENABLED;
> > > -			}
> > >  		}
> > > -	} else {
> > > -		*out = ret;
> > > -		ret = SAME_GUC_ID;
> > > +	}
> > > +
> > > +	if (!(ret < 0) && intel_context_is_parent(ce)) {
> > > +		struct intel_context *child;
> > > +		int i = 1;
> > > +
> > > +		for_each_child(ce, child)
> > > +			child->guc_id = ce->guc_id + i++;
> > >  	}
> > >  
> > >  	return ret;
> > > @@ -1713,6 +1800,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > >  	int ret = 0;
> > >  	unsigned long flags, tries = PIN_GUC_ID_TRIES;
> > >  
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  	GEM_BUG_ON(atomic_read(&ce->guc_id_ref));
> > >  
> > >  try_again:
> > > @@ -1724,7 +1812,7 @@ static int pin_guc_id(struct intel_guc *guc, struct intel_context *ce,
> > >  	}
> > >  
> > >  	if (context_guc_id_invalid(ce)) {
> > > -		ret = assign_guc_id(guc, &ce->guc_id, tasklet);
> > > +		ret = assign_guc_id(guc, ce, tasklet);
> > >  		if (unlikely(ret < 0))
> > >  			goto out_unlock;
> > >  	}
> > > @@ -1770,6 +1858,7 @@ static void unpin_guc_id(struct intel_guc *guc,
> > >  	unsigned long flags;
> > >  
> > >  	GEM_BUG_ON(atomic_read(&ce->guc_id_ref) < 0);
> > > +	GEM_BUG_ON(intel_context_is_child(ce));
> > >  
> > >  	if (unlikely(context_guc_id_invalid(ce)))
> > >  		return;
> > > @@ -1781,7 +1870,8 @@ static void unpin_guc_id(struct intel_guc *guc,
> > >  
> > >  	if (!context_guc_id_invalid(ce) && !context_guc_id_stolen(ce) &&
> > >  	    !atomic_read(&ce->guc_id_ref)) {
> > > -		struct list_head *head = get_guc_id_list(guc, unpinned);
> > > +		struct list_head *head =
> > > +			get_guc_id_list(guc, ce->guc_number_children, unpinned);
> > >  
> > >  		list_add_tail(&ce->guc_id_link, head);
> > >  	}
> > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > index 7069b7248f55..a5933e07bdd2 100644
> > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission_types.h
> > > @@ -22,6 +22,16 @@ struct guc_virtual_engine {
> > >  /*
> > >   * Object which encapsulates the globally operated on i915_sched_engine +
> > >   * the GuC submission state machine described in intel_guc_submission.c.
> > > + *
> > > + * Currently we have two instances of these per GuC. One for single-lrc and one
> > > + * for multi-lrc submission. We split these into two submission engines as they
> > > + * can operate in parallel allowing a blocking condition on one not to affect
> > > + * the other, i.e. guc_ids are statically allocated between these two submission
> > > + * modes. One mode may have its guc_ids exhausted, which requires blocking, while
> > > + * the other has plenty of guc_ids and can make forward progress.
> > > + *
> > > + * In the future if different submission use cases arise we can simply
> > > + * instantiate another of these objects and assign it to the context.
> > >   */
> > >  struct guc_submit_engine {
> > >  	struct i915_sched_engine sched_engine;
> > > -- 
> > > 2.28.0
> > > 
> > 
> > -- 
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Thread overview: 186+ messages
2021-08-03 22:28 [PATCH 00/46] Parallel submission aka multi-bb execbuf Matthew Brost
2021-08-03 22:28 ` [Intel-gfx] " Matthew Brost
2021-08-03 22:28 ` [PATCH 01/46] drm/i915/guc: Allow flexible number of context ids Matthew Brost
2021-08-03 22:28   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:28 ` [PATCH 02/46] drm/i915/guc: Connect the number of guc_ids to debugfs Matthew Brost
2021-08-03 22:28   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 03/46] drm/i915/guc: Don't return -EAGAIN to user when guc_ids exhausted Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-05  8:27   ` Daniel Vetter
2021-08-05  8:27     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 04/46] drm/i915/guc: Don't allow requests not ready to consume all guc_ids Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-05  8:29   ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 05/46] drm/i915/guc: Introduce guc_submit_engine object Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 06/46] drm/i915/guc: Check return of __xa_store when registering a context Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 07/46] drm/i915/guc: Non-static lrc descriptor registration buffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 08/46] drm/i915/guc: Take GT PM ref when deregistering context Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 09/46] drm/i915: Add GT PM unpark worker Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 10/46] drm/i915/guc: Take engine PM when a context is pinned with GuC submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:23   ` Daniel Vetter
2021-08-09 14:23     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:11     ` Matthew Brost
2021-08-09 18:11       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:43       ` Daniel Vetter
2021-08-10  6:43         ` [Intel-gfx] " Daniel Vetter
2021-08-10 21:29         ` Matthew Brost
2021-08-10 21:29           ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 11/46] drm/i915/guc: Don't call switch_to_kernel_context " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:27   ` Daniel Vetter
2021-08-09 18:20     ` Matthew Brost
2021-08-10  6:47       ` Daniel Vetter
2021-08-11 17:47         ` Matthew Brost
2021-08-03 22:29 ` [PATCH 12/46] drm/i915/guc: Selftest for GuC flow control Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 13/46] drm/i915: Add logical engine mapping Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:28   ` Daniel Vetter
2021-08-09 14:28     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:28     ` Matthew Brost
2021-08-09 18:28       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:49       ` Daniel Vetter
2021-08-10  6:49         ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 14/46] drm/i915: Expose logical engine instance to user Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:30   ` Daniel Vetter
2021-08-09 14:30     ` [Intel-gfx] " Daniel Vetter
2021-08-09 18:37     ` Matthew Brost
2021-08-09 18:37       ` [Intel-gfx] " Matthew Brost
2021-08-10  6:53       ` Daniel Vetter
2021-08-10  6:53         ` [Intel-gfx] " Daniel Vetter
2021-08-11 17:55         ` Matthew Brost
2021-08-11 17:55           ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 15/46] drm/i915/guc: Introduce context parent-child relationship Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 14:37   ` Daniel Vetter
2021-08-09 14:40     ` Daniel Vetter
2021-08-09 18:45       ` Matthew Brost
2021-08-09 18:44     ` Matthew Brost
2021-08-10  8:45       ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 16/46] drm/i915/guc: Implement GuC parent-child context pin / unpin functions Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:17   ` Daniel Vetter
2021-08-09 18:58     ` Matthew Brost
2021-08-10  8:53       ` Daniel Vetter
2021-08-10  9:07         ` Daniel Vetter
2021-08-11 18:06           ` Matthew Brost
2021-08-12 14:45             ` Daniel Vetter
2021-08-12 14:52               ` Daniel Vetter
2021-08-11 18:23         ` Matthew Brost
2021-08-03 22:29 ` [PATCH 17/46] drm/i915/guc: Add multi-lrc context registration Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 18/46] drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 19/46] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:31   ` Daniel Vetter
2021-08-09 15:31     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:03     ` Matthew Brost
2021-08-09 19:03       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:12       ` Daniel Vetter [this message]
2021-08-10  9:12         ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 20/46] drm/i915/guc: Add hang check to GuC submit engine Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:35   ` Daniel Vetter
2021-08-09 15:35     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:05     ` Matthew Brost
2021-08-09 19:05       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:18       ` Daniel Vetter
2021-08-10  9:18         ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 21/46] drm/i915/guc: Add guc_child_context_destroy Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 15:36   ` Daniel Vetter
2021-08-09 19:06     ` Matthew Brost
2021-08-03 22:29 ` [PATCH 22/46] drm/i915/guc: Implement multi-lrc submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 23/46] drm/i915/guc: Insert submit fences between requests in parent-child relationship Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:32   ` Daniel Vetter
2021-08-09 16:39     ` Matthew Brost
2021-08-09 17:03       ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 24/46] drm/i915/guc: Implement multi-lrc reset Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 25/46] drm/i915/guc: Update debugfs for GuC multi-lrc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:36   ` Daniel Vetter
2021-08-09 16:36     ` [Intel-gfx] " Daniel Vetter
2021-08-09 19:13     ` Matthew Brost
2021-08-09 19:13       ` [Intel-gfx] " Matthew Brost
2021-08-10  9:23       ` Daniel Vetter
2021-08-10  9:23         ` [Intel-gfx] " Daniel Vetter
2021-08-10  9:27         ` Daniel Vetter
2021-08-10  9:27           ` [Intel-gfx] " Daniel Vetter
2021-08-10 17:29           ` Matthew Brost
2021-08-10 17:29             ` [Intel-gfx] " Matthew Brost
2021-08-11 10:04             ` Daniel Vetter
2021-08-11 10:04               ` [Intel-gfx] " Daniel Vetter
2021-08-11 17:35               ` Matthew Brost
2021-08-11 17:35                 ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 26/46] drm/i915: Connect UAPI to GuC multi-lrc interface Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:37   ` Daniel Vetter
2021-08-09 16:37     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 27/46] drm/i915/doc: Update parallel submit doc to point to i915_drm.h Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 28/46] drm/i915/guc: Add basic GuC multi-lrc selftest Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 29/46] drm/i915/guc: Extend GuC flow control selftest for multi-lrc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 30/46] drm/i915/guc: Implement no mid batch preemption " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 31/46] drm/i915: Move secure execbuf check to execbuf2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 32/46] drm/i915: Move input/exec fence handling to i915_gem_execbuffer2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 33/46] drm/i915: Move output " Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 34/46] drm/i915: Return output fence from i915_gem_do_execbuffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 35/46] drm/i915: Store batch index in struct i915_execbuffer Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 36/46] drm/i915: Allow callers of i915_gem_do_execbuffer to override the batch index Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 37/46] drm/i915: Teach execbuf there can be more than one batch in the objects list Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 38/46] drm/i915: Only track object dependencies on first request Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 39/46] drm/i915: Force parallel contexts to use copy engine for reloc Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 16:39   ` Daniel Vetter
2021-08-09 16:39     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 40/46] drm/i915: Multi-batch execbuffer2 Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:02   ` Daniel Vetter
2021-08-09 17:02     ` [Intel-gfx] " Daniel Vetter
2021-08-03 22:29 ` [PATCH 41/46] drm/i915: Eliminate unnecessary VMA calls for multi-BB submission Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:07   ` Daniel Vetter
2021-08-09 17:12     ` Daniel Vetter
2021-08-03 22:29 ` [PATCH 42/46] drm/i915: Hold all parallel requests until last request, properly handle error Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 43/46] drm/i915/guc: Handle errors in multi-lrc requests Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 44/46] drm/i915: Enable multi-bb execbuf Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 45/46] drm/i915/execlists: Weak parallel submission support for execlists Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-03 22:29 ` [PATCH 46/46] drm/i915/guc: Add delay before disabling scheduling on contexts Matthew Brost
2021-08-03 22:29   ` [Intel-gfx] " Matthew Brost
2021-08-09 17:17   ` Daniel Vetter
2021-08-09 19:32     ` Matthew Brost
2021-08-11  9:55       ` Daniel Vetter
2021-08-11 17:43         ` Matthew Brost
2021-08-12 14:04           ` Daniel Vetter
2021-08-12 19:26   ` Daniel Vetter
2021-08-03 22:51 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Parallel submission aka multi-bb execbuf (rev2) Patchwork
2021-08-03 22:53 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-08-03 22:57 ` [Intel-gfx] ✗ Fi.CI.DOCS: " Patchwork
2021-08-03 23:19 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-08-05  3:53 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
