Re: [Intel-xe] [PATCH v2] drm/xe: Use fast virtual copy engine for migrate engine on PVC

From: "Chang, Yu bruce" <yu.bruce.chang@intel.com>
To: "Brost, Matthew" <matthew.brost@intel.com>
Cc: "De Marchi, Lucas" <lucas.demarchi@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [Intel-xe] [PATCH v2] drm/xe: Use fast virtual copy engine for migrate engine on PVC
Date: Fri, 24 Mar 2023 18:12:00 +0000	[thread overview]
Message-ID: <CY8PR11MB6940EB1DADE0C4E4583D62E8C3849@CY8PR11MB6940.namprd11.prod.outlook.com> (raw)
In-Reply-To: <ZB3K0PjzD1LWD7u0@DUT025-TGLU.fm.intel.com>

> -----Original Message-----
> From: Brost, Matthew <matthew.brost@intel.com>
> Sent: Friday, March 24, 2023 9:08 AM
> To: Chang, Yu bruce <yu.bruce.chang@intel.com>
> Cc: De Marchi, Lucas <lucas.demarchi@intel.com>; intel-
> xe@lists.freedesktop.org
> Subject: Re: [Intel-xe] [PATCH v2] drm/xe: Use fast virtual copy engine for
> migrate engine on PVC
> 
> On Fri, Mar 24, 2023 at 09:29:10AM -0600, Chang, Yu bruce wrote:
> >
> >
> > > -----Original Message-----
> > > From: Brost, Matthew <matthew.brost@intel.com>
> > > Sent: Thursday, March 23, 2023 11:59 PM
> > > To: De Marchi, Lucas <lucas.demarchi@intel.com>
> > > Cc: intel-xe@lists.freedesktop.org; Chang, Yu bruce
> > > <yu.bruce.chang@intel.com>
> > > Subject: Re: [Intel-xe] [PATCH v2] drm/xe: Use fast virtual copy
> > > engine for migrate engine on PVC
> > >
> > > On Thu, Mar 23, 2023 at 09:53:11PM -0700, Lucas De Marchi wrote:
> > > > On Thu, Mar 23, 2023 at 06:23:29PM -0700, Matthew Brost wrote:
> > > > > Some copy hardware engine instances are faster than others on
> > > > > PVC, use a virtual engine of these plus the reserved instance
> > > > > for the migrate engine on PVC. The idea being if a fast instance
> > > > > is available it will be used and the throughput of kernel
> > > > > copies, clears, and pagefault servicing will be higher.
> > > >
> > > > how faster and/or why?  If it was related to being link copy
> > > > engine vs main copy engine it was very understandable as the
> > > > commands available are different and optimized for certain usages.
> > > > However below you are setting to the odd link copy engines + the
> > > > main copy engine
> > > > + whatever was reserved for USM.
> > > >
> > > > Without a proper reason here or numbers or spec, it's hard to
> > > > judge where this is coming from and understand in future.
> > > >
> > >
> > > Your right, probably need to get a spec reference or something to justify
> this.
> > > I came up with this bit mask from IM conversation with Bruce, maybe
> > > he can point me to the spec. Also I looked at the i915 code for this
> > > and it is just BCS0
> > > | reserved BCS so definitely need to dig into what is the ideal mask.
> > >
> > Please find the detailed information from the i915 patch below:
> >
> > INTEL_DII: drm/i915/pvc: Force even num engines to use 64B
> >
> >     On PVC observed gt_fatal_7 as arbiter is out of credits while
> >     running Molten Concurrency stress+ 2 HPLs + ProcHot + Warm Idle
> >     + Solar DVFS + ASPM + Link Width Change.
> >
> >     Its root caused to HW bug and SW workaround proposed to use
> >     all even instance engines to do 64B transfer while using
> >     system memory.
> >
> >     So this change implements below scenario :
> >     ------------------------------------------------------------
> >     L7  |  L6  |  L5  |  L4  |  L3  |  L2  |  L1  |  L0  |  Main
> >
> >     8      7      6      5      4      3      2      1
> >
> >     64B    256B   64B    256B   64B    256B   64B    256B   64B
> >     -------------------------------------------------------------
> >
> > Bug-id: 16017236439
> >
> > The 64B will limit the transfer BW. The main copy engine has several
> > backend, So it may not be impacted much, but other link copy engine
> > such as the reserved
> > bcs8 will slow down to possible ~20% for host transfer.
> >
> 
> Hmm, we don't have this WA, indirect BB, WA pages in Xe? I have no idea
> what this is for but for the moment this patch isn't needed. If we pull in this
> WA, then I see why we need this.
> 
> Matt
> 

PVC will need this WA, as long as there is concurrent read and write from smem
for bcs.

-Bruce

> > -Bruce
> > > >
> > > > >
> > > > > v2: Include local change of correct mask for fast instances
> > > > >
> > > > > Cc: Bruce Chang <yu.bruce.chang@intel.com>
> > > > > Signed-off-by: Matthew Brost <matthew.brost@intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/xe/xe_engine.h    |  2 ++
> > > > > drivers/gpu/drm/xe/xe_hw_engine.c | 20 ++++++++++++++++++++
> > > > > drivers/gpu/drm/xe/xe_migrate.c   |  7 ++++---
> > > > > 3 files changed, 26 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/xe/xe_engine.h
> > > > > b/drivers/gpu/drm/xe/xe_engine.h index
> > > > > 1cf7f23c4afd..0a9c35ea3d34
> > > > > 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_engine.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_engine.h
> > > > > @@ -26,6 +26,8 @@ void xe_engine_destroy(struct kref *ref);
> > > > >
> > > > > struct xe_engine *xe_engine_lookup(struct xe_file *xef, u32 id);
> > > > >
> > > > > +u32 xe_hw_engine_fast_copy_logical_mask(struct xe_gt *gt);
> > > > > +
> > > > > static inline struct xe_engine *xe_engine_get(struct xe_engine
> > > > > *engine) {
> > > > > 	kref_get(&engine->refcount);
> > > > > diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > index 63a4efd5edcc..d2b43b189b14 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> > > > > @@ -600,3 +600,23 @@ bool xe_hw_engine_is_reserved(struct
> > > xe_hw_engine *hwe)
> > > > > 	return xe->info.supports_usm && hwe->class ==
> > > XE_ENGINE_CLASS_COPY &&
> > > > > 		hwe->instance == gt->usm.reserved_bcs_instance; }
> > > > > +
> > > > > +u32 xe_hw_engine_fast_copy_logical_mask(struct xe_gt *gt)
> > > >
> > > > this deserves its kernel-doc, probably with similar info asked for
> > > > in the commit message.
> > > >
> > >
> > > I thought I added kernel DoC but apartently forgot. Will fix in next rev.
> > >
> > > Matt
> > >
> > > > Lucas De Marchi
> > > >
> > > > > +{
> > > > > +	struct xe_device *xe = gt_to_xe(gt);
> > > > > +	struct xe_hw_engine *hwe;
> > > > > +	const u32 fast_physical_mask = 0xab;	/* 0, 1, 3, 5, 7 */
> > > > > +	u32 fast_logical_mask = 0;
> > > > > +	enum xe_hw_engine_id id;
> > > > > +
> > > > > +	/* XXX: We only support this function on PVC for now */
> > > > > +	XE_BUG_ON(!(xe->info.platform == XE_PVC));
> > > > > +
> > > > > +	for_each_hw_engine(hwe, gt, id) {
> > > > > +		if ((fast_physical_mask | gt-
> >usm.reserved_bcs_instance) &
> > > > > +		    BIT(hwe->instance))
> > > > > +			fast_logical_mask |= hwe->logical_instance;
> > > > > +	}
> > > > > +
> > > > > +	return fast_logical_mask;
> > > > > +}
> > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > b/drivers/gpu/drm/xe/xe_migrate.c index
> > > > > 11c8af9c6c92..4a7fec5d619d
> > > > > 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > @@ -345,11 +345,12 @@ struct xe_migrate *xe_migrate_init(struct
> > > xe_gt *gt)
> > > > >
> > > XE_ENGINE_CLASS_COPY,
> > > > > 							   gt-
> > > >usm.reserved_bcs_instance,
> > > > > 							   false);
> > > > > -		if (!hwe)
> > > > > +		u32 logical_mask =
> > > xe_hw_engine_fast_copy_logical_mask(gt);
> > > > > +
> > > > > +		if (!hwe || !logical_mask)
> > > > > 			return ERR_PTR(-EINVAL);
> > > > >
> > > > > -		m->eng = xe_engine_create(xe, vm,
> > > > > -					  BIT(hwe->logical_instance),
> 1,
> > > > > +		m->eng = xe_engine_create(xe, vm, logical_mask, 1,
> > > > > 					  hwe, ENGINE_FLAG_KERNEL);
> > > > > 	} else {
> > > > > 		m->eng = xe_engine_create_class(xe, gt, vm,
> > > > > --
> > > > > 2.34.1
> > > > >