All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
@ 2023-01-30 16:17 José Roberto de Souza
  2023-01-30 16:31 ` Lucas De Marchi
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: José Roberto de Souza @ 2023-01-30 16:17 UTC (permalink / raw)
  To: intel-xe; +Cc: Lucas De Marchi

i915 has the same fake workaround to return MI_BATCH_BUFFER_START
nested batch buffer behavior in DG2 and newer platforms to the same
behavior as older platforms.

So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
chained batch buffer level.

v2:
- replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
- move fake workaround to lrc_additional_programming table

Bspec: 45974, 45718
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.c |  1 +
 drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_wa.h |  1 +
 3 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 84a73eeccd297..5d07e1e7bd506 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
 
 		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
 		xe_wa_process_lrc(hwe);
+		xe_wa_process_lrc_additional_programming(hwe);
 
 		default_lrc = drmm_kzalloc(&xe->drm,
 					   xe_lrc_size(xe, hwe->class),
diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
index 3325de3edf691..744b7d0982683 100644
--- a/drivers/gpu/drm/xe/xe_wa.c
+++ b/drivers/gpu/drm/xe/xe_wa.c
@@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
 	{}
 };
 
+static const struct xe_rtp_entry lrc_additional_programming[] = {
+	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
+	  /*
+	   * This is a "fake" workaround defined by software to ensure we
+	   * maintain reliable, backward-compatible behavior for userspace with
+	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
+	   */
+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
+	  XE_RTP_CLR(RING_MI_MODE(0),
+		     TGL_NESTED_BB_EN,
+		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
+	},
+	{}
+};
+
 static const struct xe_rtp_entry register_whitelist[] = {
 	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
 	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
@@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
 	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
 }
 
+/**
+ * xe_wa_process_lrc_additional_programming - process additional LRC programming
+ * table
+ * @hwe: engine instance to process workarounds for
+ *
+ * Process additional context programming table for this platform, saving in
+ * @hwe all the registers changes that need to be applied on context restore.
+ */
+void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
+{
+	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
+}
+
 /**
  * xe_reg_whitelist_process_engine - process table of registers to whitelist
  * @hwe: engine instance to process whitelist for
diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
index 1a0659690a320..872f3e4ddc73c 100644
--- a/drivers/gpu/drm/xe/xe_wa.h
+++ b/drivers/gpu/drm/xe/xe_wa.h
@@ -12,6 +12,7 @@ struct xe_hw_engine;
 void xe_wa_process_gt(struct xe_gt *gt);
 void xe_wa_process_engine(struct xe_hw_engine *hwe);
 void xe_wa_process_lrc(struct xe_hw_engine *hwe);
+void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
 
 void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
 void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 16:17 [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START José Roberto de Souza
@ 2023-01-30 16:31 ` Lucas De Marchi
  2023-01-30 17:15 ` Matt Roper
  2023-01-31  7:24 ` Jani Nikula
  2 siblings, 0 replies; 14+ messages in thread
From: Lucas De Marchi @ 2023-01-30 16:31 UTC (permalink / raw)
  To: José Roberto de Souza; +Cc: intel-xe

On Mon, Jan 30, 2023 at 08:17:23AM -0800, Jose Souza wrote:
>i915 has the same fake workaround to return MI_BATCH_BUFFER_START
>nested batch buffer behavior in DG2 and newer platforms to the same
>behavior as older platforms.
>
>So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
>chained batch buffer level.
>
>v2:
>- replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
>- move fake workaround to lrc_additional_programming table
>
>Bspec: 45974, 45718
>Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>Cc: Matt Roper <matthew.d.roper@intel.com>
>Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
>---
> drivers/gpu/drm/xe/xe_gt.c |  1 +
> drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> drivers/gpu/drm/xe/xe_wa.h |  1 +
> 3 files changed, 30 insertions(+)
>
>diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>index 84a73eeccd297..5d07e1e7bd506 100644
>--- a/drivers/gpu/drm/xe/xe_gt.c
>+++ b/drivers/gpu/drm/xe/xe_gt.c
>@@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
>
> 		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> 		xe_wa_process_lrc(hwe);
>+		xe_wa_process_lrc_additional_programming(hwe);

it's in a separate struct just to avoid mixing stuff, but it should be
handled by xe_wa_process_lrc().

>
> 		default_lrc = drmm_kzalloc(&xe->drm,
> 					   xe_lrc_size(xe, hwe->class),
>diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
>index 3325de3edf691..744b7d0982683 100644
>--- a/drivers/gpu/drm/xe/xe_wa.c
>+++ b/drivers/gpu/drm/xe/xe_wa.c
>@@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> 	{}
> };
>
>+static const struct xe_rtp_entry lrc_additional_programming[] = {
>+	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
>+	  /*
>+	   * This is a "fake" workaround defined by software to ensure we
>+	   * maintain reliable, backward-compatible behavior for userspace with
>+	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
>+	   */
>+	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
>+	  XE_RTP_CLR(RING_MI_MODE(0),
>+		     TGL_NESTED_BB_EN,
>+		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
>+	},
>+	{}
>+};
>+
> static const struct xe_rtp_entry register_whitelist[] = {
> 	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> 	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
>@@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> 	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> }
>
>+/**
>+ * xe_wa_process_lrc_additional_programming - process additional LRC programming
>+ * table
>+ * @hwe: engine instance to process workarounds for
>+ *
>+ * Process additional context programming table for this platform, saving in
>+ * @hwe all the registers changes that need to be applied on context restore.
>+ */
>+void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
>+{
>+	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);

move this to xe_wa_process_lrc()

>+}
>+
> /**
>  * xe_reg_whitelist_process_engine - process table of registers to whitelist
>  * @hwe: engine instance to process whitelist for
>diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
>index 1a0659690a320..872f3e4ddc73c 100644
>--- a/drivers/gpu/drm/xe/xe_wa.h
>+++ b/drivers/gpu/drm/xe/xe_wa.h
>@@ -12,6 +12,7 @@ struct xe_hw_engine;
> void xe_wa_process_gt(struct xe_gt *gt);
> void xe_wa_process_engine(struct xe_hw_engine *hwe);
> void xe_wa_process_lrc(struct xe_hw_engine *hwe);
>+void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);

and drop this


Lucas De Marchi

>
> void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
>-- 
>2.39.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 16:17 [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START José Roberto de Souza
  2023-01-30 16:31 ` Lucas De Marchi
@ 2023-01-30 17:15 ` Matt Roper
  2023-01-30 18:04   ` Souza, Jose
  2023-01-31  7:24 ` Jani Nikula
  2 siblings, 1 reply; 14+ messages in thread
From: Matt Roper @ 2023-01-30 17:15 UTC (permalink / raw)
  To: José Roberto de Souza; +Cc: Lucas De Marchi, intel-xe

On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> nested batch buffer behavior in DG2 and newer platforms to the same
> behavior as older platforms.
> 
> So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> chained batch buffer level.

I was kind of assuming we'd just drop this setting for the Xe driver.  I
believe hardware will be removing the option to turn off nested
batchbuffers in an upcoming platform, so userspace is going to have to
adapt to the new behavior soon anyway; doing it while moving to a new
KMD seems like the easiest time to make that happen since the UMDs are
already updating their programming models.


Matt

> 
> v2:
> - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> - move fake workaround to lrc_additional_programming table
> 
> Bspec: 45974, 45718
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c |  1 +
>  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_wa.h |  1 +
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 84a73eeccd297..5d07e1e7bd506 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
>  
>  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
>  		xe_wa_process_lrc(hwe);
> +		xe_wa_process_lrc_additional_programming(hwe);
>  
>  		default_lrc = drmm_kzalloc(&xe->drm,
>  					   xe_lrc_size(xe, hwe->class),
> diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> index 3325de3edf691..744b7d0982683 100644
> --- a/drivers/gpu/drm/xe/xe_wa.c
> +++ b/drivers/gpu/drm/xe/xe_wa.c
> @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
>  	{}
>  };
>  
> +static const struct xe_rtp_entry lrc_additional_programming[] = {
> +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> +	  /*
> +	   * This is a "fake" workaround defined by software to ensure we
> +	   * maintain reliable, backward-compatible behavior for userspace with
> +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> +	   */
> +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> +	  XE_RTP_CLR(RING_MI_MODE(0),
> +		     TGL_NESTED_BB_EN,
> +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> +	},
> +	{}
> +};
> +
>  static const struct xe_rtp_entry register_whitelist[] = {
>  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
>  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
>  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
>  }
>  
> +/**
> + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> + * table
> + * @hwe: engine instance to process workarounds for
> + *
> + * Process additional context programming table for this platform, saving in
> + * @hwe all the registers changes that need to be applied on context restore.
> + */
> +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> +{
> +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> +}
> +
>  /**
>   * xe_reg_whitelist_process_engine - process table of registers to whitelist
>   * @hwe: engine instance to process whitelist for
> diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> index 1a0659690a320..872f3e4ddc73c 100644
> --- a/drivers/gpu/drm/xe/xe_wa.h
> +++ b/drivers/gpu/drm/xe/xe_wa.h
> @@ -12,6 +12,7 @@ struct xe_hw_engine;
>  void xe_wa_process_gt(struct xe_gt *gt);
>  void xe_wa_process_engine(struct xe_hw_engine *hwe);
>  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
>  
>  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
>  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> -- 
> 2.39.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 17:15 ` Matt Roper
@ 2023-01-30 18:04   ` Souza, Jose
  2023-01-30 19:27     ` Lucas De Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Souza, Jose @ 2023-01-30 18:04 UTC (permalink / raw)
  To: Roper, Matthew D; +Cc: De Marchi, Lucas, intel-xe

On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > nested batch buffer behavior in DG2 and newer platforms to the same
> > behavior as older platforms.
> > 
> > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > chained batch buffer level.
> 
> I was kind of assuming we'd just drop this setting for the Xe driver.  I
> believe hardware will be removing the option to turn off nested
> batchbuffers in an upcoming platform, so userspace is going to have to
> adapt to the new behavior soon anyway; doing it while moving to a new
> KMD seems like the easiest time to make that happen since the UMDs are
> already updating their programming models.

This would bring even more changes to track when debugging issues in Xe KMD port.
Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.

> 
> 
> Matt
> 
> > 
> > v2:
> > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > - move fake workaround to lrc_additional_programming table
> > 
> > Bspec: 45974, 45718
> > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> >  3 files changed, 30 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index 84a73eeccd297..5d07e1e7bd506 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> >  
> >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> >  		xe_wa_process_lrc(hwe);
> > +		xe_wa_process_lrc_additional_programming(hwe);
> >  
> >  		default_lrc = drmm_kzalloc(&xe->drm,
> >  					   xe_lrc_size(xe, hwe->class),
> > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > index 3325de3edf691..744b7d0982683 100644
> > --- a/drivers/gpu/drm/xe/xe_wa.c
> > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> >  	{}
> >  };
> >  
> > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > +	  /*
> > +	   * This is a "fake" workaround defined by software to ensure we
> > +	   * maintain reliable, backward-compatible behavior for userspace with
> > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > +	   */
> > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > +		     TGL_NESTED_BB_EN,
> > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > +	},
> > +	{}
> > +};
> > +
> >  static const struct xe_rtp_entry register_whitelist[] = {
> >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> >  }
> >  
> > +/**
> > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > + * table
> > + * @hwe: engine instance to process workarounds for
> > + *
> > + * Process additional context programming table for this platform, saving in
> > + * @hwe all the registers changes that need to be applied on context restore.
> > + */
> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > +{
> > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > +}
> > +
> >  /**
> >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> >   * @hwe: engine instance to process whitelist for
> > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > index 1a0659690a320..872f3e4ddc73c 100644
> > --- a/drivers/gpu/drm/xe/xe_wa.h
> > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> >  void xe_wa_process_gt(struct xe_gt *gt);
> >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> >  
> >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > -- 
> > 2.39.1
> > 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 18:04   ` Souza, Jose
@ 2023-01-30 19:27     ` Lucas De Marchi
  2023-01-30 19:43       ` Souza, Jose
  0 siblings, 1 reply; 14+ messages in thread
From: Lucas De Marchi @ 2023-01-30 19:27 UTC (permalink / raw)
  To: Souza, Jose; +Cc: intel-xe

On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
>On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
>> On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
>> > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
>> > nested batch buffer behavior in DG2 and newer platforms to the same
>> > behavior as older platforms.
>> >
>> > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
>> > chained batch buffer level.
>>
>> I was kind of assuming we'd just drop this setting for the Xe driver.  I
>> believe hardware will be removing the option to turn off nested
>> batchbuffers in an upcoming platform, so userspace is going to have to
>> adapt to the new behavior soon anyway; doing it while moving to a new
>> KMD seems like the easiest time to make that happen since the UMDs are
>> already updating their programming models.
>
>This would bring even more changes to track when debugging issues in Xe KMD port.
>Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.

but if we start supporting these platforms with nested bb disabled,
we can't change it later. At least not for the older platforms that had
it that way.

Lucas De Marchi

>
>>
>>
>> Matt
>>
>> >
>> > v2:
>> > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
>> > - move fake workaround to lrc_additional_programming table
>> >
>> > Bspec: 45974, 45718
>> > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> > Cc: Matt Roper <matthew.d.roper@intel.com>
>> > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
>> > ---
>> >  drivers/gpu/drm/xe/xe_gt.c |  1 +
>> >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
>> >  drivers/gpu/drm/xe/xe_wa.h |  1 +
>> >  3 files changed, 30 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> > index 84a73eeccd297..5d07e1e7bd506 100644
>> > --- a/drivers/gpu/drm/xe/xe_gt.c
>> > +++ b/drivers/gpu/drm/xe/xe_gt.c
>> > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
>> >
>> >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
>> >  		xe_wa_process_lrc(hwe);
>> > +		xe_wa_process_lrc_additional_programming(hwe);
>> >
>> >  		default_lrc = drmm_kzalloc(&xe->drm,
>> >  					   xe_lrc_size(xe, hwe->class),
>> > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
>> > index 3325de3edf691..744b7d0982683 100644
>> > --- a/drivers/gpu/drm/xe/xe_wa.c
>> > +++ b/drivers/gpu/drm/xe/xe_wa.c
>> > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
>> >  	{}
>> >  };
>> >
>> > +static const struct xe_rtp_entry lrc_additional_programming[] = {
>> > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
>> > +	  /*
>> > +	   * This is a "fake" workaround defined by software to ensure we
>> > +	   * maintain reliable, backward-compatible behavior for userspace with
>> > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
>> > +	   */
>> > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
>> > +	  XE_RTP_CLR(RING_MI_MODE(0),
>> > +		     TGL_NESTED_BB_EN,
>> > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
>> > +	},
>> > +	{}
>> > +};
>> > +
>> >  static const struct xe_rtp_entry register_whitelist[] = {
>> >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
>> >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
>> > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
>> >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
>> >  }
>> >
>> > +/**
>> > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
>> > + * table
>> > + * @hwe: engine instance to process workarounds for
>> > + *
>> > + * Process additional context programming table for this platform, saving in
>> > + * @hwe all the registers changes that need to be applied on context restore.
>> > + */
>> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
>> > +{
>> > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
>> > +}
>> > +
>> >  /**
>> >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
>> >   * @hwe: engine instance to process whitelist for
>> > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
>> > index 1a0659690a320..872f3e4ddc73c 100644
>> > --- a/drivers/gpu/drm/xe/xe_wa.h
>> > +++ b/drivers/gpu/drm/xe/xe_wa.h
>> > @@ -12,6 +12,7 @@ struct xe_hw_engine;
>> >  void xe_wa_process_gt(struct xe_gt *gt);
>> >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
>> >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
>> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
>> >
>> >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
>> >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
>> > --
>> > 2.39.1
>> >
>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 19:27     ` Lucas De Marchi
@ 2023-01-30 19:43       ` Souza, Jose
  2023-02-01 21:20         ` Rodrigo Vivi
  0 siblings, 1 reply; 14+ messages in thread
From: Souza, Jose @ 2023-01-30 19:43 UTC (permalink / raw)
  To: De Marchi, Lucas; +Cc: intel-xe

On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > > > nested batch buffer behavior in DG2 and newer platforms to the same
> > > > behavior as older platforms.
> > > > 
> > > > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > > > chained batch buffer level.
> > > 
> > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > believe hardware will be removing the option to turn off nested
> > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > KMD seems like the easiest time to make that happen since the UMDs are
> > > already updating their programming models.
> > 
> > This would bring even more changes to track when debugging issues in Xe KMD port.
> > Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.
> 
> but if we start supporting these platforms with nested bb disabled,
> we can't change it later. At least not for the older platforms that had
> it that way.

Xe KMD will not support by default TGL, DG2... anyways.
Why have different hardware behavior between KMDs then?

> 
> Lucas De Marchi
> 
> > 
> > > 
> > > 
> > > Matt
> > > 
> > > > 
> > > > v2:
> > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > - move fake workaround to lrc_additional_programming table
> > > > 
> > > > Bspec: 45974, 45718
> > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > >  3 files changed, 30 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > 
> > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > >  		xe_wa_process_lrc(hwe);
> > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > 
> > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > >  					   xe_lrc_size(xe, hwe->class),
> > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > index 3325de3edf691..744b7d0982683 100644
> > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > >  	{}
> > > >  };
> > > > 
> > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > +	  /*
> > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > +	   */
> > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > +		     TGL_NESTED_BB_EN,
> > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > +	},
> > > > +	{}
> > > > +};
> > > > +
> > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > >  }
> > > > 
> > > > +/**
> > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > + * table
> > > > + * @hwe: engine instance to process workarounds for
> > > > + *
> > > > + * Process additional context programming table for this platform, saving in
> > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > + */
> > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > +{
> > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > +}
> > > > +
> > > >  /**
> > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > >   * @hwe: engine instance to process whitelist for
> > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > 
> > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > --
> > > > 2.39.1
> > > > 
> > > 
> > 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 16:17 [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START José Roberto de Souza
  2023-01-30 16:31 ` Lucas De Marchi
  2023-01-30 17:15 ` Matt Roper
@ 2023-01-31  7:24 ` Jani Nikula
  2023-01-31 13:50   ` Souza, Jose
  2 siblings, 1 reply; 14+ messages in thread
From: Jani Nikula @ 2023-01-31  7:24 UTC (permalink / raw)
  To: José Roberto de Souza, intel-xe; +Cc: Lucas De Marchi

On Mon, 30 Jan 2023, José Roberto de Souza <jose.souza@intel.com> wrote:
> i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> nested batch buffer behavior in DG2 and newer platforms to the same
> behavior as older platforms.
>
> So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> chained batch buffer level.
>
> v2:
> - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> - move fake workaround to lrc_additional_programming table
>
> Bspec: 45974, 45718
> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c |  1 +
>  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_wa.h |  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index 84a73eeccd297..5d07e1e7bd506 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
>  
>  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
>  		xe_wa_process_lrc(hwe);
> +		xe_wa_process_lrc_additional_programming(hwe);
>  
>  		default_lrc = drmm_kzalloc(&xe->drm,
>  					   xe_lrc_size(xe, hwe->class),
> diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> index 3325de3edf691..744b7d0982683 100644
> --- a/drivers/gpu/drm/xe/xe_wa.c
> +++ b/drivers/gpu/drm/xe/xe_wa.c
> @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
>  	{}
>  };
>  
> +static const struct xe_rtp_entry lrc_additional_programming[] = {
> +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),

Please add newline after {, and indent with tabs not spaces.

> +	  /*
> +	   * This is a "fake" workaround defined by software to ensure we
> +	   * maintain reliable, backward-compatible behavior for userspace with
> +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> +	   */
> +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> +	  XE_RTP_CLR(RING_MI_MODE(0),
> +		     TGL_NESTED_BB_EN,
> +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> +	},
> +	{}
> +};
> +
>  static const struct xe_rtp_entry register_whitelist[] = {
>  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
>  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
>  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
>  }
>  
> +/**
> + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> + * table
> + * @hwe: engine instance to process workarounds for
> + *
> + * Process additional context programming table for this platform, saving in
> + * @hwe all the registers changes that need to be applied on context restore.
> + */
> +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> +{
> +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> +}
> +
>  /**
>   * xe_reg_whitelist_process_engine - process table of registers to whitelist
>   * @hwe: engine instance to process whitelist for
> diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> index 1a0659690a320..872f3e4ddc73c 100644
> --- a/drivers/gpu/drm/xe/xe_wa.h
> +++ b/drivers/gpu/drm/xe/xe_wa.h
> @@ -12,6 +12,7 @@ struct xe_hw_engine;
>  void xe_wa_process_gt(struct xe_gt *gt);
>  void xe_wa_process_engine(struct xe_hw_engine *hwe);
>  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
>  
>  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
>  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-31  7:24 ` Jani Nikula
@ 2023-01-31 13:50   ` Souza, Jose
  2023-01-31 14:04     ` Jani Nikula
  0 siblings, 1 reply; 14+ messages in thread
From: Souza, Jose @ 2023-01-31 13:50 UTC (permalink / raw)
  To: intel-xe, jani.nikula; +Cc: De Marchi, Lucas

On Tue, 2023-01-31 at 09:24 +0200, Jani Nikula wrote:
> On Mon, 30 Jan 2023, José Roberto de Souza <jose.souza@intel.com> wrote:
> > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > nested batch buffer behavior in DG2 and newer platforms to the same
> > behavior as older platforms.
> > 
> > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > chained batch buffer level.
> > 
> > v2:
> > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > - move fake workaround to lrc_additional_programming table
> > 
> > Bspec: 45974, 45718
> > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > Cc: Matt Roper <matthew.d.roper@intel.com>
> > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> >  3 files changed, 30 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index 84a73eeccd297..5d07e1e7bd506 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> >  
> >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> >  		xe_wa_process_lrc(hwe);
> > +		xe_wa_process_lrc_additional_programming(hwe);
> >  
> >  		default_lrc = drmm_kzalloc(&xe->drm,
> >  					   xe_lrc_size(xe, hwe->class),
> > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > index 3325de3edf691..744b7d0982683 100644
> > --- a/drivers/gpu/drm/xe/xe_wa.c
> > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> >  	{}
> >  };
> >  
> > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> 
> Please add newline after {, and indent with tabs not spaces.

All the xe_rtp_entry arrays have this style pattern, just following what is already in the file.

> 
> > +	  /*
> > +	   * This is a "fake" workaround defined by software to ensure we
> > +	   * maintain reliable, backward-compatible behavior for userspace with
> > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > +	   */
> > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > +		     TGL_NESTED_BB_EN,
> > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > +	},
> > +	{}
> > +};
> > +
> >  static const struct xe_rtp_entry register_whitelist[] = {
> >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> >  }
> >  
> > +/**
> > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > + * table
> > + * @hwe: engine instance to process workarounds for
> > + *
> > + * Process additional context programming table for this platform, saving in
> > + * @hwe all the registers changes that need to be applied on context restore.
> > + */
> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > +{
> > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > +}
> > +
> >  /**
> >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> >   * @hwe: engine instance to process whitelist for
> > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > index 1a0659690a320..872f3e4ddc73c 100644
> > --- a/drivers/gpu/drm/xe/xe_wa.h
> > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> >  void xe_wa_process_gt(struct xe_gt *gt);
> >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> >  
> >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-31 13:50   ` Souza, Jose
@ 2023-01-31 14:04     ` Jani Nikula
  2023-01-31 15:31       ` Lucas De Marchi
  0 siblings, 1 reply; 14+ messages in thread
From: Jani Nikula @ 2023-01-31 14:04 UTC (permalink / raw)
  To: Souza, Jose, intel-xe; +Cc: De Marchi, Lucas

On Tue, 31 Jan 2023, "Souza, Jose" <jose.souza@intel.com> wrote:
> On Tue, 2023-01-31 at 09:24 +0200, Jani Nikula wrote:
>> On Mon, 30 Jan 2023, José Roberto de Souza <jose.souza@intel.com> wrote:
>> > +static const struct xe_rtp_entry lrc_additional_programming[] = {
>> > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
>> 
>> Please add newline after {, and indent with tabs not spaces.
>
> All the xe_rtp_entry arrays have this style pattern, just following
> what is already in the file.

Yuck. :p


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-31 14:04     ` Jani Nikula
@ 2023-01-31 15:31       ` Lucas De Marchi
  0 siblings, 0 replies; 14+ messages in thread
From: Lucas De Marchi @ 2023-01-31 15:31 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-xe, Rodrigo Vivi

On Tue, Jan 31, 2023 at 04:04:08PM +0200, Jani Nikula wrote:
>On Tue, 31 Jan 2023, "Souza, Jose" <jose.souza@intel.com> wrote:
>> On Tue, 2023-01-31 at 09:24 +0200, Jani Nikula wrote:
>>> On Mon, 30 Jan 2023, José Roberto de Souza <jose.souza@intel.com> wrote:
>>> > +static const struct xe_rtp_entry lrc_additional_programming[] = {
>>> > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
>>>
>>> Please add newline after {, and indent with tabs not spaces.
>>
>> All the xe_rtp_entry arrays have this style pattern, just following
>> what is already in the file.
>
>Yuck. :p

and I'm the guilty one. It is a table and when adding a lot of entries
the ", {" alone in the line for each entry consumes a lot of vertical
waste space, so I decided to merge them. I considered merging the "}"
too which would balance it, but looked nice to have at least a separator
between each entry.

As we add more entries we may decide another format for the table.
For this series I think the pending decision is more on the UABI side:
do we want to support this for older platforms knowing that future ones
will have this setting changed?

+Rodrigo, +Matt

Lucas De Marchi

>
>
>-- 
>Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-01-30 19:43       ` Souza, Jose
@ 2023-02-01 21:20         ` Rodrigo Vivi
  2023-02-02 14:02           ` Souza, Jose
  0 siblings, 1 reply; 14+ messages in thread
From: Rodrigo Vivi @ 2023-02-01 21:20 UTC (permalink / raw)
  To: Souza, Jose; +Cc: De Marchi, Lucas, intel-xe

On Mon, Jan 30, 2023 at 07:43:58PM +0000, Souza, Jose wrote:
> On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> > On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > > > > nested batch buffer behavior in DG2 and newer platforms to the same
> > > > > behavior as older platforms.
> > > > > 
> > > > > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > > > > chained batch buffer level.
> > > > 
> > > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > > believe hardware will be removing the option to turn off nested
> > > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > > KMD seems like the easiest time to make that happen since the UMDs are
> > > > already updating their programming models.
> > > 
> > > This would bring even more changes to track when debugging issues in Xe KMD port.
> > > Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.
> > 
> > but if we start supporting these platforms with nested bb disabled,
> > we can't change it later. At least not for the older platforms that had
> > it that way.
> 
> Xe KMD will not support by default TGL, DG2... anyways.
> Why have different hardware behavior between KMDs then?

For the older platforms we will always be behind the force_probe.
So we will be able to change the uapi behavior later.

However, I tend to agree with Matt and Lucas that it would be better
if we align with the future, instead of trying to align with i915
first.

But I'd like to hear from Jose, how much of a trouble this is for the
user space? What are the advantages of aligning with i915's current
behavior first and then move later?

If you noticed all other uapi, there's not a much of alignment with
i915 anyway. Why should we align on this?

Thanks,
Rodrigo.

> 
> > 
> > Lucas De Marchi
> > 
> > > 
> > > > 
> > > > 
> > > > Matt
> > > > 
> > > > > 
> > > > > v2:
> > > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > > - move fake workaround to lrc_additional_programming table
> > > > > 
> > > > > Bspec: 45974, 45718
> > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > > >  3 files changed, 30 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > > 
> > > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > > >  		xe_wa_process_lrc(hwe);
> > > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > > 
> > > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > > >  					   xe_lrc_size(xe, hwe->class),
> > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > > index 3325de3edf691..744b7d0982683 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > > >  	{}
> > > > >  };
> > > > > 
> > > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > > +	  /*
> > > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > > +	   */
> > > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > > +		     TGL_NESTED_BB_EN,
> > > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > > +	},
> > > > > +	{}
> > > > > +};
> > > > > +
> > > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > > >  }
> > > > > 
> > > > > +/**
> > > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > > + * table
> > > > > + * @hwe: engine instance to process workarounds for
> > > > > + *
> > > > > + * Process additional context programming table for this platform, saving in
> > > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > > + */
> > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > > +{
> > > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > +}
> > > > > +
> > > > >  /**
> > > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > > >   * @hwe: engine instance to process whitelist for
> > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > > 
> > > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > > --
> > > > > 2.39.1
> > > > > 
> > > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-02-01 21:20         ` Rodrigo Vivi
@ 2023-02-02 14:02           ` Souza, Jose
  2023-02-02 23:14             ` Matt Roper
  0 siblings, 1 reply; 14+ messages in thread
From: Souza, Jose @ 2023-02-02 14:02 UTC (permalink / raw)
  To: Vivi, Rodrigo; +Cc: De Marchi, Lucas, intel-xe

On Wed, 2023-02-01 at 16:20 -0500, Rodrigo Vivi wrote:
> On Mon, Jan 30, 2023 at 07:43:58PM +0000, Souza, Jose wrote:
> > On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> > > On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > > > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > > > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > > > > > nested batch buffer behavior in DG2 and newer platforms to the same
> > > > > > behavior as older platforms.
> > > > > > 
> > > > > > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > > > > > chained batch buffer level.
> > > > > 
> > > > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > > > believe hardware will be removing the option to turn off nested
> > > > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > > > KMD seems like the easiest time to make that happen since the UMDs are
> > > > > already updating their programming models.
> > > > 
> > > > This would bring even more changes to track when debugging issues in Xe KMD port.
> > > > Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.
> > > 
> > > but if we start supporting these platforms with nested bb disabled,
> > > we can't change it later. At least not for the older platforms that had
> > > it that way.
> > 
> > Xe KMD will not support by default TGL, DG2... anyways.
> > Why have different hardware behavior between KMDs then?
> 
> For the older platforms we will always be behind the force_probe.
> So we will be able to change the uapi behavior later.
> 
> However, I tend to agree with Matt and Lucas that it would be better
> if we align with the future, instead of trying to align with i915
> first.
> 
> But I'd like to hear from Jose, how much of a trouble this is for the
> user space? What are the advantages of aligning with i915's current
> behavior first and then move later?
> 
> If you noticed all other uapi, there's not a much of alignment with
> i915 anyway. Why should we align on this?

i915 x Xe changes don't affect the files and functions that fill the batch buffers in Mesa.
Changing the behavior of MI_BATCH_BUFFER_START would be the first change that actually goes to batch buffers.
Minimal batch buffer changes means less surface to debug in issues that only happens with Xe KMD.

That is why I would like to postpone this behavior change to when Xe KMD and UMDs support to it are more stable.

> 
> Thanks,
> Rodrigo.
> 
> > 
> > > 
> > > Lucas De Marchi
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > 
> > > > > > v2:
> > > > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > > > - move fake workaround to lrc_additional_programming table
> > > > > > 
> > > > > > Bspec: 45974, 45718
> > > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > > > > > ---
> > > > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > > > >  3 files changed, 30 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > > > 
> > > > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > > > >  		xe_wa_process_lrc(hwe);
> > > > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > > > 
> > > > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > > > >  					   xe_lrc_size(xe, hwe->class),
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > index 3325de3edf691..744b7d0982683 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > > > >  	{}
> > > > > >  };
> > > > > > 
> > > > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > > > +	  /*
> > > > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > > > +	   */
> > > > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > > > +		     TGL_NESTED_BB_EN,
> > > > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > > > +	},
> > > > > > +	{}
> > > > > > +};
> > > > > > +
> > > > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > >  }
> > > > > > 
> > > > > > +/**
> > > > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > > > + * table
> > > > > > + * @hwe: engine instance to process workarounds for
> > > > > > + *
> > > > > > + * Process additional context programming table for this platform, saving in
> > > > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > > > + */
> > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > > > +{
> > > > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > +}
> > > > > > +
> > > > > >  /**
> > > > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > > > >   * @hwe: engine instance to process whitelist for
> > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > > > 
> > > > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > > > --
> > > > > > 2.39.1
> > > > > > 
> > > > > 
> > > > 
> > 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-02-02 14:02           ` Souza, Jose
@ 2023-02-02 23:14             ` Matt Roper
  2023-02-06 22:09               ` Rodrigo Vivi
  0 siblings, 1 reply; 14+ messages in thread
From: Matt Roper @ 2023-02-02 23:14 UTC (permalink / raw)
  To: Souza, Jose; +Cc: De Marchi, Lucas, intel-xe, Vivi, Rodrigo

On Thu, Feb 02, 2023 at 02:02:28PM +0000, Souza, Jose wrote:
> On Wed, 2023-02-01 at 16:20 -0500, Rodrigo Vivi wrote:
> > On Mon, Jan 30, 2023 at 07:43:58PM +0000, Souza, Jose wrote:
> > > On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> > > > On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > > > > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > > > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > > > > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > > > > > > nested batch buffer behavior in DG2 and newer platforms to the same
> > > > > > > behavior as older platforms.
> > > > > > > 
> > > > > > > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > > > > > > chained batch buffer level.
> > > > > > 
> > > > > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > > > > believe hardware will be removing the option to turn off nested
> > > > > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > > > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > > > > KMD seems like the easiest time to make that happen since the UMDs are
> > > > > > already updating their programming models.
> > > > > 
> > > > > This would bring even more changes to track when debugging issues in Xe KMD port.
> > > > > Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.
> > > > 
> > > > but if we start supporting these platforms with nested bb disabled,
> > > > we can't change it later. At least not for the older platforms that had
> > > > it that way.
> > > 
> > > Xe KMD will not support by default TGL, DG2... anyways.
> > > Why have different hardware behavior between KMDs then?
> > 
> > For the older platforms we will always be behind the force_probe.
> > So we will be able to change the uapi behavior later.
> > 
> > However, I tend to agree with Matt and Lucas that it would be better
> > if we align with the future, instead of trying to align with i915
> > first.
> > 
> > But I'd like to hear from Jose, how much of a trouble this is for the
> > user space? What are the advantages of aligning with i915's current
> > behavior first and then move later?
> > 
> > If you noticed all other uapi, there's not a much of alignment with
> > i915 anyway. Why should we align on this?
> 
> i915 x Xe changes don't affect the files and functions that fill the batch buffers in Mesa.
> Changing the behavior of MI_BATCH_BUFFER_START would be the first change that actually goes to batch buffers.
> Minimal batch buffer changes means less surface to debug in issues that only happens with Xe KMD.
> 
> That is why I would like to postpone this behavior change to when Xe KMD and UMDs support to it are more stable.

The other thing to consider here is that the Xe driver will presumably
be adding SRIOV support soon.  If our behavior differs from the Windows
driver, then I believe userspace Windows software running in a VF under
a Linux PF could potentially see unexpected behavior.


Matt

> 
> > 
> > Thanks,
> > Rodrigo.
> > 
> > > 
> > > > 
> > > > Lucas De Marchi
> > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > Matt
> > > > > > 
> > > > > > > 
> > > > > > > v2:
> > > > > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > > > > - move fake workaround to lrc_additional_programming table
> > > > > > > 
> > > > > > > Bspec: 45974, 45718
> > > > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > > > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > > > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > > > > >  3 files changed, 30 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > > > > 
> > > > > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > > > > >  		xe_wa_process_lrc(hwe);
> > > > > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > > > > 
> > > > > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > > > > >  					   xe_lrc_size(xe, hwe->class),
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > index 3325de3edf691..744b7d0982683 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > > > > >  	{}
> > > > > > >  };
> > > > > > > 
> > > > > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > > > > +	  /*
> > > > > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > > > > +	   */
> > > > > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > > > > +		     TGL_NESTED_BB_EN,
> > > > > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > > > > +	},
> > > > > > > +	{}
> > > > > > > +};
> > > > > > > +
> > > > > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > > > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > > > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > > > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > >  }
> > > > > > > 
> > > > > > > +/**
> > > > > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > > > > + * table
> > > > > > > + * @hwe: engine instance to process workarounds for
> > > > > > > + *
> > > > > > > + * Process additional context programming table for this platform, saving in
> > > > > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > > > > + */
> > > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > > > > +{
> > > > > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > > +}
> > > > > > > +
> > > > > > >  /**
> > > > > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > > > > >   * @hwe: engine instance to process whitelist for
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > > > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > > > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > > > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > > > > 
> > > > > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > > > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > > > > --
> > > > > > > 2.39.1
> > > > > > > 
> > > > > > 
> > > > > 
> > > 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START
  2023-02-02 23:14             ` Matt Roper
@ 2023-02-06 22:09               ` Rodrigo Vivi
  0 siblings, 0 replies; 14+ messages in thread
From: Rodrigo Vivi @ 2023-02-06 22:09 UTC (permalink / raw)
  To: Matt Roper; +Cc: De Marchi, Lucas, intel-xe

On Thu, Feb 02, 2023 at 03:14:36PM -0800, Matt Roper wrote:
> On Thu, Feb 02, 2023 at 02:02:28PM +0000, Souza, Jose wrote:
> > On Wed, 2023-02-01 at 16:20 -0500, Rodrigo Vivi wrote:
> > > On Mon, Jan 30, 2023 at 07:43:58PM +0000, Souza, Jose wrote:
> > > > On Mon, 2023-01-30 at 11:27 -0800, Lucas De Marchi wrote:
> > > > > On Mon, Jan 30, 2023 at 10:04:25AM -0800, Jose Souza wrote:
> > > > > > On Mon, 2023-01-30 at 09:15 -0800, Matt Roper wrote:
> > > > > > > On Mon, Jan 30, 2023 at 08:17:23AM -0800, José Roberto de Souza wrote:
> > > > > > > > i915 has the same fake workaround to return MI_BATCH_BUFFER_START
> > > > > > > > nested batch buffer behavior in DG2 and newer platforms to the same
> > > > > > > > behavior as older platforms.
> > > > > > > > 
> > > > > > > > So here cleaning up TGL_NESTED_BB_EN in MI_MODE to disable third level
> > > > > > > > chained batch buffer level.
> > > > > > > 
> > > > > > > I was kind of assuming we'd just drop this setting for the Xe driver.  I
> > > > > > > believe hardware will be removing the option to turn off nested
> > > > > > > batchbuffers in an upcoming platform, so userspace is going to have to
> > > > > > > adapt to the new behavior soon anyway; doing it while moving to a new
> > > > > > > KMD seems like the easiest time to make that happen since the UMDs are
> > > > > > > already updating their programming models.
> > > > > > 
> > > > > > This would bring even more changes to track when debugging issues in Xe KMD port.
> > > > > > Better do this after Xe KMD is stabilized and with better CI coverage in UMDs and KMDs.
> > > > > 
> > > > > but if we start supporting these platforms with nested bb disabled,
> > > > > we can't change it later. At least not for the older platforms that had
> > > > > it that way.
> > > > 
> > > > Xe KMD will not support by default TGL, DG2... anyways.
> > > > Why have different hardware behavior between KMDs then?
> > > 
> > > For the older platforms we will always be behind the force_probe.
> > > So we will be able to change the uapi behavior later.
> > > 
> > > However, I tend to agree with Matt and Lucas that it would be better
> > > if we align with the future, instead of trying to align with i915
> > > first.
> > > 
> > > But I'd like to hear from Jose, how much of a trouble this is for the
> > > user space? What are the advantages of aligning with i915's current
> > > behavior first and then move later?
> > > 
> > > If you noticed all other uapi, there's not a much of alignment with
> > > i915 anyway. Why should we align on this?
> > 
> > i915 x Xe changes don't affect the files and functions that fill the batch buffers in Mesa.
> > Changing the behavior of MI_BATCH_BUFFER_START would be the first change that actually goes to batch buffers.
> > Minimal batch buffer changes means less surface to debug in issues that only happens with Xe KMD.
> > 
> > That is why I would like to postpone this behavior change to when Xe KMD and UMDs support to it are more stable.
> 
> The other thing to consider here is that the Xe driver will presumably
> be adding SRIOV support soon.  If our behavior differs from the Windows
> driver, then I believe userspace Windows software running in a VF under
> a Linux PF could potentially see unexpected behavior.

hmm good point. Although we are protected by the force_probe and still
not integrated to the tree yet it looks to me that the right thing to do
is to align with the final api as soon as we can.

> 
> 
> Matt
> 
> > 
> > > 
> > > Thanks,
> > > Rodrigo.
> > > 
> > > > 
> > > > > 
> > > > > Lucas De Marchi
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Matt
> > > > > > > 
> > > > > > > > 
> > > > > > > > v2:
> > > > > > > > - replace IP_VERSION_FOREVER by XE_RTP_END_VERSION_UNDEFINED
> > > > > > > > - move fake workaround to lrc_additional_programming table
> > > > > > > > 
> > > > > > > > Bspec: 45974, 45718
> > > > > > > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > > > > > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > > > > > > Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
> > > > > > > > ---
> > > > > > > >  drivers/gpu/drm/xe/xe_gt.c |  1 +
> > > > > > > >  drivers/gpu/drm/xe/xe_wa.c | 28 ++++++++++++++++++++++++++++
> > > > > > > >  drivers/gpu/drm/xe/xe_wa.h |  1 +
> > > > > > > >  3 files changed, 30 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > > index 84a73eeccd297..5d07e1e7bd506 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > > > > > > @@ -311,6 +311,7 @@ int xe_gt_record_default_lrcs(struct xe_gt *gt)
> > > > > > > > 
> > > > > > > >  		xe_reg_sr_init(&hwe->reg_lrc, "LRC", xe);
> > > > > > > >  		xe_wa_process_lrc(hwe);
> > > > > > > > +		xe_wa_process_lrc_additional_programming(hwe);
> > > > > > > > 
> > > > > > > >  		default_lrc = drmm_kzalloc(&xe->drm,
> > > > > > > >  					   xe_lrc_size(xe, hwe->class),
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.c b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > > index 3325de3edf691..744b7d0982683 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.c
> > > > > > > > @@ -288,6 +288,21 @@ static const struct xe_rtp_entry lrc_was[] = {
> > > > > > > >  	{}
> > > > > > > >  };
> > > > > > > > 
> > > > > > > > +static const struct xe_rtp_entry lrc_additional_programming[] = {
> > > > > > > > +	{ XE_RTP_NAME("FakeWaDisableNestedBBMode"),
> > > > > > > > +	  /*
> > > > > > > > +	   * This is a "fake" workaround defined by software to ensure we
> > > > > > > > +	   * maintain reliable, backward-compatible behavior for userspace with
> > > > > > > > +	   * regards to how nested MI_BATCH_BUFFER_START commands are handled.
> > > > > > > > +	   */
> > > > > > > > +	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1255, XE_RTP_END_VERSION_UNDEFINED)),
> > > > > > > > +	  XE_RTP_CLR(RING_MI_MODE(0),
> > > > > > > > +		     TGL_NESTED_BB_EN,
> > > > > > > > +		     XE_RTP_FLAG(MASKED_REG, ENGINE_BASE))
> > > > > > > > +	},
> > > > > > > > +	{}
> > > > > > > > +};
> > > > > > > > +
> > > > > > > >  static const struct xe_rtp_entry register_whitelist[] = {
> > > > > > > >  	{ XE_RTP_NAME("WaAllowPMDepthAndInvocationCountAccessFromUMD, 1408556865"),
> > > > > > > >  	  XE_RTP_RULES(GRAPHICS_VERSION_RANGE(1200, 1210), ENGINE_CLASS(RENDER)),
> > > > > > > > @@ -362,6 +377,19 @@ void xe_wa_process_lrc(struct xe_hw_engine *hwe)
> > > > > > > >  	xe_rtp_process(lrc_was, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > > >  }
> > > > > > > > 
> > > > > > > > +/**
> > > > > > > > + * xe_wa_process_lrc_additional_programming - process additional LRC programming
> > > > > > > > + * table
> > > > > > > > + * @hwe: engine instance to process workarounds for
> > > > > > > > + *
> > > > > > > > + * Process additional context programming table for this platform, saving in
> > > > > > > > + * @hwe all the registers changes that need to be applied on context restore.
> > > > > > > > + */
> > > > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe)
> > > > > > > > +{
> > > > > > > > +	xe_rtp_process(lrc_additional_programming, &hwe->reg_lrc, hwe->gt, hwe);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  /**
> > > > > > > >   * xe_reg_whitelist_process_engine - process table of registers to whitelist
> > > > > > > >   * @hwe: engine instance to process whitelist for
> > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_wa.h b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > > index 1a0659690a320..872f3e4ddc73c 100644
> > > > > > > > --- a/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > > +++ b/drivers/gpu/drm/xe/xe_wa.h
> > > > > > > > @@ -12,6 +12,7 @@ struct xe_hw_engine;
> > > > > > > >  void xe_wa_process_gt(struct xe_gt *gt);
> > > > > > > >  void xe_wa_process_engine(struct xe_hw_engine *hwe);
> > > > > > > >  void xe_wa_process_lrc(struct xe_hw_engine *hwe);
> > > > > > > > +void xe_wa_process_lrc_additional_programming(struct xe_hw_engine *hwe);
> > > > > > > > 
> > > > > > > >  void xe_reg_whitelist_process_engine(struct xe_hw_engine *hwe);
> > > > > > > >  void xe_reg_whitelist_apply(struct xe_hw_engine *hwe);
> > > > > > > > --
> > > > > > > > 2.39.1
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > 
> > 
> 
> -- 
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-02-06 22:09 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-30 16:17 [Intel-xe] [PATCH v2] drm/xe: Add fake workaround to maintain backward compatible in MI_BATCH_BUFFER_START José Roberto de Souza
2023-01-30 16:31 ` Lucas De Marchi
2023-01-30 17:15 ` Matt Roper
2023-01-30 18:04   ` Souza, Jose
2023-01-30 19:27     ` Lucas De Marchi
2023-01-30 19:43       ` Souza, Jose
2023-02-01 21:20         ` Rodrigo Vivi
2023-02-02 14:02           ` Souza, Jose
2023-02-02 23:14             ` Matt Roper
2023-02-06 22:09               ` Rodrigo Vivi
2023-01-31  7:24 ` Jani Nikula
2023-01-31 13:50   ` Souza, Jose
2023-01-31 14:04     ` Jani Nikula
2023-01-31 15:31       ` Lucas De Marchi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.