All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-19 19:42 ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, michael.cheng, wayne.boyer,
	daniel.vetter, casey.g.bowman, lucas.demarchi, dri-devel, chris

To align with the discussion in [1][2], this patch series drops all usage of
wbvind_on_all_cpus within i915 by either replacing the call with certain
drm clflush helpers, or reverting to a previous logic.

[1]. https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
[2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5

Michael Cheng (4):
  i915/gem: drop wbinvd_on_all_cpus usage
  Revert "drm/i915/gem: Almagamate clflushes on suspend"
  i915/gem: Revert i915_gem_freeze to previous logic
  drm/i915/gt: Revert ggtt_resume to previous logic

 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
 drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
 drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
 4 files changed, 46 insertions(+), 38 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-19 19:42 ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: thomas.hellstrom, michael.cheng, daniel.vetter, lucas.demarchi,
	dri-devel, chris

To align with the discussion in [1][2], this patch series drops all usage of
wbvind_on_all_cpus within i915 by either replacing the call with certain
drm clflush helpers, or reverting to a previous logic.

[1]. https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
[2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5

Michael Cheng (4):
  i915/gem: drop wbinvd_on_all_cpus usage
  Revert "drm/i915/gem: Almagamate clflushes on suspend"
  i915/gem: Revert i915_gem_freeze to previous logic
  drm/i915/gt: Revert ggtt_resume to previous logic

 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
 drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
 drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
 4 files changed, 46 insertions(+), 38 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
@ 2022-03-19 19:42   ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, michael.cheng, wayne.boyer,
	daniel.vetter, casey.g.bowman, lucas.demarchi, dri-devel, chris

Previous concern with using drm_clflush_sg was that we don't know what the
sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
everything at once to avoid paranoia.

To make i915 more architecture-neutral and be less paranoid, lets attempt to
use drm_clflush_sg to flush the pages for when the GPU wants to read
from main memory.

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..b0a5baaebc43 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -8,6 +8,7 @@
 #include <linux/highmem.h>
 #include <linux/dma-resv.h>
 #include <linux/module.h>
+#include <drm/drm_cache.h>
 
 #include <asm/smp.h>
 
@@ -250,16 +251,10 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
 	 * DG1 is special here since it still snoops transactions even with
 	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
 	 * might need to revisit this as we add new discrete platforms.
-	 *
-	 * XXX: Consider doing a vmap flush or something, where possible.
-	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
-	 * the underlying sg_table might not even point to struct pages, so we
-	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
-	 * the driver.
 	 */
 	if (i915_gem_object_can_bypass_llc(obj) ||
 	    (!HAS_LLC(i915) && !IS_DG1(i915)))
-		wbinvd_on_all_cpus();
+		drm_clflush_sg(pages);
 
 	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
 	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-19 19:42   ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: thomas.hellstrom, michael.cheng, daniel.vetter, lucas.demarchi,
	dri-devel, chris

Previous concern with using drm_clflush_sg was that we don't know what the
sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
everything at once to avoid paranoia.

To make i915 more architecture-neutral and be less paranoid, lets attempt to
use drm_clflush_sg to flush the pages for when the GPU wants to read
from main memory.

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..b0a5baaebc43 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -8,6 +8,7 @@
 #include <linux/highmem.h>
 #include <linux/dma-resv.h>
 #include <linux/module.h>
+#include <drm/drm_cache.h>
 
 #include <asm/smp.h>
 
@@ -250,16 +251,10 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
 	 * DG1 is special here since it still snoops transactions even with
 	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
 	 * might need to revisit this as we add new discrete platforms.
-	 *
-	 * XXX: Consider doing a vmap flush or something, where possible.
-	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
-	 * the underlying sg_table might not even point to struct pages, so we
-	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
-	 * the driver.
 	 */
 	if (i915_gem_object_can_bypass_llc(obj) ||
 	    (!HAS_LLC(i915) && !IS_DG1(i915)))
-		wbinvd_on_all_cpus();
+		drm_clflush_sg(pages);
 
 	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
 	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 2/4] Revert "drm/i915/gem: Almagamate clflushes on suspend"
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
@ 2022-03-19 19:42   ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, michael.cheng, wayne.boyer,
	daniel.vetter, casey.g.bowman, lucas.demarchi, dri-devel, chris

As we are making i915 more architecture-neutral, lets revert this commit
to the previous logic [1] to avoid using wbinvd_on_all_cpus.

[1]. ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend")

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 41 +++++++++++++++++---------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 00359ec9d58b..3f20961bb59b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -13,13 +13,6 @@
 #include "i915_driver.h"
 #include "i915_drv.h"
 
-#if defined(CONFIG_X86)
-#include <asm/smp.h>
-#else
-#define wbinvd_on_all_cpus() \
-	pr_warn(DRIVER_NAME ": Missing cache flush in %s\n", __func__)
-#endif
-
 void i915_gem_suspend(struct drm_i915_private *i915)
 {
 	GEM_TRACE("%s\n", dev_name(i915->drm.dev));
@@ -123,6 +116,13 @@ int i915_gem_backup_suspend(struct drm_i915_private *i915)
 	return ret;
 }
 
+static struct drm_i915_gem_object *first_mm_object(struct list_head *list)
+{
+	return list_first_entry_or_null(list,
+					struct drm_i915_gem_object,
+					mm.link);
+}
+
 void i915_gem_suspend_late(struct drm_i915_private *i915)
 {
 	struct drm_i915_gem_object *obj;
@@ -132,7 +132,6 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 		NULL
 	}, **phase;
 	unsigned long flags;
-	bool flush = false;
 
 	/*
 	 * Neither the BIOS, ourselves or any other kernel
@@ -158,15 +157,29 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 
 	spin_lock_irqsave(&i915->mm.obj_lock, flags);
 	for (phase = phases; *phase; phase++) {
-		list_for_each_entry(obj, *phase, mm.link) {
-			if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
-				flush |= (obj->read_domains & I915_GEM_DOMAIN_CPU) == 0;
-			__start_cpu_write(obj); /* presume auto-hibernate */
+		LIST_HEAD(keep);
+
+		while ((obj = first_mm_object(*phase))) {
+			list_move_tail(&obj->mm.link, &keep);
+
+			/* Beware the background _i915_gem_free_objects */
+			if (!kref_get_unless_zero(&obj->base.refcount))
+				continue;
+
+			spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
+
+			i915_gem_object_lock(obj, NULL);
+			drm_WARN_ON(&i915->drm,
+			    i915_gem_object_set_to_gtt_domain(obj, false));
+			i915_gem_object_unlock(obj);
+			i915_gem_object_put(obj);
+
+			spin_lock_irqsave(&i915->mm.obj_lock, flags);
 		}
+
+		list_splice_tail(&keep, *phase);
 	}
 	spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
-	if (flush)
-		wbinvd_on_all_cpus();
 }
 
 int i915_gem_freeze(struct drm_i915_private *i915)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-gfx] [PATCH 2/4] Revert "drm/i915/gem: Almagamate clflushes on suspend"
@ 2022-03-19 19:42   ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: thomas.hellstrom, michael.cheng, daniel.vetter, lucas.demarchi,
	dri-devel, chris

As we are making i915 more architecture-neutral, lets revert this commit
to the previous logic [1] to avoid using wbinvd_on_all_cpus.

[1]. ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend")

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 41 +++++++++++++++++---------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 00359ec9d58b..3f20961bb59b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -13,13 +13,6 @@
 #include "i915_driver.h"
 #include "i915_drv.h"
 
-#if defined(CONFIG_X86)
-#include <asm/smp.h>
-#else
-#define wbinvd_on_all_cpus() \
-	pr_warn(DRIVER_NAME ": Missing cache flush in %s\n", __func__)
-#endif
-
 void i915_gem_suspend(struct drm_i915_private *i915)
 {
 	GEM_TRACE("%s\n", dev_name(i915->drm.dev));
@@ -123,6 +116,13 @@ int i915_gem_backup_suspend(struct drm_i915_private *i915)
 	return ret;
 }
 
+static struct drm_i915_gem_object *first_mm_object(struct list_head *list)
+{
+	return list_first_entry_or_null(list,
+					struct drm_i915_gem_object,
+					mm.link);
+}
+
 void i915_gem_suspend_late(struct drm_i915_private *i915)
 {
 	struct drm_i915_gem_object *obj;
@@ -132,7 +132,6 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 		NULL
 	}, **phase;
 	unsigned long flags;
-	bool flush = false;
 
 	/*
 	 * Neither the BIOS, ourselves or any other kernel
@@ -158,15 +157,29 @@ void i915_gem_suspend_late(struct drm_i915_private *i915)
 
 	spin_lock_irqsave(&i915->mm.obj_lock, flags);
 	for (phase = phases; *phase; phase++) {
-		list_for_each_entry(obj, *phase, mm.link) {
-			if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
-				flush |= (obj->read_domains & I915_GEM_DOMAIN_CPU) == 0;
-			__start_cpu_write(obj); /* presume auto-hibernate */
+		LIST_HEAD(keep);
+
+		while ((obj = first_mm_object(*phase))) {
+			list_move_tail(&obj->mm.link, &keep);
+
+			/* Beware the background _i915_gem_free_objects */
+			if (!kref_get_unless_zero(&obj->base.refcount))
+				continue;
+
+			spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
+
+			i915_gem_object_lock(obj, NULL);
+			drm_WARN_ON(&i915->drm,
+			    i915_gem_object_set_to_gtt_domain(obj, false));
+			i915_gem_object_unlock(obj);
+			i915_gem_object_put(obj);
+
+			spin_lock_irqsave(&i915->mm.obj_lock, flags);
 		}
+
+		list_splice_tail(&keep, *phase);
 	}
 	spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
-	if (flush)
-		wbinvd_on_all_cpus();
 }
 
 int i915_gem_freeze(struct drm_i915_private *i915)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 3/4] i915/gem: Revert i915_gem_freeze to previous logic
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
@ 2022-03-19 19:42   ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, michael.cheng, wayne.boyer,
	daniel.vetter, casey.g.bowman, lucas.demarchi, dri-devel, chris

This patch reverts i915_gem_freeze to previous logic [1] to avoid using
wbinvd_on_all_cpus.

[1]. https://patchwork.freedesktop.org/patch/415007/?series=86058&rev=2

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 3f20961bb59b..f78f2f004d6c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -212,13 +212,18 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
 	 * the objects as well, see i915_gem_freeze()
 	 */
 
-	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
-		i915_gem_shrink(NULL, i915, -1UL, NULL, ~0);
+	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
+	i915_gem_shrink(NULL, i915, -1UL, NULL, ~0);
 	i915_gem_drain_freed_objects(i915);
 
-	wbinvd_on_all_cpus();
-	list_for_each_entry(obj, &i915->mm.shrink_list, mm.link)
-		__start_cpu_write(obj);
+	list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) {
+		i915_gem_object_lock(obj, NULL);
+		drm_WARN_ON(&i915->drm,
+			i915_gem_object_set_to_cpu_domain(obj, true));
+		i915_gem_object_unlock(obj);
+	}
+
+	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-gfx] [PATCH 3/4] i915/gem: Revert i915_gem_freeze to previous logic
@ 2022-03-19 19:42   ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: thomas.hellstrom, michael.cheng, daniel.vetter, lucas.demarchi,
	dri-devel, chris

This patch reverts i915_gem_freeze to previous logic [1] to avoid using
wbinvd_on_all_cpus.

[1]. https://patchwork.freedesktop.org/patch/415007/?series=86058&rev=2

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pm.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
index 3f20961bb59b..f78f2f004d6c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c
@@ -212,13 +212,18 @@ int i915_gem_freeze_late(struct drm_i915_private *i915)
 	 * the objects as well, see i915_gem_freeze()
 	 */
 
-	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
-		i915_gem_shrink(NULL, i915, -1UL, NULL, ~0);
+	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
+	i915_gem_shrink(NULL, i915, -1UL, NULL, ~0);
 	i915_gem_drain_freed_objects(i915);
 
-	wbinvd_on_all_cpus();
-	list_for_each_entry(obj, &i915->mm.shrink_list, mm.link)
-		__start_cpu_write(obj);
+	list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) {
+		i915_gem_object_lock(obj, NULL);
+		drm_WARN_ON(&i915->drm,
+			i915_gem_object_set_to_cpu_domain(obj, true));
+		i915_gem_object_unlock(obj);
+	}
+
+	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH 4/4] drm/i915/gt: Revert ggtt_resume to previous logic
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
@ 2022-03-19 19:42   ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: tvrtko.ursulin, thomas.hellstrom, michael.cheng, wayne.boyer,
	daniel.vetter, casey.g.bowman, lucas.demarchi, dri-devel, chris

To avoid having to call wbinvd_on_all_cpus, revert i915_ggtt_resume and
i915_ggtt_resume_vm to previous logic [1].

[1]. 64b95df91f44 drm/i915: Assume exclusive access to objects inside resume

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 17 ++++++-----------
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  2 +-
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 04191fe2ee34..811bfd9d8d80 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1305,10 +1305,9 @@ void i915_ggtt_disable_guc(struct i915_ggtt *ggtt)
  * Returns %true if restoring the mapping for any object that was in a write
  * domain before suspend.
  */
-bool i915_ggtt_resume_vm(struct i915_address_space *vm)
+void i915_ggtt_resume_vm(struct i915_address_space *vm)
 {
 	struct i915_vma *vma;
-	bool write_domain_objs = false;
 
 	drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
 
@@ -1325,28 +1324,24 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 		vma->ops->bind_vma(vm, NULL, vma->resource,
 				   obj ? obj->cache_level : 0,
 				   was_bound);
-		if (obj) { /* only used during resume => exclusive access */
-			write_domain_objs |= fetch_and_zero(&obj->write_domain);
-			obj->read_domains |= I915_GEM_DOMAIN_GTT;
+		if (obj) {
+			i915_gem_object_lock(obj, NULL);
+			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
+			i915_gem_object_unlock(obj);
 		}
 	}
 
-	return write_domain_objs;
 }
 
 void i915_ggtt_resume(struct i915_ggtt *ggtt)
 {
-	bool flush;
 
 	intel_gt_check_and_clear_faults(ggtt->vm.gt);
 
-	flush = i915_ggtt_resume_vm(&ggtt->vm);
+	i915_ggtt_resume_vm(&ggtt->vm);
 
 	ggtt->invalidate(ggtt);
 
-	if (flush)
-		wbinvd_on_all_cpus();
-
 	if (GRAPHICS_VER(ggtt->vm.i915) >= 8)
 		setup_private_pat(ggtt->vm.gt->uncore);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 4529b5e9f6e6..c86092054988 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -567,7 +567,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
 				     unsigned long lmem_pt_obj_flags);
 
 void i915_ggtt_suspend_vm(struct i915_address_space *vm);
-bool i915_ggtt_resume_vm(struct i915_address_space *vm);
+void i915_ggtt_resume_vm(struct i915_address_space *vm);
 void i915_ggtt_suspend(struct i915_ggtt *gtt);
 void i915_ggtt_resume(struct i915_ggtt *ggtt);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-gfx] [PATCH 4/4] drm/i915/gt: Revert ggtt_resume to previous logic
@ 2022-03-19 19:42   ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-19 19:42 UTC (permalink / raw)
  To: intel-gfx
  Cc: thomas.hellstrom, michael.cheng, daniel.vetter, lucas.demarchi,
	dri-devel, chris

To avoid having to call wbinvd_on_all_cpus, revert i915_ggtt_resume and
i915_ggtt_resume_vm to previous logic [1].

[1]. 64b95df91f44 drm/i915: Assume exclusive access to objects inside resume

Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 17 ++++++-----------
 drivers/gpu/drm/i915/gt/intel_gtt.h  |  2 +-
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 04191fe2ee34..811bfd9d8d80 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -1305,10 +1305,9 @@ void i915_ggtt_disable_guc(struct i915_ggtt *ggtt)
  * Returns %true if restoring the mapping for any object that was in a write
  * domain before suspend.
  */
-bool i915_ggtt_resume_vm(struct i915_address_space *vm)
+void i915_ggtt_resume_vm(struct i915_address_space *vm)
 {
 	struct i915_vma *vma;
-	bool write_domain_objs = false;
 
 	drm_WARN_ON(&vm->i915->drm, !vm->is_ggtt && !vm->is_dpt);
 
@@ -1325,28 +1324,24 @@ bool i915_ggtt_resume_vm(struct i915_address_space *vm)
 		vma->ops->bind_vma(vm, NULL, vma->resource,
 				   obj ? obj->cache_level : 0,
 				   was_bound);
-		if (obj) { /* only used during resume => exclusive access */
-			write_domain_objs |= fetch_and_zero(&obj->write_domain);
-			obj->read_domains |= I915_GEM_DOMAIN_GTT;
+		if (obj) {
+			i915_gem_object_lock(obj, NULL);
+			WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false));
+			i915_gem_object_unlock(obj);
 		}
 	}
 
-	return write_domain_objs;
 }
 
 void i915_ggtt_resume(struct i915_ggtt *ggtt)
 {
-	bool flush;
 
 	intel_gt_check_and_clear_faults(ggtt->vm.gt);
 
-	flush = i915_ggtt_resume_vm(&ggtt->vm);
+	i915_ggtt_resume_vm(&ggtt->vm);
 
 	ggtt->invalidate(ggtt);
 
-	if (flush)
-		wbinvd_on_all_cpus();
-
 	if (GRAPHICS_VER(ggtt->vm.i915) >= 8)
 		setup_private_pat(ggtt->vm.gt->uncore);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index 4529b5e9f6e6..c86092054988 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -567,7 +567,7 @@ struct i915_ppgtt *i915_ppgtt_create(struct intel_gt *gt,
 				     unsigned long lmem_pt_obj_flags);
 
 void i915_ggtt_suspend_vm(struct i915_address_space *vm);
-bool i915_ggtt_resume_vm(struct i915_address_space *vm);
+void i915_ggtt_resume_vm(struct i915_address_space *vm);
 void i915_ggtt_suspend(struct i915_ggtt *gtt);
 void i915_ggtt_resume(struct i915_ggtt *ggtt);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
                   ` (4 preceding siblings ...)
  (?)
@ 2022-03-19 20:15 ` Patchwork
  -1 siblings, 0 replies; 60+ messages in thread
From: Patchwork @ 2022-03-19 20:15 UTC (permalink / raw)
  To: Michael Cheng; +Cc: intel-gfx

== Series Details ==

Series: Drop wbinvd_on_all_cpus usage
URL   : https://patchwork.freedesktop.org/series/101560/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a6e7b94ada85 i915/gem: drop wbinvd_on_all_cpus usage
-:10: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#10: 
To make i915 more architecture-neutral and be less paranoid, lets attempt to

total: 0 errors, 1 warnings, 0 checks, 24 lines checked
6f0d153682e1 Revert "drm/i915/gem: Almagamate clflushes on suspend"
-:9: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend")'
#9: 
[1]. ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend")

-:75: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#75: FILE: drivers/gpu/drm/i915/gem/i915_gem_pm.c:173:
+			drm_WARN_ON(&i915->drm,
+			    i915_gem_object_set_to_gtt_domain(obj, false));

total: 1 errors, 0 warnings, 1 checks, 68 lines checked
edf596eb3f94 i915/gem: Revert i915_gem_freeze to previous logic
-:34: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#34: FILE: drivers/gpu/drm/i915/gem/i915_gem_pm.c:222:
+		drm_WARN_ON(&i915->drm,
+			i915_gem_object_set_to_cpu_domain(obj, true));

total: 0 errors, 0 warnings, 1 checks, 23 lines checked
4c9cb24c8fef drm/i915/gt: Revert ggtt_resume to previous logic
-:9: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#9: 
[1]. 64b95df91f44 drm/i915: Assume exclusive access to objects inside resume

-:9: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 64b95df91f44 ("drm/i915: Assume exclusive access to objects inside resume")'
#9: 
[1]. 64b95df91f44 drm/i915: Assume exclusive access to objects inside resume

total: 1 errors, 1 warnings, 0 checks, 52 lines checked



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
                   ` (5 preceding siblings ...)
  (?)
@ 2022-03-19 20:16 ` Patchwork
  -1 siblings, 0 replies; 60+ messages in thread
From: Patchwork @ 2022-03-19 20:16 UTC (permalink / raw)
  To: Michael Cheng; +Cc: intel-gfx

== Series Details ==

Series: Drop wbinvd_on_all_cpus usage
URL   : https://patchwork.freedesktop.org/series/101560/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.



^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for Drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
                   ` (6 preceding siblings ...)
  (?)
@ 2022-03-19 20:45 ` Patchwork
  -1 siblings, 0 replies; 60+ messages in thread
From: Patchwork @ 2022-03-19 20:45 UTC (permalink / raw)
  To: Michael Cheng; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 2657 bytes --]

== Series Details ==

Series: Drop wbinvd_on_all_cpus usage
URL   : https://patchwork.freedesktop.org/series/101560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11385 -> Patchwork_22619
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/index.html

Participating hosts (47 -> 39)
------------------------------

  Missing    (8): fi-bdw-5557u shard-tglu bat-dg2-8 fi-bsw-cyan fi-pnv-d510 shard-rkl shard-dg1 fi-bdw-samus 

Known issues
------------

  Here are the changes found in Patchwork_22619 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@kms_busy@basic@modeset:
    - {bat-adlp-6}:       [DMESG-WARN][1] ([i915#3576]) -> [PASS][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-adlp-6/igt@kms_busy@basic@modeset.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/bat-adlp-6/igt@kms_busy@basic@modeset.html

  * igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size:
    - {bat-adlm-1}:       [INCOMPLETE][3] -> [PASS][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/bat-adlm-1/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/bat-adlm-1/igt@kms_cursor_legacy@basic-flip-before-cursor-varying-size.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#3576]: https://gitlab.freedesktop.org/drm/intel/issues/3576
  [i915#4258]: https://gitlab.freedesktop.org/drm/intel/issues/4258
  [i915#5185]: https://gitlab.freedesktop.org/drm/intel/issues/5185
  [i915#5193]: https://gitlab.freedesktop.org/drm/intel/issues/5193


Build changes
-------------

  * Linux: CI_DRM_11385 -> Patchwork_22619

  CI-20190529: 20190529
  CI_DRM_11385: 3babe046f5f5544ec772cd443f9d5ca24e342348 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6386: 0fcd59ad25b2960c0b654f90dfe4dd9e7c7b874d @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_22619: 4c9cb24c8fefe438004ca31c014f9755acdb8906 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

4c9cb24c8fef drm/i915/gt: Revert ggtt_resume to previous logic
edf596eb3f94 i915/gem: Revert i915_gem_freeze to previous logic
6f0d153682e1 Revert "drm/i915/gem: Almagamate clflushes on suspend"
a6e7b94ada85 i915/gem: drop wbinvd_on_all_cpus usage

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/index.html

[-- Attachment #2: Type: text/html, Size: 3116 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Intel-gfx] ✓ Fi.CI.IGT: success for Drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
                   ` (7 preceding siblings ...)
  (?)
@ 2022-03-19 22:04 ` Patchwork
  -1 siblings, 0 replies; 60+ messages in thread
From: Patchwork @ 2022-03-19 22:04 UTC (permalink / raw)
  To: Michael Cheng; +Cc: intel-gfx

[-- Attachment #1: Type: text/plain, Size: 30253 bytes --]

== Series Details ==

Series: Drop wbinvd_on_all_cpus usage
URL   : https://patchwork.freedesktop.org/series/101560/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_11385_full -> Patchwork_22619_full
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  

Participating hosts (12 -> 13)
------------------------------

  Additional (1): shard-dg1 

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_22619_full:

### IGT changes ###

#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@gem_exec_suspend@basic-s0@smem:
    - {shard-dg1}:        NOTRUN -> [DMESG-FAIL][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-dg1-16/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gem_softpin@noreloc-s3:
    - {shard-dg1}:        NOTRUN -> [DMESG-WARN][2]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-dg1-15/igt@gem_softpin@noreloc-s3.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-indfb-fliptrack-mmap-gtt:
    - {shard-rkl}:        [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-6/igt@kms_frontbuffer_tracking@fbcpsr-1p-indfb-fliptrack-mmap-gtt.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-5/igt@kms_frontbuffer_tracking@fbcpsr-1p-indfb-fliptrack-mmap-gtt.html

  * igt@kms_vblank@pipe-b-ts-continuation-suspend:
    - {shard-dg1}:        NOTRUN -> [INCOMPLETE][5] +8 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-dg1-19/igt@kms_vblank@pipe-b-ts-continuation-suspend.html

  * igt@perf@disabled-read-error:
    - {shard-rkl}:        NOTRUN -> [INCOMPLETE][6]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-5/igt@perf@disabled-read-error.html

  
Known issues
------------

  Here are the changes found in Patchwork_22619_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_shared@q-promotion@vecs0:
    - shard-skl:          [PASS][7] -> [DMESG-WARN][8] ([i915#1982])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl10/igt@gem_ctx_shared@q-promotion@vecs0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl4/igt@gem_ctx_shared@q-promotion@vecs0.html

  * igt@gem_exec_balancer@parallel-balancer:
    - shard-iclb:         [PASS][9] -> [SKIP][10] ([i915#4525])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-iclb1/igt@gem_exec_balancer@parallel-balancer.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb6/igt@gem_exec_balancer@parallel-balancer.html

  * igt@gem_exec_balancer@parallel-contexts:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][11] ([i915#5076])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl7/igt@gem_exec_balancer@parallel-contexts.html

  * igt@gem_exec_capture@pi@rcs0:
    - shard-skl:          NOTRUN -> [INCOMPLETE][12] ([i915#4547])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl3/igt@gem_exec_capture@pi@rcs0.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-iclb:         [PASS][13] -> [FAIL][14] ([i915#2842])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-iclb7/igt@gem_exec_fair@basic-none-share@rcs0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb2/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - shard-glk:          [PASS][15] -> [FAIL][16] ([i915#2842])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-glk5/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk1/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-kbl:          [PASS][17] -> [FAIL][18] ([i915#2842])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-kbl1/igt@gem_exec_fair@basic-pace@vecs0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_fair@basic-throttle@rcs0:
    - shard-glk:          NOTRUN -> [FAIL][19] ([i915#2842])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@gem_exec_fair@basic-throttle@rcs0.html

  * igt@gem_exec_params@no-blt:
    - shard-tglb:         NOTRUN -> [SKIP][20] ([fdo#109283])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglb5/igt@gem_exec_params@no-blt.html

  * igt@gem_exec_whisper@basic-queues-forked:
    - shard-glk:          NOTRUN -> [DMESG-WARN][21] ([i915#118])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@gem_exec_whisper@basic-queues-forked.html

  * igt@gem_lmem_swapping@heavy-verify-random:
    - shard-skl:          NOTRUN -> [SKIP][22] ([fdo#109271] / [i915#4613]) +3 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl10/igt@gem_lmem_swapping@heavy-verify-random.html

  * igt@gem_lmem_swapping@parallel-random:
    - shard-kbl:          NOTRUN -> [SKIP][23] ([fdo#109271] / [i915#4613]) +2 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@gem_lmem_swapping@parallel-random.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-skl:          NOTRUN -> [FAIL][24] ([i915#3318])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@gem_userptr_blits@vma-merge.html

  * igt@i915_pm_dc@dc6-dpms:
    - shard-skl:          NOTRUN -> [FAIL][25] ([i915#454])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@i915_pm_dc@dc6-dpms.html

  * igt@i915_pm_dc@dc9-dpms:
    - shard-iclb:         [PASS][26] -> [SKIP][27] ([i915#4281])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-iclb8/igt@i915_pm_dc@dc9-dpms.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb3/igt@i915_pm_dc@dc9-dpms.html

  * igt@kms_big_fb@linear-8bpp-rotate-90:
    - shard-iclb:         NOTRUN -> [SKIP][28] ([fdo#110725] / [fdo#111614])
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_big_fb@linear-8bpp-rotate-90.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-180-hflip:
    - shard-apl:          NOTRUN -> [SKIP][29] ([fdo#109271] / [i915#3777])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl1/igt@kms_big_fb@x-tiled-max-hw-stride-64bpp-rotate-180-hflip.html

  * igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-hflip:
    - shard-kbl:          NOTRUN -> [SKIP][30] ([fdo#109271] / [i915#3777]) +1 similar issue
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@kms_big_fb@y-tiled-max-hw-stride-32bpp-rotate-180-hflip.html

  * igt@kms_big_fb@yf-tiled-64bpp-rotate-270:
    - shard-iclb:         NOTRUN -> [SKIP][31] ([fdo#110723])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_big_fb@yf-tiled-64bpp-rotate-270.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip:
    - shard-skl:          NOTRUN -> [SKIP][32] ([fdo#109271] / [i915#3777]) +4 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-0-hflip-async-flip.html

  * igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-async-flip:
    - shard-skl:          NOTRUN -> [FAIL][33] ([i915#3743]) +1 similar issue
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl8/igt@kms_big_fb@yf-tiled-max-hw-stride-32bpp-rotate-180-async-flip.html

  * igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_mc_ccs:
    - shard-glk:          NOTRUN -> [SKIP][34] ([fdo#109271] / [i915#3886]) +1 similar issue
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@kms_ccs@pipe-a-bad-rotation-90-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-a-crc-primary-basic-y_tiled_gen12_rc_ccs_cc:
    - shard-apl:          NOTRUN -> [SKIP][35] ([fdo#109271] / [i915#3886])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl1/igt@kms_ccs@pipe-a-crc-primary-basic-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc:
    - shard-kbl:          NOTRUN -> [SKIP][36] ([fdo#109271] / [i915#3886]) +2 similar issues
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_ccs@pipe-a-crc-primary-rotation-180-y_tiled_gen12_rc_ccs_cc.html

  * igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_mc_ccs:
    - shard-skl:          NOTRUN -> [SKIP][37] ([fdo#109271] / [i915#3886]) +9 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@kms_ccs@pipe-b-ccs-on-another-bo-y_tiled_gen12_mc_ccs.html

  * igt@kms_ccs@pipe-d-crc-sprite-planes-basic-yf_tiled_ccs:
    - shard-glk:          NOTRUN -> [SKIP][38] ([fdo#109271]) +25 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@kms_ccs@pipe-d-crc-sprite-planes-basic-yf_tiled_ccs.html

  * igt@kms_cdclk@mode-transition:
    - shard-apl:          NOTRUN -> [SKIP][39] ([fdo#109271]) +8 similar issues
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl1/igt@kms_cdclk@mode-transition.html

  * igt@kms_chamelium@dp-hpd-with-enabled-mode:
    - shard-glk:          NOTRUN -> [SKIP][40] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@kms_chamelium@dp-hpd-with-enabled-mode.html

  * igt@kms_color_chamelium@pipe-a-ctm-0-75:
    - shard-kbl:          NOTRUN -> [SKIP][41] ([fdo#109271] / [fdo#111827]) +5 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_color_chamelium@pipe-a-ctm-0-75.html

  * igt@kms_color_chamelium@pipe-b-ctm-max:
    - shard-skl:          NOTRUN -> [SKIP][42] ([fdo#109271] / [fdo#111827]) +17 similar issues
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl7/igt@kms_color_chamelium@pipe-b-ctm-max.html
    - shard-tglb:         NOTRUN -> [SKIP][43] ([fdo#109284] / [fdo#111827])
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglb5/igt@kms_color_chamelium@pipe-b-ctm-max.html

  * igt@kms_color_chamelium@pipe-d-ctm-negative:
    - shard-iclb:         NOTRUN -> [SKIP][44] ([fdo#109278] / [fdo#109284] / [fdo#111827])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_color_chamelium@pipe-d-ctm-negative.html

  * igt@kms_content_protection@srm:
    - shard-kbl:          NOTRUN -> [TIMEOUT][45] ([i915#1319])
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_content_protection@srm.html

  * igt@kms_cursor_crc@pipe-a-cursor-32x10-onscreen:
    - shard-iclb:         NOTRUN -> [SKIP][46] ([fdo#109278]) +1 similar issue
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_cursor_crc@pipe-a-cursor-32x10-onscreen.html

  * igt@kms_cursor_crc@pipe-a-cursor-512x170-sliding:
    - shard-tglb:         NOTRUN -> [SKIP][47] ([fdo#109279] / [i915#3359])
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglb5/igt@kms_cursor_crc@pipe-a-cursor-512x170-sliding.html

  * igt@kms_cursor_crc@pipe-a-cursor-512x512-offscreen:
    - shard-iclb:         NOTRUN -> [SKIP][48] ([fdo#109278] / [fdo#109279])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_cursor_crc@pipe-a-cursor-512x512-offscreen.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-apl:          [PASS][49] -> [INCOMPLETE][50] ([i915#180] / [i915#1982])
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-apl2/igt@kms_fbcon_fbt@fbc-suspend.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl8/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@flip-vs-suspend-interruptible@a-dp1:
    - shard-kbl:          [PASS][51] -> [DMESG-WARN][52] ([i915#180]) +5 similar issues
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-kbl3/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl6/igt@kms_flip@flip-vs-suspend-interruptible@a-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@b-dp1:
    - shard-apl:          [PASS][53] -> [DMESG-WARN][54] ([i915#180])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-apl7/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl4/igt@kms_flip@flip-vs-suspend-interruptible@b-dp1.html

  * igt@kms_flip@plain-flip-ts-check-interruptible@a-edp1:
    - shard-skl:          [PASS][55] -> [FAIL][56] ([i915#2122]) +1 similar issue
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl1/igt@kms_flip@plain-flip-ts-check-interruptible@a-edp1.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl6/igt@kms_flip@plain-flip-ts-check-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt:
    - shard-iclb:         NOTRUN -> [SKIP][57] ([fdo#109280]) +2 similar issues
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-indfb-pgflip-blt.html

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-d:
    - shard-kbl:          NOTRUN -> [SKIP][58] ([fdo#109271] / [i915#533])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@kms_pipe_crc_basic@hang-read-crc-pipe-d.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d:
    - shard-skl:          NOTRUN -> [SKIP][59] ([fdo#109271] / [i915#533]) +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-d.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb:
    - shard-skl:          NOTRUN -> [FAIL][60] ([i915#265])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl3/igt@kms_plane_alpha_blend@pipe-a-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-7efc:
    - shard-kbl:          NOTRUN -> [FAIL][61] ([fdo#108145] / [i915#265])
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_plane_alpha_blend@pipe-b-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][62] ([i915#265])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_plane_alpha_blend@pipe-b-alpha-transparent-fb.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-glk:          NOTRUN -> [FAIL][63] ([fdo#108145] / [i915#265])
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@kms_plane_alpha_blend@pipe-c-alpha-basic.html

  * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min:
    - shard-skl:          NOTRUN -> [FAIL][64] ([fdo#108145] / [i915#265]) +5 similar issues
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl7/igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min.html

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          [PASS][65] -> [FAIL][66] ([fdo#108145] / [i915#265])
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl10/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl4/igt@kms_plane_alpha_blend@pipe-c-coverage-7efc.html

  * igt@kms_plane_scaling@downscale-with-rotation-factor-0-5@pipe-a-dp-1-downscale-with-rotation:
    - shard-kbl:          NOTRUN -> [SKIP][67] ([fdo#109271]) +79 similar issues
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl7/igt@kms_plane_scaling@downscale-with-rotation-factor-0-5@pipe-a-dp-1-downscale-with-rotation.html

  * igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area:
    - shard-skl:          NOTRUN -> [SKIP][68] ([fdo#109271] / [i915#658]) +2 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl3/igt@kms_psr2_sf@overlay-plane-update-sf-dmg-area.html

  * igt@kms_psr2_su@page_flip-p010:
    - shard-kbl:          NOTRUN -> [SKIP][69] ([fdo#109271] / [i915#658])
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_psr2_su@page_flip-p010.html

  * igt@kms_psr@psr2_cursor_mmap_cpu:
    - shard-iclb:         [PASS][70] -> [SKIP][71] ([fdo#109441]) +1 similar issue
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-iclb2/igt@kms_psr@psr2_cursor_mmap_cpu.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb8/igt@kms_psr@psr2_cursor_mmap_cpu.html

  * igt@kms_psr@psr2_cursor_plane_onoff:
    - shard-iclb:         NOTRUN -> [SKIP][72] ([fdo#109441])
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb5/igt@kms_psr@psr2_cursor_plane_onoff.html

  * igt@kms_sysfs_edid_timing:
    - shard-kbl:          NOTRUN -> [FAIL][73] ([IGT#2])
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_sysfs_edid_timing.html

  * igt@kms_writeback@writeback-check-output:
    - shard-kbl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#2437])
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl1/igt@kms_writeback@writeback-check-output.html

  * igt@kms_writeback@writeback-pixel-formats:
    - shard-skl:          NOTRUN -> [SKIP][75] ([fdo#109271] / [i915#2437])
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@kms_writeback@writeback-pixel-formats.html

  * igt@perf@gen12-mi-rpc:
    - shard-skl:          NOTRUN -> [SKIP][76] ([fdo#109271]) +291 similar issues
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@perf@gen12-mi-rpc.html

  * igt@syncobj_timeline@transfer-timeline-point:
    - shard-kbl:          NOTRUN -> [DMESG-FAIL][77] ([i915#5098])
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@syncobj_timeline@transfer-timeline-point.html

  * igt@sysfs_clients@busy:
    - shard-skl:          NOTRUN -> [SKIP][78] ([fdo#109271] / [i915#2994])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl9/igt@sysfs_clients@busy.html

  * igt@sysfs_clients@split-50:
    - shard-kbl:          NOTRUN -> [SKIP][79] ([fdo#109271] / [i915#2994]) +1 similar issue
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@sysfs_clients@split-50.html

  * igt@sysfs_heartbeat_interval@mixed@rcs0:
    - shard-skl:          [PASS][80] -> [FAIL][81] ([i915#1731])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl9/igt@sysfs_heartbeat_interval@mixed@rcs0.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl1/igt@sysfs_heartbeat_interval@mixed@rcs0.html

  * igt@sysfs_timeslice_duration@timeout@rcs0:
    - shard-skl:          [PASS][82] -> [FAIL][83] ([i915#3259])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl10/igt@sysfs_timeslice_duration@timeout@rcs0.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl2/igt@sysfs_timeslice_duration@timeout@rcs0.html

  
#### Possible fixes ####

  * igt@fbdev@write:
    - {shard-rkl}:        [SKIP][84] ([i915#2582]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@fbdev@write.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@fbdev@write.html

  * igt@gem_eio@unwedge-stress:
    - {shard-tglu}:       [TIMEOUT][86] ([i915#3063] / [i915#3648]) -> [PASS][87]
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-tglu-2/igt@gem_eio@unwedge-stress.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglu-1/igt@gem_eio@unwedge-stress.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl:          [FAIL][88] ([i915#2842]) -> [PASS][89] +1 similar issue
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-kbl6/igt@gem_exec_fair@basic-none@vcs0.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@gem_exec_fair@basic-none@vcs0.html
    - shard-apl:          [FAIL][90] ([i915#2842]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-apl2/igt@gem_exec_fair@basic-none@vcs0.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl8/igt@gem_exec_fair@basic-none@vcs0.html

  * igt@gem_exec_fair@basic-pace-share@rcs0:
    - {shard-tglu}:       [FAIL][92] ([i915#2842]) -> [PASS][93]
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-tglu-2/igt@gem_exec_fair@basic-pace-share@rcs0.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglu-1/igt@gem_exec_fair@basic-pace-share@rcs0.html

  * igt@gem_exec_fair@basic-pace@bcs0:
    - shard-tglb:         [FAIL][94] ([i915#2842]) -> [PASS][95]
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-tglb6/igt@gem_exec_fair@basic-pace@bcs0.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-tglb6/igt@gem_exec_fair@basic-pace@bcs0.html

  * igt@gem_exec_suspend@basic-s0@smem:
    - {shard-rkl}:        [INCOMPLETE][96] -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@gem_exec_suspend@basic-s0@smem.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-2/igt@gem_exec_suspend@basic-s0@smem.html

  * igt@gen9_exec_parse@allowed-all:
    - shard-glk:          [DMESG-WARN][98] ([i915#1436] / [i915#716]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-glk7/igt@gen9_exec_parse@allowed-all.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-glk8/igt@gen9_exec_parse@allowed-all.html

  * igt@i915_pm_backlight@bad-brightness:
    - {shard-rkl}:        [SKIP][100] ([i915#3012]) -> [PASS][101] +1 similar issue
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@i915_pm_backlight@bad-brightness.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@i915_pm_backlight@bad-brightness.html

  * igt@i915_suspend@fence-restore-untiled:
    - shard-apl:          [DMESG-WARN][102] ([i915#180]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-apl4/igt@i915_suspend@fence-restore-untiled.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-apl1/igt@i915_suspend@fence-restore-untiled.html

  * igt@kms_atomic@atomic_plane_damage:
    - {shard-rkl}:        ([SKIP][104], [SKIP][105]) ([i915#4098]) -> [PASS][106]
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@kms_atomic@atomic_plane_damage.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-4/igt@kms_atomic@atomic_plane_damage.html
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_atomic@atomic_plane_damage.html

  * igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip:
    - {shard-rkl}:        [SKIP][107] ([i915#1845] / [i915#4098]) -> [PASS][108] +19 similar issues
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip.html
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_big_fb@x-tiled-max-hw-stride-32bpp-rotate-180-hflip-async-flip.html

  * igt@kms_color@pipe-a-ctm-blue-to-red:
    - {shard-rkl}:        [SKIP][109] ([i915#1149] / [i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][110] +2 similar issues
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_color@pipe-a-ctm-blue-to-red.html
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_color@pipe-a-ctm-blue-to-red.html

  * igt@kms_cursor_crc@pipe-a-cursor-256x85-random:
    - {shard-rkl}:        [SKIP][111] ([fdo#112022] / [i915#4070]) -> [PASS][112] +7 similar issues
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@kms_cursor_crc@pipe-a-cursor-256x85-random.html
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_cursor_crc@pipe-a-cursor-256x85-random.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][113] ([i915#180]) -> [PASS][114] +2 similar issues
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-kbl6/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl7/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_edge_walk@pipe-a-128x128-left-edge:
    - {shard-rkl}:        [SKIP][115] ([i915#1849] / [i915#4070] / [i915#4098]) -> [PASS][116] +1 similar issue
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@kms_cursor_edge_walk@pipe-a-128x128-left-edge.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_cursor_edge_walk@pipe-a-128x128-left-edge.html

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic:
    - {shard-rkl}:        [SKIP][117] ([fdo#111825] / [i915#4070]) -> [PASS][118] +1 similar issue
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_cursor_legacy@cursor-vs-flip-atomic.html
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_cursor_legacy@cursor-vs-flip-atomic.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions:
    - shard-skl:          [FAIL][119] ([i915#2346]) -> [PASS][120]
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl9/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl1/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions.html

  * igt@kms_draw_crc@draw-method-rgb565-pwrite-ytiled:
    - {shard-rkl}:        [SKIP][121] ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][122] +3 similar issues
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_draw_crc@draw-method-rgb565-pwrite-ytiled.html
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_draw_crc@draw-method-rgb565-pwrite-ytiled.html

  * igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled:
    - {shard-rkl}:        ([SKIP][123], [SKIP][124]) ([fdo#111314] / [i915#4098] / [i915#4369]) -> [PASS][125]
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-2/igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled.html
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-4/igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_draw_crc@draw-method-xrgb2101010-blt-xtiled.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - {shard-rkl}:        [SKIP][126] ([i915#1849] / [i915#4098]) -> [PASS][127] +2 similar issues
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_fbcon_fbt@fbc-suspend.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_fbcon_fbt@fbc-suspend.html
    - shard-kbl:          [INCOMPLETE][128] ([i915#180] / [i915#636]) -> [PASS][129]
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-kbl6/igt@kms_fbcon_fbt@fbc-suspend.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-kbl4/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt:
    - {shard-rkl}:        ([SKIP][130], [SKIP][131]) ([i915#1849] / [i915#4098]) -> [PASS][132] +1 similar issue
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-4/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_frontbuffer_tracking@fbc-1p-shrfb-fliptrack-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - {shard-rkl}:        [SKIP][133] ([i915#1849]) -> [PASS][134] +14 similar issues
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-rkl-5/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-mmap-gtt.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-rkl-6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-spr-indfb-draw-mmap-gtt.html

  * igt@kms_hdr@bpc-switch-suspend@bpc-switch-suspend-edp-1-pipe-a:
    - shard-skl:          [FAIL][135] ([i915#1188]) -> [PASS][136]
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-skl1/igt@kms_hdr@bpc-switch-suspend@bpc-switch-suspend-edp-1-pipe-a.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-skl10/igt@kms_hdr@bpc-switch-suspend@bpc-switch-suspend-edp-1-pipe-a.html

  * igt@kms_psr@psr2_sprite_mmap_gtt:
    - shard-iclb:         [SKIP][137] ([fdo#109441]) -> [PASS][138] +1 similar issue
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11385/shard-iclb7/igt@kms_psr@psr2_sprite_mmap_gtt.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/shard-iclb2/igt@kms_psr@psr2_sprite_mmap_gtt.html

  * igt@kms_universal_plane@universal-plane-pipe-a-sanity:
    - {shard-rkl}:        [SKIP][139] ([i915#1845] / [i915#4070] / [i915#4098]) -> [PASS][140] +1 si

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_22619/index.html

[-- Attachment #2: Type: text/html, Size: 33427 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
@ 2022-03-21 10:27   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 10:27 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, dri-devel, chris, Matthew Auld


On 19/03/2022 19:42, Michael Cheng wrote:
> To align with the discussion in [1][2], this patch series drops all usage of
> wbvind_on_all_cpus within i915 by either replacing the call with certain
> drm clflush helpers, or reverting to a previous logic.

AFAIU, complaint from [1] was that it is wrong to provide non x86 implementations under the wbinvd_on_all_cpus name. Instead an arch agnostic helper which achieves the same effect could be created. Does Arm have such concept?

Given that the series seems to be taking a different route, avoiding the need to call wbinvd_on_all_cpus rather than what [1] suggests (note drm_clflush_sg can still call it!?), concern is that the series has a bunch of reverts and each one needs to be analyzed.

For instance looking at just the last one, 64b95df91f44, who has looked at the locking consequences that commit describes:

"""
     Inside gtt_restore_mappings() we currently take the obj->resv->lock, but
     in the future we need to avoid taking this fs-reclaim tainted lock as we
     need to extend the coverage of the vm->mutex. Take advantage of the
     single-threaded nature of the early resume phase, and do a single
     wbinvd() to flush all the GTT objects en masse.

"""

?

Then there are suspend and freeze reverts which presumably can regress the suspend times. Any data on those?

Adding Matt since he was the reviewer for that work so might remember something.

Regards,

Tvrtko

  
> [1]. https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
> 
> Michael Cheng (4):
>    i915/gem: drop wbinvd_on_all_cpus usage
>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>    i915/gem: Revert i915_gem_freeze to previous logic
>    drm/i915/gt: Revert ggtt_resume to previous logic
> 
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>   4 files changed, 46 insertions(+), 38 deletions(-)
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 10:27   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 10:27 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, dri-devel,
	chris, Matthew Auld


On 19/03/2022 19:42, Michael Cheng wrote:
> To align with the discussion in [1][2], this patch series drops all usage of
> wbvind_on_all_cpus within i915 by either replacing the call with certain
> drm clflush helpers, or reverting to a previous logic.

AFAIU, complaint from [1] was that it is wrong to provide non x86 implementations under the wbinvd_on_all_cpus name. Instead an arch agnostic helper which achieves the same effect could be created. Does Arm have such concept?

Given that the series seems to be taking a different route, avoiding the need to call wbinvd_on_all_cpus rather than what [1] suggests (note drm_clflush_sg can still call it!?), concern is that the series has a bunch of reverts and each one needs to be analyzed.

For instance looking at just the last one, 64b95df91f44, who has looked at the locking consequences that commit describes:

"""
     Inside gtt_restore_mappings() we currently take the obj->resv->lock, but
     in the future we need to avoid taking this fs-reclaim tainted lock as we
     need to extend the coverage of the vm->mutex. Take advantage of the
     single-threaded nature of the early resume phase, and do a single
     wbinvd() to flush all the GTT objects en masse.

"""

?

Then there are suspend and freeze reverts which presumably can regress the suspend times. Any data on those?

Adding Matt since he was the reviewer for that work so might remember something.

Regards,

Tvrtko

  
> [1]. https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
> 
> Michael Cheng (4):
>    i915/gem: drop wbinvd_on_all_cpus usage
>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>    i915/gem: Revert i915_gem_freeze to previous logic
>    drm/i915/gt: Revert ggtt_resume to previous logic
> 
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>   4 files changed, 46 insertions(+), 38 deletions(-)
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-19 19:42   ` [Intel-gfx] " Michael Cheng
@ 2022-03-21 10:30     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 10:30 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, dri-devel, chris


On 19/03/2022 19:42, Michael Cheng wrote:
> Previous concern with using drm_clflush_sg was that we don't know what the
> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> everything at once to avoid paranoia.

And now we know, or we know it is not a concern?

> To make i915 more architecture-neutral and be less paranoid, lets attempt to

"Lets attempt" as we don't know if this will work and/or what can/will 
break?

> use drm_clflush_sg to flush the pages for when the GPU wants to read
> from main memory.
> 
> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index f5062d0c6333..b0a5baaebc43 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -8,6 +8,7 @@
>   #include <linux/highmem.h>
>   #include <linux/dma-resv.h>
>   #include <linux/module.h>
> +#include <drm/drm_cache.h>
>   
>   #include <asm/smp.h>
>   
> @@ -250,16 +251,10 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>   	 * DG1 is special here since it still snoops transactions even with
>   	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
>   	 * might need to revisit this as we add new discrete platforms.
> -	 *
> -	 * XXX: Consider doing a vmap flush or something, where possible.
> -	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
> -	 * the underlying sg_table might not even point to struct pages, so we
> -	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
> -	 * the driver.
>   	 */
>   	if (i915_gem_object_can_bypass_llc(obj) ||
>   	    (!HAS_LLC(i915) && !IS_DG1(i915)))
> -		wbinvd_on_all_cpus();
> +		drm_clflush_sg(pages);

And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
so are you just punting the issue somewhere else? How will it be solved 
there?

Regards,

Tvrtko

>   
>   	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>   	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 10:30     ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 10:30 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, dri-devel, chris


On 19/03/2022 19:42, Michael Cheng wrote:
> Previous concern with using drm_clflush_sg was that we don't know what the
> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> everything at once to avoid paranoia.

And now we know, or we know it is not a concern?

> To make i915 more architecture-neutral and be less paranoid, lets attempt to

"Lets attempt" as we don't know if this will work and/or what can/will 
break?

> use drm_clflush_sg to flush the pages for when the GPU wants to read
> from main memory.
> 
> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index f5062d0c6333..b0a5baaebc43 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -8,6 +8,7 @@
>   #include <linux/highmem.h>
>   #include <linux/dma-resv.h>
>   #include <linux/module.h>
> +#include <drm/drm_cache.h>
>   
>   #include <asm/smp.h>
>   
> @@ -250,16 +251,10 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>   	 * DG1 is special here since it still snoops transactions even with
>   	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
>   	 * might need to revisit this as we add new discrete platforms.
> -	 *
> -	 * XXX: Consider doing a vmap flush or something, where possible.
> -	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
> -	 * the underlying sg_table might not even point to struct pages, so we
> -	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
> -	 * the driver.
>   	 */
>   	if (i915_gem_object_can_bypass_llc(obj) ||
>   	    (!HAS_LLC(i915) && !IS_DG1(i915)))
> -		wbinvd_on_all_cpus();
> +		drm_clflush_sg(pages);

And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
so are you just punting the issue somewhere else? How will it be solved 
there?

Regards,

Tvrtko

>   
>   	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>   	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 10:27   ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 11:03     ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 11:03 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

Hi, Tvrtko.

On 3/21/22 11:27, Tvrtko Ursulin wrote:
>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> To align with the discussion in [1][2], this patch series drops all 
>> usage of
>> wbvind_on_all_cpus within i915 by either replacing the call with certain
>> drm clflush helpers, or reverting to a previous logic.
>
> AFAIU, complaint from [1] was that it is wrong to provide non x86 
> implementations under the wbinvd_on_all_cpus name. Instead an arch 
> agnostic helper which achieves the same effect could be created. Does 
> Arm have such concept?

I also understand Linus' email like we shouldn't leak incoherent IO to 
other architectures, meaning any remaining wbinvd()s should be X86 only.

Also, wbinvd_on_all_cpus() can become very costly, hence prefer the 
range apis when possible if they can be verified not to degrade performance.


>
> Given that the series seems to be taking a different route, avoiding 
> the need to call wbinvd_on_all_cpus rather than what [1] suggests 
> (note drm_clflush_sg can still call it!?), concern is that the series 
> has a bunch of reverts and each one needs to be analyzed.


Agreed.

/Thomas



>
> For instance looking at just the last one, 64b95df91f44, who has 
> looked at the locking consequences that commit describes:
>
> """
>     Inside gtt_restore_mappings() we currently take the 
> obj->resv->lock, but
>     in the future we need to avoid taking this fs-reclaim tainted lock 
> as we
>     need to extend the coverage of the vm->mutex. Take advantage of the
>     single-threaded nature of the early resume phase, and do a single
>     wbinvd() to flush all the GTT objects en masse.
>
> """
>
> ?
>
> Then there are suspend and freeze reverts which presumably can regress 
> the suspend times. Any data on those?
>
> Adding Matt since he was the reviewer for that work so might remember 
> something.
>
> Regards,
>
> Tvrtko
>
>
>> [1]. 
>> https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
>> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
>>
>> Michael Cheng (4):
>>    i915/gem: drop wbinvd_on_all_cpus usage
>>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>>    i915/gem: Revert i915_gem_freeze to previous logic
>>    drm/i915/gt: Revert ggtt_resume to previous logic
>>
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>>   4 files changed, 46 insertions(+), 38 deletions(-)
>>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 11:03     ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 11:03 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

Hi, Tvrtko.

On 3/21/22 11:27, Tvrtko Ursulin wrote:
>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> To align with the discussion in [1][2], this patch series drops all 
>> usage of
>> wbvind_on_all_cpus within i915 by either replacing the call with certain
>> drm clflush helpers, or reverting to a previous logic.
>
> AFAIU, complaint from [1] was that it is wrong to provide non x86 
> implementations under the wbinvd_on_all_cpus name. Instead an arch 
> agnostic helper which achieves the same effect could be created. Does 
> Arm have such concept?

I also understand Linus' email like we shouldn't leak incoherent IO to 
other architectures, meaning any remaining wbinvd()s should be X86 only.

Also, wbinvd_on_all_cpus() can become very costly, hence prefer the 
range apis when possible if they can be verified not to degrade performance.


>
> Given that the series seems to be taking a different route, avoiding 
> the need to call wbinvd_on_all_cpus rather than what [1] suggests 
> (note drm_clflush_sg can still call it!?), concern is that the series 
> has a bunch of reverts and each one needs to be analyzed.


Agreed.

/Thomas



>
> For instance looking at just the last one, 64b95df91f44, who has 
> looked at the locking consequences that commit describes:
>
> """
>     Inside gtt_restore_mappings() we currently take the 
> obj->resv->lock, but
>     in the future we need to avoid taking this fs-reclaim tainted lock 
> as we
>     need to extend the coverage of the vm->mutex. Take advantage of the
>     single-threaded nature of the early resume phase, and do a single
>     wbinvd() to flush all the GTT objects en masse.
>
> """
>
> ?
>
> Then there are suspend and freeze reverts which presumably can regress 
> the suspend times. Any data on those?
>
> Adding Matt since he was the reviewer for that work so might remember 
> something.
>
> Regards,
>
> Tvrtko
>
>
>> [1]. 
>> https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
>> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
>>
>> Michael Cheng (4):
>>    i915/gem: drop wbinvd_on_all_cpus usage
>>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>>    i915/gem: Revert i915_gem_freeze to previous logic
>>    drm/i915/gt: Revert ggtt_resume to previous logic
>>
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>>   4 files changed, 46 insertions(+), 38 deletions(-)
>>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 10:30     ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 11:07       ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 11:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris


On 3/21/22 11:30, Tvrtko Ursulin wrote:
>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?
>
>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?

I think in this case, drm_clflush_sg() can't be immediately used, 
because pages may not contain actual page pointers; might be just the 
dma address. It needs to be preceded with a dmabuf vmap.

But otherwise this change, I figure, falls into the "prefer range-aware 
apis" category; If the CPU supports it, flush the range only, otherwise 
fall back to wbinvd().

/Thomas



>
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 11:07       ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 11:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris


On 3/21/22 11:30, Tvrtko Ursulin wrote:
>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?
>
>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?

I think in this case, drm_clflush_sg() can't be immediately used, 
because pages may not contain actual page pointers; might be just the 
dma address. It needs to be preceded with a dmabuf vmap.

But otherwise this change, I figure, falls into the "prefer range-aware 
apis" category; If the CPU supports it, flush the range only, otherwise 
fall back to wbinvd().

/Thomas



>
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 11:03     ` [Intel-gfx] " Thomas Hellström
@ 2022-03-21 12:22       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 12:22 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 21/03/2022 11:03, Thomas Hellström wrote:
> Hi, Tvrtko.
> 
> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> To align with the discussion in [1][2], this patch series drops all 
>>> usage of
>>> wbvind_on_all_cpus within i915 by either replacing the call with certain
>>> drm clflush helpers, or reverting to a previous logic.
>>
>> AFAIU, complaint from [1] was that it is wrong to provide non x86 
>> implementations under the wbinvd_on_all_cpus name. Instead an arch 
>> agnostic helper which achieves the same effect could be created. Does 
>> Arm have such concept?
> 
> I also understand Linus' email like we shouldn't leak incoherent IO to 
> other architectures, meaning any remaining wbinvd()s should be X86 only.

The last part is completely obvious since it is a x86 instruction name.

But I think we can't pick a solution until we know how the concept maps 
to Arm and that will also include seeing how the drm_clflush_sg for Arm 
would look. Is there a range based solution, or just a big hammer there. 
If the latter, then it is no good to churn all these reverts but instead 
an arch agnostic wrapper, with a generic name, would be the way to go.

Regards,

Tvrtko

> Also, wbinvd_on_all_cpus() can become very costly, hence prefer the 
> range apis when possible if they can be verified not to degrade 
> performance.
> 
> 
>>
>> Given that the series seems to be taking a different route, avoiding 
>> the need to call wbinvd_on_all_cpus rather than what [1] suggests 
>> (note drm_clflush_sg can still call it!?), concern is that the series 
>> has a bunch of reverts and each one needs to be analyzed.
> 
> 
> Agreed.
> 
> /Thomas
> 
> 
> 
>>
>> For instance looking at just the last one, 64b95df91f44, who has 
>> looked at the locking consequences that commit describes:
>>
>> """
>>     Inside gtt_restore_mappings() we currently take the 
>> obj->resv->lock, but
>>     in the future we need to avoid taking this fs-reclaim tainted lock 
>> as we
>>     need to extend the coverage of the vm->mutex. Take advantage of the
>>     single-threaded nature of the early resume phase, and do a single
>>     wbinvd() to flush all the GTT objects en masse.
>>
>> """
>>
>> ?
>>
>> Then there are suspend and freeze reverts which presumably can regress 
>> the suspend times. Any data on those?
>>
>> Adding Matt since he was the reviewer for that work so might remember 
>> something.
>>
>> Regards,
>>
>> Tvrtko
>>
>>
>>> [1]. 
>>> https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html 
>>>
>>> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
>>>
>>> Michael Cheng (4):
>>>    i915/gem: drop wbinvd_on_all_cpus usage
>>>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>>>    i915/gem: Revert i915_gem_freeze to previous logic
>>>    drm/i915/gt: Revert ggtt_resume to previous logic
>>>
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>>>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>>>   4 files changed, 46 insertions(+), 38 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 12:22       ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 12:22 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 21/03/2022 11:03, Thomas Hellström wrote:
> Hi, Tvrtko.
> 
> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> To align with the discussion in [1][2], this patch series drops all 
>>> usage of
>>> wbvind_on_all_cpus within i915 by either replacing the call with certain
>>> drm clflush helpers, or reverting to a previous logic.
>>
>> AFAIU, complaint from [1] was that it is wrong to provide non x86 
>> implementations under the wbinvd_on_all_cpus name. Instead an arch 
>> agnostic helper which achieves the same effect could be created. Does 
>> Arm have such concept?
> 
> I also understand Linus' email like we shouldn't leak incoherent IO to 
> other architectures, meaning any remaining wbinvd()s should be X86 only.

The last part is completely obvious since it is a x86 instruction name.

But I think we can't pick a solution until we know how the concept maps 
to Arm and that will also include seeing how the drm_clflush_sg for Arm 
would look. Is there a range based solution, or just a big hammer there. 
If the latter, then it is no good to churn all these reverts but instead 
an arch agnostic wrapper, with a generic name, would be the way to go.

Regards,

Tvrtko

> Also, wbinvd_on_all_cpus() can become very costly, hence prefer the 
> range apis when possible if they can be verified not to degrade 
> performance.
> 
> 
>>
>> Given that the series seems to be taking a different route, avoiding 
>> the need to call wbinvd_on_all_cpus rather than what [1] suggests 
>> (note drm_clflush_sg can still call it!?), concern is that the series 
>> has a bunch of reverts and each one needs to be analyzed.
> 
> 
> Agreed.
> 
> /Thomas
> 
> 
> 
>>
>> For instance looking at just the last one, 64b95df91f44, who has 
>> looked at the locking consequences that commit describes:
>>
>> """
>>     Inside gtt_restore_mappings() we currently take the 
>> obj->resv->lock, but
>>     in the future we need to avoid taking this fs-reclaim tainted lock 
>> as we
>>     need to extend the coverage of the vm->mutex. Take advantage of the
>>     single-threaded nature of the early resume phase, and do a single
>>     wbinvd() to flush all the GTT objects en masse.
>>
>> """
>>
>> ?
>>
>> Then there are suspend and freeze reverts which presumably can regress 
>> the suspend times. Any data on those?
>>
>> Adding Matt since he was the reviewer for that work so might remember 
>> something.
>>
>> Regards,
>>
>> Tvrtko
>>
>>
>>> [1]. 
>>> https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html 
>>>
>>> [2]. https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
>>>
>>> Michael Cheng (4):
>>>    i915/gem: drop wbinvd_on_all_cpus usage
>>>    Revert "drm/i915/gem: Almagamate clflushes on suspend"
>>>    i915/gem: Revert i915_gem_freeze to previous logic
>>>    drm/i915/gt: Revert ggtt_resume to previous logic
>>>
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
>>>   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56 ++++++++++++++--------
>>>   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
>>>   4 files changed, 46 insertions(+), 38 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 12:22       ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 12:33         ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 12:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 11:03, Thomas Hellström wrote:
> > Hi, Tvrtko.
> > 
> > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > 
> > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > To align with the discussion in [1][2], this patch series drops
> > > > all 
> > > > usage of
> > > > wbvind_on_all_cpus within i915 by either replacing the call
> > > > with certain
> > > > drm clflush helpers, or reverting to a previous logic.
> > > 
> > > AFAIU, complaint from [1] was that it is wrong to provide non x86
> > > implementations under the wbinvd_on_all_cpus name. Instead an
> > > arch 
> > > agnostic helper which achieves the same effect could be created.
> > > Does 
> > > Arm have such concept?
> > 
> > I also understand Linus' email like we shouldn't leak incoherent IO
> > to 
> > other architectures, meaning any remaining wbinvd()s should be X86
> > only.
> 
> The last part is completely obvious since it is a x86 instruction
> name.

Yeah, I meant the function implementing wbinvd() semantics.

> 
> But I think we can't pick a solution until we know how the concept
> maps 
> to Arm and that will also include seeing how the drm_clflush_sg for
> Arm 
> would look. Is there a range based solution, or just a big hammer
> there. 
> If the latter, then it is no good to churn all these reverts but
> instead 
> an arch agnostic wrapper, with a generic name, would be the way to
> go.

But my impression was that ARM would not need the range-based interface
either, because ARM is only for discrete and with discrete we're always
coherent.

So in essence it all would become:

1) Any cache flushing intended for incoherent IO is x86 only.
2) Prefer range-based flushing if possible and any implications sorted
out.

/Thomas


> 
> Regards,
> 
> Tvrtko
> 
> > Also, wbinvd_on_all_cpus() can become very costly, hence prefer the
> > range apis when possible if they can be verified not to degrade 
> > performance.
> > 
> > 
> > > 
> > > Given that the series seems to be taking a different route,
> > > avoiding 
> > > the need to call wbinvd_on_all_cpus rather than what [1] suggests
> > > (note drm_clflush_sg can still call it!?), concern is that the
> > > series 
> > > has a bunch of reverts and each one needs to be analyzed.
> > 
> > 
> > Agreed.
> > 
> > /Thomas
> > 
> > 
> > 
> > > 
> > > For instance looking at just the last one, 64b95df91f44, who has 
> > > looked at the locking consequences that commit describes:
> > > 
> > > """
> > >     Inside gtt_restore_mappings() we currently take the 
> > > obj->resv->lock, but
> > >     in the future we need to avoid taking this fs-reclaim tainted
> > > lock 
> > > as we
> > >     need to extend the coverage of the vm->mutex. Take advantage
> > > of the
> > >     single-threaded nature of the early resume phase, and do a
> > > single
> > >     wbinvd() to flush all the GTT objects en masse.
> > > 
> > > """
> > > 
> > > ?
> > > 
> > > Then there are suspend and freeze reverts which presumably can
> > > regress 
> > > the suspend times. Any data on those?
> > > 
> > > Adding Matt since he was the reviewer for that work so might
> > > remember 
> > > something.
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > 
> > > > [1]. 
> > > > https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
> > > >  
> > > > 
> > > > [2].
> > > > https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
> > > > 
> > > > Michael Cheng (4):
> > > >    i915/gem: drop wbinvd_on_all_cpus usage
> > > >    Revert "drm/i915/gem: Almagamate clflushes on suspend"
> > > >    i915/gem: Revert i915_gem_freeze to previous logic
> > > >    drm/i915/gt: Revert ggtt_resume to previous logic
> > > > 
> > > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
> > > >   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56
> > > > ++++++++++++++--------
> > > >   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
> > > >   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
> > > >   4 files changed, 46 insertions(+), 38 deletions(-)
> > > > 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 12:33         ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 12:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 11:03, Thomas Hellström wrote:
> > Hi, Tvrtko.
> > 
> > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > 
> > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > To align with the discussion in [1][2], this patch series drops
> > > > all 
> > > > usage of
> > > > wbvind_on_all_cpus within i915 by either replacing the call
> > > > with certain
> > > > drm clflush helpers, or reverting to a previous logic.
> > > 
> > > AFAIU, complaint from [1] was that it is wrong to provide non x86
> > > implementations under the wbinvd_on_all_cpus name. Instead an
> > > arch 
> > > agnostic helper which achieves the same effect could be created.
> > > Does 
> > > Arm have such concept?
> > 
> > I also understand Linus' email like we shouldn't leak incoherent IO
> > to 
> > other architectures, meaning any remaining wbinvd()s should be X86
> > only.
> 
> The last part is completely obvious since it is a x86 instruction
> name.

Yeah, I meant the function implementing wbinvd() semantics.

> 
> But I think we can't pick a solution until we know how the concept
> maps 
> to Arm and that will also include seeing how the drm_clflush_sg for
> Arm 
> would look. Is there a range based solution, or just a big hammer
> there. 
> If the latter, then it is no good to churn all these reverts but
> instead 
> an arch agnostic wrapper, with a generic name, would be the way to
> go.

But my impression was that ARM would not need the range-based interface
either, because ARM is only for discrete and with discrete we're always
coherent.

So in essence it all would become:

1) Any cache flushing intended for incoherent IO is x86 only.
2) Prefer range-based flushing if possible and any implications sorted
out.

/Thomas


> 
> Regards,
> 
> Tvrtko
> 
> > Also, wbinvd_on_all_cpus() can become very costly, hence prefer the
> > range apis when possible if they can be verified not to degrade 
> > performance.
> > 
> > 
> > > 
> > > Given that the series seems to be taking a different route,
> > > avoiding 
> > > the need to call wbinvd_on_all_cpus rather than what [1] suggests
> > > (note drm_clflush_sg can still call it!?), concern is that the
> > > series 
> > > has a bunch of reverts and each one needs to be analyzed.
> > 
> > 
> > Agreed.
> > 
> > /Thomas
> > 
> > 
> > 
> > > 
> > > For instance looking at just the last one, 64b95df91f44, who has 
> > > looked at the locking consequences that commit describes:
> > > 
> > > """
> > >     Inside gtt_restore_mappings() we currently take the 
> > > obj->resv->lock, but
> > >     in the future we need to avoid taking this fs-reclaim tainted
> > > lock 
> > > as we
> > >     need to extend the coverage of the vm->mutex. Take advantage
> > > of the
> > >     single-threaded nature of the early resume phase, and do a
> > > single
> > >     wbinvd() to flush all the GTT objects en masse.
> > > 
> > > """
> > > 
> > > ?
> > > 
> > > Then there are suspend and freeze reverts which presumably can
> > > regress 
> > > the suspend times. Any data on those?
> > > 
> > > Adding Matt since he was the reviewer for that work so might
> > > remember 
> > > something.
> > > 
> > > Regards,
> > > 
> > > Tvrtko
> > > 
> > > 
> > > > [1]. 
> > > > https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html
> > > >  
> > > > 
> > > > [2].
> > > > https://patchwork.freedesktop.org/patch/475752/?series=99991&rev=5
> > > > 
> > > > Michael Cheng (4):
> > > >    i915/gem: drop wbinvd_on_all_cpus usage
> > > >    Revert "drm/i915/gem: Almagamate clflushes on suspend"
> > > >    i915/gem: Revert i915_gem_freeze to previous logic
> > > >    drm/i915/gt: Revert ggtt_resume to previous logic
> > > > 
> > > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c |  9 +---
> > > >   drivers/gpu/drm/i915/gem/i915_gem_pm.c     | 56
> > > > ++++++++++++++--------
> > > >   drivers/gpu/drm/i915/gt/intel_ggtt.c       | 17 +++----
> > > >   drivers/gpu/drm/i915/gt/intel_gtt.h        |  2 +-
> > > >   4 files changed, 46 insertions(+), 38 deletions(-)
> > > > 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 12:33         ` [Intel-gfx] " Thomas Hellström
@ 2022-03-21 13:12           ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 13:12 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 21/03/2022 12:33, Thomas Hellström wrote:
> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>> Hi, Tvrtko.
>>>
>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>
>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>> To align with the discussion in [1][2], this patch series drops
>>>>> all
>>>>> usage of
>>>>> wbvind_on_all_cpus within i915 by either replacing the call
>>>>> with certain
>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>
>>>> AFAIU, complaint from [1] was that it is wrong to provide non x86
>>>> implementations under the wbinvd_on_all_cpus name. Instead an
>>>> arch
>>>> agnostic helper which achieves the same effect could be created.
>>>> Does
>>>> Arm have such concept?
>>>
>>> I also understand Linus' email like we shouldn't leak incoherent IO
>>> to
>>> other architectures, meaning any remaining wbinvd()s should be X86
>>> only.
>>
>> The last part is completely obvious since it is a x86 instruction
>> name.
> 
> Yeah, I meant the function implementing wbinvd() semantics.
> 
>>
>> But I think we can't pick a solution until we know how the concept
>> maps
>> to Arm and that will also include seeing how the drm_clflush_sg for
>> Arm
>> would look. Is there a range based solution, or just a big hammer
>> there.
>> If the latter, then it is no good to churn all these reverts but
>> instead
>> an arch agnostic wrapper, with a generic name, would be the way to
>> go.
> 
> But my impression was that ARM would not need the range-based interface
> either, because ARM is only for discrete and with discrete we're always
> coherent.

Not sure what you mean here - what about flushing system memory objects 
on discrete? Those still need flushing on paths like suspend which this 
series touches. Am I missing something?

If I am not, then that means we either keep the current, presumably 
optimised (wasn't personally involved so I don't know), flush once code 
paths and add a wrapper i915_flush_caches/whatever, or convert all those 
back into piece-meal flushes so range flushing can be done. Assuming Arm 
does range flushing. That's why I asked what does Arm have here.

> So in essence it all would become:
> 
> 1) Any cache flushing intended for incoherent IO is x86 only.
> 2) Prefer range-based flushing if possible and any implications sorted
> out.

Yes, the question is how to do it.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 13:12           ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 13:12 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 21/03/2022 12:33, Thomas Hellström wrote:
> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>> Hi, Tvrtko.
>>>
>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>
>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>> To align with the discussion in [1][2], this patch series drops
>>>>> all
>>>>> usage of
>>>>> wbvind_on_all_cpus within i915 by either replacing the call
>>>>> with certain
>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>
>>>> AFAIU, complaint from [1] was that it is wrong to provide non x86
>>>> implementations under the wbinvd_on_all_cpus name. Instead an
>>>> arch
>>>> agnostic helper which achieves the same effect could be created.
>>>> Does
>>>> Arm have such concept?
>>>
>>> I also understand Linus' email like we shouldn't leak incoherent IO
>>> to
>>> other architectures, meaning any remaining wbinvd()s should be X86
>>> only.
>>
>> The last part is completely obvious since it is a x86 instruction
>> name.
> 
> Yeah, I meant the function implementing wbinvd() semantics.
> 
>>
>> But I think we can't pick a solution until we know how the concept
>> maps
>> to Arm and that will also include seeing how the drm_clflush_sg for
>> Arm
>> would look. Is there a range based solution, or just a big hammer
>> there.
>> If the latter, then it is no good to churn all these reverts but
>> instead
>> an arch agnostic wrapper, with a generic name, would be the way to
>> go.
> 
> But my impression was that ARM would not need the range-based interface
> either, because ARM is only for discrete and with discrete we're always
> coherent.

Not sure what you mean here - what about flushing system memory objects 
on discrete? Those still need flushing on paths like suspend which this 
series touches. Am I missing something?

If I am not, then that means we either keep the current, presumably 
optimised (wasn't personally involved so I don't know), flush once code 
paths and add a wrapper i915_flush_caches/whatever, or convert all those 
back into piece-meal flushes so range flushing can be done. Assuming Arm 
does range flushing. That's why I asked what does Arm have here.

> So in essence it all would become:
> 
> 1) Any cache flushing intended for incoherent IO is x86 only.
> 2) Prefer range-based flushing if possible and any implications sorted
> out.

Yes, the question is how to do it.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 13:12           ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 13:40             ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 13:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

Hi,

On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 12:33, Thomas Hellström wrote:
> > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > Hi, Tvrtko.
> > > > 
> > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > To align with the discussion in [1][2], this patch series
> > > > > > drops
> > > > > > all
> > > > > > usage of
> > > > > > wbvind_on_all_cpus within i915 by either replacing the call
> > > > > > with certain
> > > > > > drm clflush helpers, or reverting to a previous logic.
> > > > > 
> > > > > AFAIU, complaint from [1] was that it is wrong to provide non
> > > > > x86
> > > > > implementations under the wbinvd_on_all_cpus name. Instead an
> > > > > arch
> > > > > agnostic helper which achieves the same effect could be
> > > > > created.
> > > > > Does
> > > > > Arm have such concept?
> > > > 
> > > > I also understand Linus' email like we shouldn't leak incoherent
> > > > IO
> > > > to
> > > > other architectures, meaning any remaining wbinvd()s should be
> > > > X86
> > > > only.
> > > 
> > > The last part is completely obvious since it is a x86 instruction
> > > name.
> > 
> > Yeah, I meant the function implementing wbinvd() semantics.
> > 
> > > 
> > > But I think we can't pick a solution until we know how the concept
> > > maps
> > > to Arm and that will also include seeing how the drm_clflush_sg for
> > > Arm
> > > would look. Is there a range based solution, or just a big hammer
> > > there.
> > > If the latter, then it is no good to churn all these reverts but
> > > instead
> > > an arch agnostic wrapper, with a generic name, would be the way to
> > > go.
> > 
> > But my impression was that ARM would not need the range-based
> > interface
> > either, because ARM is only for discrete and with discrete we're
> > always
> > coherent.
> 
> Not sure what you mean here - what about flushing system memory objects
> on discrete? Those still need flushing on paths like suspend which this
> series touches. Am I missing something?

System bos on discrete should always have

I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE

either by the gpu being fully cache coherent (or us mapping system
write-combined). Hence no need for cache clflushes or wbinvd() for
incoherent IO.

That's adhering to Linus'

"And I sincerely hope to the gods that no cache-incoherent i915 mess
ever makes it out of the x86 world. Incoherent IO was always a
historical mistake and should never ever happen again, so we should
not spread that horrific pattern around."


/Thomas



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 13:40             ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 13:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

Hi,

On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 12:33, Thomas Hellström wrote:
> > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > Hi, Tvrtko.
> > > > 
> > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > To align with the discussion in [1][2], this patch series
> > > > > > drops
> > > > > > all
> > > > > > usage of
> > > > > > wbvind_on_all_cpus within i915 by either replacing the call
> > > > > > with certain
> > > > > > drm clflush helpers, or reverting to a previous logic.
> > > > > 
> > > > > AFAIU, complaint from [1] was that it is wrong to provide non
> > > > > x86
> > > > > implementations under the wbinvd_on_all_cpus name. Instead an
> > > > > arch
> > > > > agnostic helper which achieves the same effect could be
> > > > > created.
> > > > > Does
> > > > > Arm have such concept?
> > > > 
> > > > I also understand Linus' email like we shouldn't leak incoherent
> > > > IO
> > > > to
> > > > other architectures, meaning any remaining wbinvd()s should be
> > > > X86
> > > > only.
> > > 
> > > The last part is completely obvious since it is a x86 instruction
> > > name.
> > 
> > Yeah, I meant the function implementing wbinvd() semantics.
> > 
> > > 
> > > But I think we can't pick a solution until we know how the concept
> > > maps
> > > to Arm and that will also include seeing how the drm_clflush_sg for
> > > Arm
> > > would look. Is there a range based solution, or just a big hammer
> > > there.
> > > If the latter, then it is no good to churn all these reverts but
> > > instead
> > > an arch agnostic wrapper, with a generic name, would be the way to
> > > go.
> > 
> > But my impression was that ARM would not need the range-based
> > interface
> > either, because ARM is only for discrete and with discrete we're
> > always
> > coherent.
> 
> Not sure what you mean here - what about flushing system memory objects
> on discrete? Those still need flushing on paths like suspend which this
> series touches. Am I missing something?

System bos on discrete should always have

I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE

either by the gpu being fully cache coherent (or us mapping system
write-combined). Hence no need for cache clflushes or wbinvd() for
incoherent IO.

That's adhering to Linus'

"And I sincerely hope to the gods that no cache-incoherent i915 mess
ever makes it out of the x86 world. Incoherent IO was always a
historical mistake and should never ever happen again, so we should
not spread that horrific pattern around."


/Thomas



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 13:40             ` [Intel-gfx] " Thomas Hellström
@ 2022-03-21 14:43               ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 14:43 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 21/03/2022 13:40, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>> Hi, Tvrtko.
>>>>>
>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>> To align with the discussion in [1][2], this patch series
>>>>>>> drops
>>>>>>> all
>>>>>>> usage of
>>>>>>> wbvind_on_all_cpus within i915 by either replacing the call
>>>>>>> with certain
>>>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>>>
>>>>>> AFAIU, complaint from [1] was that it is wrong to provide non
>>>>>> x86
>>>>>> implementations under the wbinvd_on_all_cpus name. Instead an
>>>>>> arch
>>>>>> agnostic helper which achieves the same effect could be
>>>>>> created.
>>>>>> Does
>>>>>> Arm have such concept?
>>>>>
>>>>> I also understand Linus' email like we shouldn't leak incoherent
>>>>> IO
>>>>> to
>>>>> other architectures, meaning any remaining wbinvd()s should be
>>>>> X86
>>>>> only.
>>>>
>>>> The last part is completely obvious since it is a x86 instruction
>>>> name.
>>>
>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>
>>>>
>>>> But I think we can't pick a solution until we know how the concept
>>>> maps
>>>> to Arm and that will also include seeing how the drm_clflush_sg for
>>>> Arm
>>>> would look. Is there a range based solution, or just a big hammer
>>>> there.
>>>> If the latter, then it is no good to churn all these reverts but
>>>> instead
>>>> an arch agnostic wrapper, with a generic name, would be the way to
>>>> go.
>>>
>>> But my impression was that ARM would not need the range-based
>>> interface
>>> either, because ARM is only for discrete and with discrete we're
>>> always
>>> coherent.
>>
>> Not sure what you mean here - what about flushing system memory objects
>> on discrete? Those still need flushing on paths like suspend which this
>> series touches. Am I missing something?
> 
> System bos on discrete should always have
> 
> I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
> 
> either by the gpu being fully cache coherent (or us mapping system
> write-combined). Hence no need for cache clflushes or wbinvd() for
> incoherent IO.

Hmm so you are talking about the shmem ttm backend. It ends up depending on the result of i915_ttm_cache_level, yes? It cannot end up with I915_CACHE_NONE from that function?

I also found in i915_drm.h:

	 * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
	 * be used, depending on the object placement on creation. WB will be used
	 * when the object can only exist in system memory, WC otherwise.

If what you say is true, that on discrete it is _always_ WC, then that needs updating as well.

> 
> That's adhering to Linus'
> 
> "And I sincerely hope to the gods that no cache-incoherent i915 mess
> ever makes it out of the x86 world. Incoherent IO was always a
> historical mistake and should never ever happen again, so we should
> not spread that horrific pattern around."

Sure, but I was not talking about IO - just the CPU side access to CPU side objects.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 14:43               ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 14:43 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 21/03/2022 13:40, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>> Hi, Tvrtko.
>>>>>
>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>> To align with the discussion in [1][2], this patch series
>>>>>>> drops
>>>>>>> all
>>>>>>> usage of
>>>>>>> wbvind_on_all_cpus within i915 by either replacing the call
>>>>>>> with certain
>>>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>>>
>>>>>> AFAIU, complaint from [1] was that it is wrong to provide non
>>>>>> x86
>>>>>> implementations under the wbinvd_on_all_cpus name. Instead an
>>>>>> arch
>>>>>> agnostic helper which achieves the same effect could be
>>>>>> created.
>>>>>> Does
>>>>>> Arm have such concept?
>>>>>
>>>>> I also understand Linus' email like we shouldn't leak incoherent
>>>>> IO
>>>>> to
>>>>> other architectures, meaning any remaining wbinvd()s should be
>>>>> X86
>>>>> only.
>>>>
>>>> The last part is completely obvious since it is a x86 instruction
>>>> name.
>>>
>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>
>>>>
>>>> But I think we can't pick a solution until we know how the concept
>>>> maps
>>>> to Arm and that will also include seeing how the drm_clflush_sg for
>>>> Arm
>>>> would look. Is there a range based solution, or just a big hammer
>>>> there.
>>>> If the latter, then it is no good to churn all these reverts but
>>>> instead
>>>> an arch agnostic wrapper, with a generic name, would be the way to
>>>> go.
>>>
>>> But my impression was that ARM would not need the range-based
>>> interface
>>> either, because ARM is only for discrete and with discrete we're
>>> always
>>> coherent.
>>
>> Not sure what you mean here - what about flushing system memory objects
>> on discrete? Those still need flushing on paths like suspend which this
>> series touches. Am I missing something?
> 
> System bos on discrete should always have
> 
> I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
> 
> either by the gpu being fully cache coherent (or us mapping system
> write-combined). Hence no need for cache clflushes or wbinvd() for
> incoherent IO.

Hmm so you are talking about the shmem ttm backend. It ends up depending on the result of i915_ttm_cache_level, yes? It cannot end up with I915_CACHE_NONE from that function?

I also found in i915_drm.h:

	 * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will
	 * be used, depending on the object placement on creation. WB will be used
	 * when the object can only exist in system memory, WC otherwise.

If what you say is true, that on discrete it is _always_ WC, then that needs updating as well.

> 
> That's adhering to Linus'
> 
> "And I sincerely hope to the gods that no cache-incoherent i915 mess
> ever makes it out of the x86 world. Incoherent IO was always a
> historical mistake and should never ever happen again, so we should
> not spread that horrific pattern around."

Sure, but I was not talking about IO - just the CPU side access to CPU side objects.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 14:43               ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 15:15                 ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 15:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 13:40, Thomas Hellström wrote:
> > Hi,
> > 
> > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > Hi, Tvrtko.
> > > > > > 
> > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > To align with the discussion in [1][2], this patch
> > > > > > > > series
> > > > > > > > drops
> > > > > > > > all
> > > > > > > > usage of
> > > > > > > > wbvind_on_all_cpus within i915 by either replacing the
> > > > > > > > call
> > > > > > > > with certain
> > > > > > > > drm clflush helpers, or reverting to a previous logic.
> > > > > > > 
> > > > > > > AFAIU, complaint from [1] was that it is wrong to provide
> > > > > > > non
> > > > > > > x86
> > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > Instead an
> > > > > > > arch
> > > > > > > agnostic helper which achieves the same effect could be
> > > > > > > created.
> > > > > > > Does
> > > > > > > Arm have such concept?
> > > > > > 
> > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > incoherent
> > > > > > IO
> > > > > > to
> > > > > > other architectures, meaning any remaining wbinvd()s should
> > > > > > be
> > > > > > X86
> > > > > > only.
> > > > > 
> > > > > The last part is completely obvious since it is a x86
> > > > > instruction
> > > > > name.
> > > > 
> > > > Yeah, I meant the function implementing wbinvd() semantics.
> > > > 
> > > > > 
> > > > > But I think we can't pick a solution until we know how the
> > > > > concept
> > > > > maps
> > > > > to Arm and that will also include seeing how the
> > > > > drm_clflush_sg for
> > > > > Arm
> > > > > would look. Is there a range based solution, or just a big
> > > > > hammer
> > > > > there.
> > > > > If the latter, then it is no good to churn all these reverts
> > > > > but
> > > > > instead
> > > > > an arch agnostic wrapper, with a generic name, would be the
> > > > > way to
> > > > > go.
> > > > 
> > > > But my impression was that ARM would not need the range-based
> > > > interface
> > > > either, because ARM is only for discrete and with discrete
> > > > we're
> > > > always
> > > > coherent.
> > > 
> > > Not sure what you mean here - what about flushing system memory
> > > objects
> > > on discrete? Those still need flushing on paths like suspend
> > > which this
> > > series touches. Am I missing something?
> > 
> > System bos on discrete should always have
> > 
> > I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
> > 
> > either by the gpu being fully cache coherent (or us mapping system
> > write-combined). Hence no need for cache clflushes or wbinvd() for
> > incoherent IO.
> 
> Hmm so you are talking about the shmem ttm backend. It ends up
> depending on the result of i915_ttm_cache_level, yes? It cannot end
> up with I915_CACHE_NONE from that function?

If the object is allocated with allowable placement in either LMEM or
SYSTEM, and it ends in system, it gets allocated with I915_CACHE_NONE,
but then the shmem ttm backend isn't used but TTM's wc pools, and the
object should *always* be mapped wc. Even in system.

> 
> I also found in i915_drm.h:
> 
>          * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`,
> WC or WB will
>          * be used, depending on the object placement on creation. WB
> will be used
>          * when the object can only exist in system memory, WC
> otherwise.
> 
> If what you say is true, that on discrete it is _always_ WC, then
> that needs updating as well.

If an object is allocated as system only, then it is mapped WB, and
we're relying on the gpu being cache coherent to avoid clflushes. Same
is actually currently true if the object happens to be accessed by the
cpu while evicted. Might need an update for that.

> 
> > 
> > That's adhering to Linus'
> > 
> > "And I sincerely hope to the gods that no cache-incoherent i915
> > mess
> > ever makes it out of the x86 world. Incoherent IO was always a
> > historical mistake and should never ever happen again, so we should
> > not spread that horrific pattern around."
> 
> Sure, but I was not talking about IO - just the CPU side access to
> CPU side objects.

OK, I was under the impression that clflushes() and wbinvd()s in i915
was only ever used to make data visible to non-snooping GPUs. 

Do you mean that there are other uses as well? Agreed the wb cache
flush on on suspend only if gpu is !I915_BO_CACHE_COHERENT_FOR_READ?
looks to not fit this pattern completely.

Otherwise, for architectures where memory isn't always fully coherent
with the cpu cache, I'd expect them to use the apis in
asm/cacheflush.h, like flush_cache_range() and similar, which are nops
on x86.

Thanks,
Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-21 15:15                 ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-21 15:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 13:40, Thomas Hellström wrote:
> > Hi,
> > 
> > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > Hi, Tvrtko.
> > > > > > 
> > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > To align with the discussion in [1][2], this patch
> > > > > > > > series
> > > > > > > > drops
> > > > > > > > all
> > > > > > > > usage of
> > > > > > > > wbvind_on_all_cpus within i915 by either replacing the
> > > > > > > > call
> > > > > > > > with certain
> > > > > > > > drm clflush helpers, or reverting to a previous logic.
> > > > > > > 
> > > > > > > AFAIU, complaint from [1] was that it is wrong to provide
> > > > > > > non
> > > > > > > x86
> > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > Instead an
> > > > > > > arch
> > > > > > > agnostic helper which achieves the same effect could be
> > > > > > > created.
> > > > > > > Does
> > > > > > > Arm have such concept?
> > > > > > 
> > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > incoherent
> > > > > > IO
> > > > > > to
> > > > > > other architectures, meaning any remaining wbinvd()s should
> > > > > > be
> > > > > > X86
> > > > > > only.
> > > > > 
> > > > > The last part is completely obvious since it is a x86
> > > > > instruction
> > > > > name.
> > > > 
> > > > Yeah, I meant the function implementing wbinvd() semantics.
> > > > 
> > > > > 
> > > > > But I think we can't pick a solution until we know how the
> > > > > concept
> > > > > maps
> > > > > to Arm and that will also include seeing how the
> > > > > drm_clflush_sg for
> > > > > Arm
> > > > > would look. Is there a range based solution, or just a big
> > > > > hammer
> > > > > there.
> > > > > If the latter, then it is no good to churn all these reverts
> > > > > but
> > > > > instead
> > > > > an arch agnostic wrapper, with a generic name, would be the
> > > > > way to
> > > > > go.
> > > > 
> > > > But my impression was that ARM would not need the range-based
> > > > interface
> > > > either, because ARM is only for discrete and with discrete
> > > > we're
> > > > always
> > > > coherent.
> > > 
> > > Not sure what you mean here - what about flushing system memory
> > > objects
> > > on discrete? Those still need flushing on paths like suspend
> > > which this
> > > series touches. Am I missing something?
> > 
> > System bos on discrete should always have
> > 
> > I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
> > 
> > either by the gpu being fully cache coherent (or us mapping system
> > write-combined). Hence no need for cache clflushes or wbinvd() for
> > incoherent IO.
> 
> Hmm so you are talking about the shmem ttm backend. It ends up
> depending on the result of i915_ttm_cache_level, yes? It cannot end
> up with I915_CACHE_NONE from that function?

If the object is allocated with allowable placement in either LMEM or
SYSTEM, and it ends in system, it gets allocated with I915_CACHE_NONE,
but then the shmem ttm backend isn't used but TTM's wc pools, and the
object should *always* be mapped wc. Even in system.

> 
> I also found in i915_drm.h:
> 
>          * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`,
> WC or WB will
>          * be used, depending on the object placement on creation. WB
> will be used
>          * when the object can only exist in system memory, WC
> otherwise.
> 
> If what you say is true, that on discrete it is _always_ WC, then
> that needs updating as well.

If an object is allocated as system only, then it is mapped WB, and
we're relying on the gpu being cache coherent to avoid clflushes. Same
is actually currently true if the object happens to be accessed by the
cpu while evicted. Might need an update for that.

> 
> > 
> > That's adhering to Linus'
> > 
> > "And I sincerely hope to the gods that no cache-incoherent i915
> > mess
> > ever makes it out of the x86 world. Incoherent IO was always a
> > historical mistake and should never ever happen again, so we should
> > not spread that horrific pattern around."
> 
> Sure, but I was not talking about IO - just the CPU side access to
> CPU side objects.

OK, I was under the impression that clflushes() and wbinvd()s in i915
was only ever used to make data visible to non-snooping GPUs. 

Do you mean that there are other uses as well? Agreed the wb cache
flush on on suspend only if gpu is !I915_BO_CACHE_COHERENT_FOR_READ?
looks to not fit this pattern completely.

Otherwise, for architectures where memory isn't always fully coherent
with the cpu cache, I'd expect them to use the apis in
asm/cacheflush.h, like flush_cache_range() and similar, which are nops
on x86.

Thanks,
Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 10:30     ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 16:31       ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 16:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, dri-devel, chris, Daniel Vetter

On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:

>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?

Yes, but it seems like there's no regression with IGT .

If there's a big hit in performance, or if this solution gets accepted 
and the bug reports come flying in, we can explore other solutions. But 
speaking to Dan Vetter, ideal solution would be to avoid any calls 
directly to wbinvd, and use drm helpers in place.

+Daniel for any extra input.

>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?
>
Instead of calling an x86 asm directly, we are using what's available to 
use to make the driver more architecture neutral. Agreeing with Thomas, 
this solution falls within the "prefer range-aware clflush apis", and 
since some other generation platform doesn't support clflushopt, it will 
fall back to using wbinvd.
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 16:31       ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 16:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, dri-devel,
	chris, Daniel Vetter

On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:

>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?

Yes, but it seems like there's no regression with IGT .

If there's a big hit in performance, or if this solution gets accepted 
and the bug reports come flying in, we can explore other solutions. But 
speaking to Dan Vetter, ideal solution would be to avoid any calls 
directly to wbinvd, and use drm helpers in place.

+Daniel for any extra input.

>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?
>
Instead of calling an x86 asm directly, we are using what's available to 
use to make the driver more architecture neutral. Agreeing with Thomas, 
this solution falls within the "prefer range-aware clflush apis", and 
since some other generation platform doesn't support clflushopt, it will 
fall back to using wbinvd.
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 16:31       ` [Intel-gfx] " Michael Cheng
@ 2022-03-21 17:28         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 17:28 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, dri-devel, chris, Daniel Vetter


On 21/03/2022 16:31, Michael Cheng wrote:
> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> 
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what can/will 
>> break?
> 
> Yes, but it seems like there's no regression with IGT .
> 
> If there's a big hit in performance, or if this solution gets accepted 
> and the bug reports come flying in, we can explore other solutions. But 
> speaking to Dan Vetter, ideal solution would be to avoid any calls 
> directly to wbinvd, and use drm helpers in place.
> 
> +Daniel for any extra input.
> 
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>>> since
>>> -     * the underlying sg_table might not even point to struct pages, 
>>> so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
>> so are you just punting the issue somewhere else? How will it be 
>> solved there?
>>
> Instead of calling an x86 asm directly, we are using what's available to 
> use to make the driver more architecture neutral. Agreeing with Thomas, 
> this solution falls within the "prefer range-aware clflush apis", and 
> since some other generation platform doesn't support clflushopt, it will 
> fall back to using wbinvd.

Right, I was trying to get the information on what will drm_clflush_sg 
do on Arm. Is it range based or global there, or if the latter exists.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 17:28         ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-21 17:28 UTC (permalink / raw)
  To: Michael Cheng, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, dri-devel,
	chris, Daniel Vetter


On 21/03/2022 16:31, Michael Cheng wrote:
> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> 
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what can/will 
>> break?
> 
> Yes, but it seems like there's no regression with IGT .
> 
> If there's a big hit in performance, or if this solution gets accepted 
> and the bug reports come flying in, we can explore other solutions. But 
> speaking to Dan Vetter, ideal solution would be to avoid any calls 
> directly to wbinvd, and use drm helpers in place.
> 
> +Daniel for any extra input.
> 
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>>> since
>>> -     * the underlying sg_table might not even point to struct pages, 
>>> so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clfush_sg still can call wbinvd_on_all_cpus 
>> so are you just punting the issue somewhere else? How will it be 
>> solved there?
>>
> Instead of calling an x86 asm directly, we are using what's available to 
> use to make the driver more architecture neutral. Agreeing with Thomas, 
> this solution falls within the "prefer range-aware clflush apis", and 
> since some other generation platform doesn't support clflushopt, it will 
> fall back to using wbinvd.

Right, I was trying to get the information on what will drm_clflush_sg 
do on Arm. Is it range based or global there, or if the latter exists.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 17:28         ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 17:42           ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 17:42 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, dri-devel, chris, Daniel Vetter


On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clfush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
I am not too sure about the ARM side. We are currently working that out 
with the ARM folks in a different thread.
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 17:42           ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 17:42 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, dri-devel,
	chris, Daniel Vetter


On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clfush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
I am not too sure about the ARM side. We are currently working that out 
with the ARM folks in a different thread.
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 17:28         ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-21 17:51           ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 17:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, wayne.boyer, daniel.vetter, casey.g.bowman,
	lucas.demarchi, Robin Murphy, dri-devel, chris, Catalin Marinas,
	Daniel Vetter


On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clfush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
CCing a few ARM folks to see if they have any inputs.

+ Catalin And Robin

> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 17:51           ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 17:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx
  Cc: thomas.hellstrom, daniel.vetter, lucas.demarchi, Robin Murphy,
	dri-devel, chris, Catalin Marinas, Daniel Vetter


On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clfush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
CCing a few ARM folks to see if they have any inputs.

+ Catalin And Robin

> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 11:07       ` [Intel-gfx] " Thomas Hellström
@ 2022-03-21 18:51         ` Michael Cheng
  -1 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 18:51 UTC (permalink / raw)
  To: Thomas Hellström, Tvrtko Ursulin, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris

[-- Attachment #1: Type: text/plain, Size: 3367 bytes --]


On 2022-03-21 4:07 a.m., Thomas Hellström wrote:
>
> On 3/21/22 11:30, Tvrtko Ursulin wrote:
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what 
>> can/will break?
>>
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>> here since
>>> -     * the underlying sg_table might not even point to struct 
>>> pages, so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clfush_sg still can call 
>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>> How will it be solved there?
>
> I think in this case, drm_clflush_sg() can't be immediately used, 
> because pages may not contain actual page pointers; might be just the 
> dma address. It needs to be preceded with a dmabuf vmap.

Could you elaborate more with using a dmabuf vmap?

Doing a quick grep on drm_clflush_sg, were you thinking about something 
similar to the following?

if (obj->cache_dirty) {
WARN_ON_ONCE(IS_DGFX(i915));
obj->write_domain = 0;
if (i915_gem_object_has_struct_page(obj))
drm_clflush_sg(pages);
obj->cache_dirty = false;
}


Thanks,

Michael Cheng

> But otherwise this change, I figure, falls into the "prefer 
> range-aware apis" category; If the CPU supports it, flush the range 
> only, otherwise fall back to wbinvd().
>
> /Thomas
>
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

[-- Attachment #2: Type: text/html, Size: 11908 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-21 18:51         ` Michael Cheng
  0 siblings, 0 replies; 60+ messages in thread
From: Michael Cheng @ 2022-03-21 18:51 UTC (permalink / raw)
  To: Thomas Hellström, Tvrtko Ursulin, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris

[-- Attachment #1: Type: text/plain, Size: 3367 bytes --]


On 2022-03-21 4:07 a.m., Thomas Hellström wrote:
>
> On 3/21/22 11:30, Tvrtko Ursulin wrote:
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what 
>> can/will break?
>>
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>> here since
>>> -     * the underlying sg_table might not even point to struct 
>>> pages, so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clfush_sg still can call 
>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>> How will it be solved there?
>
> I think in this case, drm_clflush_sg() can't be immediately used, 
> because pages may not contain actual page pointers; might be just the 
> dma address. It needs to be preceded with a dmabuf vmap.

Could you elaborate more with using a dmabuf vmap?

Doing a quick grep on drm_clflush_sg, were you thinking about something 
similar to the following?

if (obj->cache_dirty) {
WARN_ON_ONCE(IS_DGFX(i915));
obj->write_domain = 0;
if (i915_gem_object_has_struct_page(obj))
drm_clflush_sg(pages);
obj->cache_dirty = false;
}


Thanks,

Michael Cheng

> But otherwise this change, I figure, falls into the "prefer 
> range-aware apis" category; If the CPU supports it, flush the range 
> only, otherwise fall back to wbinvd().
>
> /Thomas
>
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);

[-- Attachment #2: Type: text/html, Size: 11908 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-21 15:15                 ` [Intel-gfx] " Thomas Hellström
@ 2022-03-22 10:13                   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 10:13 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 21/03/2022 15:15, Thomas Hellström wrote:
> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>> Hi,
>>>
>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>> Hi, Tvrtko.
>>>>>>>
>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>> To align with the discussion in [1][2], this patch
>>>>>>>>> series
>>>>>>>>> drops
>>>>>>>>> all
>>>>>>>>> usage of
>>>>>>>>> wbvind_on_all_cpus within i915 by either replacing the
>>>>>>>>> call
>>>>>>>>> with certain
>>>>>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>>>>>
>>>>>>>> AFAIU, complaint from [1] was that it is wrong to provide
>>>>>>>> non
>>>>>>>> x86
>>>>>>>> implementations under the wbinvd_on_all_cpus name.
>>>>>>>> Instead an
>>>>>>>> arch
>>>>>>>> agnostic helper which achieves the same effect could be
>>>>>>>> created.
>>>>>>>> Does
>>>>>>>> Arm have such concept?
>>>>>>>
>>>>>>> I also understand Linus' email like we shouldn't leak
>>>>>>> incoherent
>>>>>>> IO
>>>>>>> to
>>>>>>> other architectures, meaning any remaining wbinvd()s should
>>>>>>> be
>>>>>>> X86
>>>>>>> only.
>>>>>>
>>>>>> The last part is completely obvious since it is a x86
>>>>>> instruction
>>>>>> name.
>>>>>
>>>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>>>
>>>>>>
>>>>>> But I think we can't pick a solution until we know how the
>>>>>> concept
>>>>>> maps
>>>>>> to Arm and that will also include seeing how the
>>>>>> drm_clflush_sg for
>>>>>> Arm
>>>>>> would look. Is there a range based solution, or just a big
>>>>>> hammer
>>>>>> there.
>>>>>> If the latter, then it is no good to churn all these reverts
>>>>>> but
>>>>>> instead
>>>>>> an arch agnostic wrapper, with a generic name, would be the
>>>>>> way to
>>>>>> go.
>>>>>
>>>>> But my impression was that ARM would not need the range-based
>>>>> interface
>>>>> either, because ARM is only for discrete and with discrete
>>>>> we're
>>>>> always
>>>>> coherent.
>>>>
>>>> Not sure what you mean here - what about flushing system memory
>>>> objects
>>>> on discrete? Those still need flushing on paths like suspend
>>>> which this
>>>> series touches. Am I missing something?
>>>
>>> System bos on discrete should always have
>>>
>>> I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
>>>
>>> either by the gpu being fully cache coherent (or us mapping system
>>> write-combined). Hence no need for cache clflushes or wbinvd() for
>>> incoherent IO.
>>
>> Hmm so you are talking about the shmem ttm backend. It ends up
>> depending on the result of i915_ttm_cache_level, yes? It cannot end
>> up with I915_CACHE_NONE from that function?
> 
> If the object is allocated with allowable placement in either LMEM or
> SYSTEM, and it ends in system, it gets allocated with I915_CACHE_NONE,
> but then the shmem ttm backend isn't used but TTM's wc pools, and the
> object should *always* be mapped wc. Even in system.

I am not familiar with neither TTM backend or wc pools so maybe a missed 
question - if obj->cache_level can be set to none, and 
obj->cache_coherency to zero, then during object lifetime helpers which 
consult those fields (like i915_gem_cpu_write_needs_clflush, 
__start_cpu_write, etc) are giving out incorrect answers? That is, it is 
irrelevant that they would say flushes are required, since in actuality 
those objects can never ever and from anywhere be mapped other than WC 
so flushes aren't actually required?

>> I also found in i915_drm.h:
>>
>>           * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`,
>> WC or WB will
>>           * be used, depending on the object placement on creation. WB
>> will be used
>>           * when the object can only exist in system memory, WC
>> otherwise.
>>
>> If what you say is true, that on discrete it is _always_ WC, then
>> that needs updating as well.
> 
> If an object is allocated as system only, then it is mapped WB, and
> we're relying on the gpu being cache coherent to avoid clflushes. Same
> is actually currently true if the object happens to be accessed by the
> cpu while evicted. Might need an update for that.

Hmm okay, I think I actually misunderstood something here. I think the 
reason for difference bbtween smem+lmem object which happens to be in 
smem and smem only object is eluding me.

>>>
>>> That's adhering to Linus'
>>>
>>> "And I sincerely hope to the gods that no cache-incoherent i915
>>> mess
>>> ever makes it out of the x86 world. Incoherent IO was always a
>>> historical mistake and should never ever happen again, so we should
>>> not spread that horrific pattern around."
>>
>> Sure, but I was not talking about IO - just the CPU side access to
>> CPU side objects.
> 
> OK, I was under the impression that clflushes() and wbinvd()s in i915
> was only ever used to make data visible to non-snooping GPUs.
> 
> Do you mean that there are other uses as well? Agreed the wb cache
> flush on on suspend only if gpu is !I915_BO_CACHE_COHERENT_FOR_READ?
> looks to not fit this pattern completely.

Don't know, I was first trying to understand handling of the 
obj->cache_coherent as discussed in the first quote block. Are the flags 
consistently set and how the Arm low level code will look.

> Otherwise, for architectures where memory isn't always fully coherent
> with the cpu cache, I'd expect them to use the apis in
> asm/cacheflush.h, like flush_cache_range() and similar, which are nops
> on x86.

Hm do you know why there are no-ops? Like why wouldn't they map to clflush?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 10:13                   ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 10:13 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 21/03/2022 15:15, Thomas Hellström wrote:
> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>> Hi,
>>>
>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>> Hi, Tvrtko.
>>>>>>>
>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>> To align with the discussion in [1][2], this patch
>>>>>>>>> series
>>>>>>>>> drops
>>>>>>>>> all
>>>>>>>>> usage of
>>>>>>>>> wbvind_on_all_cpus within i915 by either replacing the
>>>>>>>>> call
>>>>>>>>> with certain
>>>>>>>>> drm clflush helpers, or reverting to a previous logic.
>>>>>>>>
>>>>>>>> AFAIU, complaint from [1] was that it is wrong to provide
>>>>>>>> non
>>>>>>>> x86
>>>>>>>> implementations under the wbinvd_on_all_cpus name.
>>>>>>>> Instead an
>>>>>>>> arch
>>>>>>>> agnostic helper which achieves the same effect could be
>>>>>>>> created.
>>>>>>>> Does
>>>>>>>> Arm have such concept?
>>>>>>>
>>>>>>> I also understand Linus' email like we shouldn't leak
>>>>>>> incoherent
>>>>>>> IO
>>>>>>> to
>>>>>>> other architectures, meaning any remaining wbinvd()s should
>>>>>>> be
>>>>>>> X86
>>>>>>> only.
>>>>>>
>>>>>> The last part is completely obvious since it is a x86
>>>>>> instruction
>>>>>> name.
>>>>>
>>>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>>>
>>>>>>
>>>>>> But I think we can't pick a solution until we know how the
>>>>>> concept
>>>>>> maps
>>>>>> to Arm and that will also include seeing how the
>>>>>> drm_clflush_sg for
>>>>>> Arm
>>>>>> would look. Is there a range based solution, or just a big
>>>>>> hammer
>>>>>> there.
>>>>>> If the latter, then it is no good to churn all these reverts
>>>>>> but
>>>>>> instead
>>>>>> an arch agnostic wrapper, with a generic name, would be the
>>>>>> way to
>>>>>> go.
>>>>>
>>>>> But my impression was that ARM would not need the range-based
>>>>> interface
>>>>> either, because ARM is only for discrete and with discrete
>>>>> we're
>>>>> always
>>>>> coherent.
>>>>
>>>> Not sure what you mean here - what about flushing system memory
>>>> objects
>>>> on discrete? Those still need flushing on paths like suspend
>>>> which this
>>>> series touches. Am I missing something?
>>>
>>> System bos on discrete should always have
>>>
>>> I915_BO_CACHE_COHERENT_FOR_READ | I915_BO_CACHE_COHERENT_FOR_WRITE
>>>
>>> either by the gpu being fully cache coherent (or us mapping system
>>> write-combined). Hence no need for cache clflushes or wbinvd() for
>>> incoherent IO.
>>
>> Hmm so you are talking about the shmem ttm backend. It ends up
>> depending on the result of i915_ttm_cache_level, yes? It cannot end
>> up with I915_CACHE_NONE from that function?
> 
> If the object is allocated with allowable placement in either LMEM or
> SYSTEM, and it ends in system, it gets allocated with I915_CACHE_NONE,
> but then the shmem ttm backend isn't used but TTM's wc pools, and the
> object should *always* be mapped wc. Even in system.

I am not familiar with neither TTM backend or wc pools so maybe a missed 
question - if obj->cache_level can be set to none, and 
obj->cache_coherency to zero, then during object lifetime helpers which 
consult those fields (like i915_gem_cpu_write_needs_clflush, 
__start_cpu_write, etc) are giving out incorrect answers? That is, it is 
irrelevant that they would say flushes are required, since in actuality 
those objects can never ever and from anywhere be mapped other than WC 
so flushes aren't actually required?

>> I also found in i915_drm.h:
>>
>>           * As caching mode when specifying `I915_MMAP_OFFSET_FIXED`,
>> WC or WB will
>>           * be used, depending on the object placement on creation. WB
>> will be used
>>           * when the object can only exist in system memory, WC
>> otherwise.
>>
>> If what you say is true, that on discrete it is _always_ WC, then
>> that needs updating as well.
> 
> If an object is allocated as system only, then it is mapped WB, and
> we're relying on the gpu being cache coherent to avoid clflushes. Same
> is actually currently true if the object happens to be accessed by the
> cpu while evicted. Might need an update for that.

Hmm okay, I think I actually misunderstood something here. I think the 
reason for difference bbtween smem+lmem object which happens to be in 
smem and smem only object is eluding me.

>>>
>>> That's adhering to Linus'
>>>
>>> "And I sincerely hope to the gods that no cache-incoherent i915
>>> mess
>>> ever makes it out of the x86 world. Incoherent IO was always a
>>> historical mistake and should never ever happen again, so we should
>>> not spread that horrific pattern around."
>>
>> Sure, but I was not talking about IO - just the CPU side access to
>> CPU side objects.
> 
> OK, I was under the impression that clflushes() and wbinvd()s in i915
> was only ever used to make data visible to non-snooping GPUs.
> 
> Do you mean that there are other uses as well? Agreed the wb cache
> flush on on suspend only if gpu is !I915_BO_CACHE_COHERENT_FOR_READ?
> looks to not fit this pattern completely.

Don't know, I was first trying to understand handling of the 
obj->cache_coherent as discussed in the first quote block. Are the flags 
consistently set and how the Arm low level code will look.

> Otherwise, for architectures where memory isn't always fully coherent
> with the cpu cache, I'd expect them to use the apis in
> asm/cacheflush.h, like flush_cache_range() and similar, which are nops
> on x86.

Hm do you know why there are no-ops? Like why wouldn't they map to clflush?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 10:13                   ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-22 10:26                     ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 10:26 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 15:15, Thomas Hellström wrote:
> > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > Hi,
> > > > 
> > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > Hi, Tvrtko.
> > > > > > > > 
> > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > 
> > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > To align with the discussion in [1][2], this patch
> > > > > > > > > > series
> > > > > > > > > > drops
> > > > > > > > > > all
> > > > > > > > > > usage of
> > > > > > > > > > wbvind_on_all_cpus within i915 by either replacing
> > > > > > > > > > the
> > > > > > > > > > call
> > > > > > > > > > with certain
> > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > logic.
> > > > > > > > > 
> > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > provide
> > > > > > > > > non
> > > > > > > > > x86
> > > > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > > > Instead an
> > > > > > > > > arch
> > > > > > > > > agnostic helper which achieves the same effect could
> > > > > > > > > be
> > > > > > > > > created.
> > > > > > > > > Does
> > > > > > > > > Arm have such concept?
> > > > > > > > 
> > > > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > > > incoherent
> > > > > > > > IO
> > > > > > > > to
> > > > > > > > other architectures, meaning any remaining wbinvd()s
> > > > > > > > should
> > > > > > > > be
> > > > > > > > X86
> > > > > > > > only.
> > > > > > > 
> > > > > > > The last part is completely obvious since it is a x86
> > > > > > > instruction
> > > > > > > name.
> > > > > > 
> > > > > > Yeah, I meant the function implementing wbinvd() semantics.
> > > > > > 
> > > > > > > 
> > > > > > > But I think we can't pick a solution until we know how
> > > > > > > the
> > > > > > > concept
> > > > > > > maps
> > > > > > > to Arm and that will also include seeing how the
> > > > > > > drm_clflush_sg for
> > > > > > > Arm
> > > > > > > would look. Is there a range based solution, or just a
> > > > > > > big
> > > > > > > hammer
> > > > > > > there.
> > > > > > > If the latter, then it is no good to churn all these
> > > > > > > reverts
> > > > > > > but
> > > > > > > instead
> > > > > > > an arch agnostic wrapper, with a generic name, would be
> > > > > > > the
> > > > > > > way to
> > > > > > > go.
> > > > > > 
> > > > > > But my impression was that ARM would not need the range-
> > > > > > based
> > > > > > interface
> > > > > > either, because ARM is only for discrete and with discrete
> > > > > > we're
> > > > > > always
> > > > > > coherent.
> > > > > 
> > > > > Not sure what you mean here - what about flushing system
> > > > > memory
> > > > > objects
> > > > > on discrete? Those still need flushing on paths like suspend
> > > > > which this
> > > > > series touches. Am I missing something?
> > > > 
> > > > System bos on discrete should always have
> > > > 
> > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > 
> > > > either by the gpu being fully cache coherent (or us mapping
> > > > system
> > > > write-combined). Hence no need for cache clflushes or wbinvd()
> > > > for
> > > > incoherent IO.
> > > 
> > > Hmm so you are talking about the shmem ttm backend. It ends up
> > > depending on the result of i915_ttm_cache_level, yes? It cannot
> > > end
> > > up with I915_CACHE_NONE from that function?
> > 
> > If the object is allocated with allowable placement in either LMEM
> > or
> > SYSTEM, and it ends in system, it gets allocated with
> > I915_CACHE_NONE,
> > but then the shmem ttm backend isn't used but TTM's wc pools, and
> > the
> > object should *always* be mapped wc. Even in system.
> 
> I am not familiar with neither TTM backend or wc pools so maybe a
> missed 
> question - if obj->cache_level can be set to none, and 
> obj->cache_coherency to zero, then during object lifetime helpers
> which 
> consult those fields (like i915_gem_cpu_write_needs_clflush, 
> __start_cpu_write, etc) are giving out incorrect answers? That is, it
> is 
> irrelevant that they would say flushes are required, since in
> actuality 
> those objects can never ever and from anywhere be mapped other than
> WC 
> so flushes aren't actually required?

If we map other than WC somewhere in these situations, that should be a
bug needing a fix. It might be that some of these helpers that you
mention might still flag that a clflush is needed, and in that case
that's an oversight that also needs fixing.

> 
> > > I also found in i915_drm.h:
> > > 
> > >           * As caching mode when specifying
> > > `I915_MMAP_OFFSET_FIXED`,
> > > WC or WB will
> > >           * be used, depending on the object placement on
> > > creation. WB
> > > will be used
> > >           * when the object can only exist in system memory, WC
> > > otherwise.
> > > 
> > > If what you say is true, that on discrete it is _always_ WC, then
> > > that needs updating as well.
> > 
> > If an object is allocated as system only, then it is mapped WB, and
> > we're relying on the gpu being cache coherent to avoid clflushes.
> > Same
> > is actually currently true if the object happens to be accessed by
> > the
> > cpu while evicted. Might need an update for that.
> 
> Hmm okay, I think I actually misunderstood something here. I think
> the 
> reason for difference bbtween smem+lmem object which happens to be in
> smem and smem only object is eluding me.
> 
> > > > 
> > > > That's adhering to Linus'
> > > > 
> > > > "And I sincerely hope to the gods that no cache-incoherent i915
> > > > mess
> > > > ever makes it out of the x86 world. Incoherent IO was always a
> > > > historical mistake and should never ever happen again, so we
> > > > should
> > > > not spread that horrific pattern around."
> > > 
> > > Sure, but I was not talking about IO - just the CPU side access
> > > to
> > > CPU side objects.
> > 
> > OK, I was under the impression that clflushes() and wbinvd()s in
> > i915
> > was only ever used to make data visible to non-snooping GPUs.
> > 
> > Do you mean that there are other uses as well? Agreed the wb cache
> > flush on on suspend only if gpu is
> > !I915_BO_CACHE_COHERENT_FOR_READ?
> > looks to not fit this pattern completely.
> 
> Don't know, I was first trying to understand handling of the 
> obj->cache_coherent as discussed in the first quote block. Are the
> flags 
> consistently set and how the Arm low level code will look.
> 
> > Otherwise, for architectures where memory isn't always fully
> > coherent
> > with the cpu cache, I'd expect them to use the apis in
> > asm/cacheflush.h, like flush_cache_range() and similar, which are
> > nops
> > on x86.
> 
> Hm do you know why there are no-ops? Like why wouldn't they map to
> clflush?

I think it mostly boils down to the PIPT caches on x86. Everything is
assumed to be coherent. Whereas some architextures keep different cache
entries for different virtual addresses even if the physical page is
the same...

clflushes and wbinvds on x86 are for odd arch-specific situations
where, for example where we change caching attributes of the linear
kernel map mappings.

/Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 10:26                     ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 10:26 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> 
> On 21/03/2022 15:15, Thomas Hellström wrote:
> > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > Hi,
> > > > 
> > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > Hi, Tvrtko.
> > > > > > > > 
> > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > 
> > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > To align with the discussion in [1][2], this patch
> > > > > > > > > > series
> > > > > > > > > > drops
> > > > > > > > > > all
> > > > > > > > > > usage of
> > > > > > > > > > wbvind_on_all_cpus within i915 by either replacing
> > > > > > > > > > the
> > > > > > > > > > call
> > > > > > > > > > with certain
> > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > logic.
> > > > > > > > > 
> > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > provide
> > > > > > > > > non
> > > > > > > > > x86
> > > > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > > > Instead an
> > > > > > > > > arch
> > > > > > > > > agnostic helper which achieves the same effect could
> > > > > > > > > be
> > > > > > > > > created.
> > > > > > > > > Does
> > > > > > > > > Arm have such concept?
> > > > > > > > 
> > > > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > > > incoherent
> > > > > > > > IO
> > > > > > > > to
> > > > > > > > other architectures, meaning any remaining wbinvd()s
> > > > > > > > should
> > > > > > > > be
> > > > > > > > X86
> > > > > > > > only.
> > > > > > > 
> > > > > > > The last part is completely obvious since it is a x86
> > > > > > > instruction
> > > > > > > name.
> > > > > > 
> > > > > > Yeah, I meant the function implementing wbinvd() semantics.
> > > > > > 
> > > > > > > 
> > > > > > > But I think we can't pick a solution until we know how
> > > > > > > the
> > > > > > > concept
> > > > > > > maps
> > > > > > > to Arm and that will also include seeing how the
> > > > > > > drm_clflush_sg for
> > > > > > > Arm
> > > > > > > would look. Is there a range based solution, or just a
> > > > > > > big
> > > > > > > hammer
> > > > > > > there.
> > > > > > > If the latter, then it is no good to churn all these
> > > > > > > reverts
> > > > > > > but
> > > > > > > instead
> > > > > > > an arch agnostic wrapper, with a generic name, would be
> > > > > > > the
> > > > > > > way to
> > > > > > > go.
> > > > > > 
> > > > > > But my impression was that ARM would not need the range-
> > > > > > based
> > > > > > interface
> > > > > > either, because ARM is only for discrete and with discrete
> > > > > > we're
> > > > > > always
> > > > > > coherent.
> > > > > 
> > > > > Not sure what you mean here - what about flushing system
> > > > > memory
> > > > > objects
> > > > > on discrete? Those still need flushing on paths like suspend
> > > > > which this
> > > > > series touches. Am I missing something?
> > > > 
> > > > System bos on discrete should always have
> > > > 
> > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > 
> > > > either by the gpu being fully cache coherent (or us mapping
> > > > system
> > > > write-combined). Hence no need for cache clflushes or wbinvd()
> > > > for
> > > > incoherent IO.
> > > 
> > > Hmm so you are talking about the shmem ttm backend. It ends up
> > > depending on the result of i915_ttm_cache_level, yes? It cannot
> > > end
> > > up with I915_CACHE_NONE from that function?
> > 
> > If the object is allocated with allowable placement in either LMEM
> > or
> > SYSTEM, and it ends in system, it gets allocated with
> > I915_CACHE_NONE,
> > but then the shmem ttm backend isn't used but TTM's wc pools, and
> > the
> > object should *always* be mapped wc. Even in system.
> 
> I am not familiar with neither TTM backend or wc pools so maybe a
> missed 
> question - if obj->cache_level can be set to none, and 
> obj->cache_coherency to zero, then during object lifetime helpers
> which 
> consult those fields (like i915_gem_cpu_write_needs_clflush, 
> __start_cpu_write, etc) are giving out incorrect answers? That is, it
> is 
> irrelevant that they would say flushes are required, since in
> actuality 
> those objects can never ever and from anywhere be mapped other than
> WC 
> so flushes aren't actually required?

If we map other than WC somewhere in these situations, that should be a
bug needing a fix. It might be that some of these helpers that you
mention might still flag that a clflush is needed, and in that case
that's an oversight that also needs fixing.

> 
> > > I also found in i915_drm.h:
> > > 
> > >           * As caching mode when specifying
> > > `I915_MMAP_OFFSET_FIXED`,
> > > WC or WB will
> > >           * be used, depending on the object placement on
> > > creation. WB
> > > will be used
> > >           * when the object can only exist in system memory, WC
> > > otherwise.
> > > 
> > > If what you say is true, that on discrete it is _always_ WC, then
> > > that needs updating as well.
> > 
> > If an object is allocated as system only, then it is mapped WB, and
> > we're relying on the gpu being cache coherent to avoid clflushes.
> > Same
> > is actually currently true if the object happens to be accessed by
> > the
> > cpu while evicted. Might need an update for that.
> 
> Hmm okay, I think I actually misunderstood something here. I think
> the 
> reason for difference bbtween smem+lmem object which happens to be in
> smem and smem only object is eluding me.
> 
> > > > 
> > > > That's adhering to Linus'
> > > > 
> > > > "And I sincerely hope to the gods that no cache-incoherent i915
> > > > mess
> > > > ever makes it out of the x86 world. Incoherent IO was always a
> > > > historical mistake and should never ever happen again, so we
> > > > should
> > > > not spread that horrific pattern around."
> > > 
> > > Sure, but I was not talking about IO - just the CPU side access
> > > to
> > > CPU side objects.
> > 
> > OK, I was under the impression that clflushes() and wbinvd()s in
> > i915
> > was only ever used to make data visible to non-snooping GPUs.
> > 
> > Do you mean that there are other uses as well? Agreed the wb cache
> > flush on on suspend only if gpu is
> > !I915_BO_CACHE_COHERENT_FOR_READ?
> > looks to not fit this pattern completely.
> 
> Don't know, I was first trying to understand handling of the 
> obj->cache_coherent as discussed in the first quote block. Are the
> flags 
> consistently set and how the Arm low level code will look.
> 
> > Otherwise, for architectures where memory isn't always fully
> > coherent
> > with the cpu cache, I'd expect them to use the apis in
> > asm/cacheflush.h, like flush_cache_range() and similar, which are
> > nops
> > on x86.
> 
> Hm do you know why there are no-ops? Like why wouldn't they map to
> clflush?

I think it mostly boils down to the PIPT caches on x86. Everything is
assumed to be coherent. Whereas some architextures keep different cache
entries for different virtual addresses even if the physical page is
the same...

clflushes and wbinvds on x86 are for odd arch-specific situations
where, for example where we change caching attributes of the linear
kernel map mappings.

/Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 10:26                     ` [Intel-gfx] " Thomas Hellström
@ 2022-03-22 10:41                       ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 10:41 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 11:26 +0100, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 15:15, Thomas Hellström wrote:
> > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > > Hi,
> > > > > 
> > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > > > > 
> > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > > Hi, Tvrtko.
> > > > > > > > > 
> > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > > 
> > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > > To align with the discussion in [1][2], this
> > > > > > > > > > > patch
> > > > > > > > > > > series
> > > > > > > > > > > drops
> > > > > > > > > > > all
> > > > > > > > > > > usage of
> > > > > > > > > > > wbvind_on_all_cpus within i915 by either
> > > > > > > > > > > replacing
> > > > > > > > > > > the
> > > > > > > > > > > call
> > > > > > > > > > > with certain
> > > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > > logic.
> > > > > > > > > > 
> > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > > provide
> > > > > > > > > > non
> > > > > > > > > > x86
> > > > > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > > > > Instead an
> > > > > > > > > > arch
> > > > > > > > > > agnostic helper which achieves the same effect
> > > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > created.
> > > > > > > > > > Does
> > > > > > > > > > Arm have such concept?
> > > > > > > > > 
> > > > > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > > > > incoherent
> > > > > > > > > IO
> > > > > > > > > to
> > > > > > > > > other architectures, meaning any remaining wbinvd()s
> > > > > > > > > should
> > > > > > > > > be
> > > > > > > > > X86
> > > > > > > > > only.
> > > > > > > > 
> > > > > > > > The last part is completely obvious since it is a x86
> > > > > > > > instruction
> > > > > > > > name.
> > > > > > > 
> > > > > > > Yeah, I meant the function implementing wbinvd()
> > > > > > > semantics.
> > > > > > > 
> > > > > > > > 
> > > > > > > > But I think we can't pick a solution until we know how
> > > > > > > > the
> > > > > > > > concept
> > > > > > > > maps
> > > > > > > > to Arm and that will also include seeing how the
> > > > > > > > drm_clflush_sg for
> > > > > > > > Arm
> > > > > > > > would look. Is there a range based solution, or just a
> > > > > > > > big
> > > > > > > > hammer
> > > > > > > > there.
> > > > > > > > If the latter, then it is no good to churn all these
> > > > > > > > reverts
> > > > > > > > but
> > > > > > > > instead
> > > > > > > > an arch agnostic wrapper, with a generic name, would be
> > > > > > > > the
> > > > > > > > way to
> > > > > > > > go.
> > > > > > > 
> > > > > > > But my impression was that ARM would not need the range-
> > > > > > > based
> > > > > > > interface
> > > > > > > either, because ARM is only for discrete and with
> > > > > > > discrete
> > > > > > > we're
> > > > > > > always
> > > > > > > coherent.
> > > > > > 
> > > > > > Not sure what you mean here - what about flushing system
> > > > > > memory
> > > > > > objects
> > > > > > on discrete? Those still need flushing on paths like
> > > > > > suspend
> > > > > > which this
> > > > > > series touches. Am I missing something?
> > > > > 
> > > > > System bos on discrete should always have
> > > > > 
> > > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > > 
> > > > > either by the gpu being fully cache coherent (or us mapping
> > > > > system
> > > > > write-combined). Hence no need for cache clflushes or
> > > > > wbinvd()
> > > > > for
> > > > > incoherent IO.
> > > > 
> > > > Hmm so you are talking about the shmem ttm backend. It ends up
> > > > depending on the result of i915_ttm_cache_level, yes? It cannot
> > > > end
> > > > up with I915_CACHE_NONE from that function?
> > > 
> > > If the object is allocated with allowable placement in either
> > > LMEM
> > > or
> > > SYSTEM, and it ends in system, it gets allocated with
> > > I915_CACHE_NONE,
> > > but then the shmem ttm backend isn't used but TTM's wc pools, and
> > > the
> > > object should *always* be mapped wc. Even in system.
> > 
> > I am not familiar with neither TTM backend or wc pools so maybe a
> > missed 
> > question - if obj->cache_level can be set to none, and 
> > obj->cache_coherency to zero, then during object lifetime helpers
> > which 
> > consult those fields (like i915_gem_cpu_write_needs_clflush, 
> > __start_cpu_write, etc) are giving out incorrect answers? That is,
> > it
> > is 
> > irrelevant that they would say flushes are required, since in
> > actuality 
> > those objects can never ever and from anywhere be mapped other than
> > WC 
> > so flushes aren't actually required?
> 
> If we map other than WC somewhere in these situations, that should be
> a
> bug needing a fix. It might be that some of these helpers that you
> mention might still flag that a clflush is needed, and in that case
> that's an oversight that also needs fixing.

Actually, it seems like most of these has a IS_DGFX() in them, in
particular i915_gem_clflush_object(), but it looks like some sort of
cleanup might be needed here. In particular we might want to introduce
an IS_COHERENT() in case we change the api at some point also for
integrated.

/Thomas



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 10:41                       ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 10:41 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 11:26 +0100, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 15:15, Thomas Hellström wrote:
> > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > > Hi,
> > > > > 
> > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > > 
> > > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
> > > > > > > > 
> > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > > Hi, Tvrtko.
> > > > > > > > > 
> > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > > 
> > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > > To align with the discussion in [1][2], this
> > > > > > > > > > > patch
> > > > > > > > > > > series
> > > > > > > > > > > drops
> > > > > > > > > > > all
> > > > > > > > > > > usage of
> > > > > > > > > > > wbvind_on_all_cpus within i915 by either
> > > > > > > > > > > replacing
> > > > > > > > > > > the
> > > > > > > > > > > call
> > > > > > > > > > > with certain
> > > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > > logic.
> > > > > > > > > > 
> > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > > provide
> > > > > > > > > > non
> > > > > > > > > > x86
> > > > > > > > > > implementations under the wbinvd_on_all_cpus name.
> > > > > > > > > > Instead an
> > > > > > > > > > arch
> > > > > > > > > > agnostic helper which achieves the same effect
> > > > > > > > > > could
> > > > > > > > > > be
> > > > > > > > > > created.
> > > > > > > > > > Does
> > > > > > > > > > Arm have such concept?
> > > > > > > > > 
> > > > > > > > > I also understand Linus' email like we shouldn't leak
> > > > > > > > > incoherent
> > > > > > > > > IO
> > > > > > > > > to
> > > > > > > > > other architectures, meaning any remaining wbinvd()s
> > > > > > > > > should
> > > > > > > > > be
> > > > > > > > > X86
> > > > > > > > > only.
> > > > > > > > 
> > > > > > > > The last part is completely obvious since it is a x86
> > > > > > > > instruction
> > > > > > > > name.
> > > > > > > 
> > > > > > > Yeah, I meant the function implementing wbinvd()
> > > > > > > semantics.
> > > > > > > 
> > > > > > > > 
> > > > > > > > But I think we can't pick a solution until we know how
> > > > > > > > the
> > > > > > > > concept
> > > > > > > > maps
> > > > > > > > to Arm and that will also include seeing how the
> > > > > > > > drm_clflush_sg for
> > > > > > > > Arm
> > > > > > > > would look. Is there a range based solution, or just a
> > > > > > > > big
> > > > > > > > hammer
> > > > > > > > there.
> > > > > > > > If the latter, then it is no good to churn all these
> > > > > > > > reverts
> > > > > > > > but
> > > > > > > > instead
> > > > > > > > an arch agnostic wrapper, with a generic name, would be
> > > > > > > > the
> > > > > > > > way to
> > > > > > > > go.
> > > > > > > 
> > > > > > > But my impression was that ARM would not need the range-
> > > > > > > based
> > > > > > > interface
> > > > > > > either, because ARM is only for discrete and with
> > > > > > > discrete
> > > > > > > we're
> > > > > > > always
> > > > > > > coherent.
> > > > > > 
> > > > > > Not sure what you mean here - what about flushing system
> > > > > > memory
> > > > > > objects
> > > > > > on discrete? Those still need flushing on paths like
> > > > > > suspend
> > > > > > which this
> > > > > > series touches. Am I missing something?
> > > > > 
> > > > > System bos on discrete should always have
> > > > > 
> > > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > > 
> > > > > either by the gpu being fully cache coherent (or us mapping
> > > > > system
> > > > > write-combined). Hence no need for cache clflushes or
> > > > > wbinvd()
> > > > > for
> > > > > incoherent IO.
> > > > 
> > > > Hmm so you are talking about the shmem ttm backend. It ends up
> > > > depending on the result of i915_ttm_cache_level, yes? It cannot
> > > > end
> > > > up with I915_CACHE_NONE from that function?
> > > 
> > > If the object is allocated with allowable placement in either
> > > LMEM
> > > or
> > > SYSTEM, and it ends in system, it gets allocated with
> > > I915_CACHE_NONE,
> > > but then the shmem ttm backend isn't used but TTM's wc pools, and
> > > the
> > > object should *always* be mapped wc. Even in system.
> > 
> > I am not familiar with neither TTM backend or wc pools so maybe a
> > missed 
> > question - if obj->cache_level can be set to none, and 
> > obj->cache_coherency to zero, then during object lifetime helpers
> > which 
> > consult those fields (like i915_gem_cpu_write_needs_clflush, 
> > __start_cpu_write, etc) are giving out incorrect answers? That is,
> > it
> > is 
> > irrelevant that they would say flushes are required, since in
> > actuality 
> > those objects can never ever and from anywhere be mapped other than
> > WC 
> > so flushes aren't actually required?
> 
> If we map other than WC somewhere in these situations, that should be
> a
> bug needing a fix. It might be that some of these helpers that you
> mention might still flag that a clflush is needed, and in that case
> that's an oversight that also needs fixing.

Actually, it seems like most of these has a IS_DGFX() in them, in
particular i915_gem_clflush_object(), but it looks like some sort of
cleanup might be needed here. In particular we might want to introduce
an IS_COHERENT() in case we change the api at some point also for
integrated.

/Thomas



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 10:26                     ` [Intel-gfx] " Thomas Hellström
@ 2022-03-22 11:20                       ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 11:20 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 22/03/2022 10:26, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>
>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>> To align with the discussion in [1][2], this patch
>>>>>>>>>>> series
>>>>>>>>>>> drops
>>>>>>>>>>> all
>>>>>>>>>>> usage of
>>>>>>>>>>> wbvind_on_all_cpus within i915 by either replacing
>>>>>>>>>>> the
>>>>>>>>>>> call
>>>>>>>>>>> with certain
>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>> logic.
>>>>>>>>>>
>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>> provide
>>>>>>>>>> non
>>>>>>>>>> x86
>>>>>>>>>> implementations under the wbinvd_on_all_cpus name.
>>>>>>>>>> Instead an
>>>>>>>>>> arch
>>>>>>>>>> agnostic helper which achieves the same effect could
>>>>>>>>>> be
>>>>>>>>>> created.
>>>>>>>>>> Does
>>>>>>>>>> Arm have such concept?
>>>>>>>>>
>>>>>>>>> I also understand Linus' email like we shouldn't leak
>>>>>>>>> incoherent
>>>>>>>>> IO
>>>>>>>>> to
>>>>>>>>> other architectures, meaning any remaining wbinvd()s
>>>>>>>>> should
>>>>>>>>> be
>>>>>>>>> X86
>>>>>>>>> only.
>>>>>>>>
>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>> instruction
>>>>>>>> name.
>>>>>>>
>>>>>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>>>>>
>>>>>>>>
>>>>>>>> But I think we can't pick a solution until we know how
>>>>>>>> the
>>>>>>>> concept
>>>>>>>> maps
>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>> drm_clflush_sg for
>>>>>>>> Arm
>>>>>>>> would look. Is there a range based solution, or just a
>>>>>>>> big
>>>>>>>> hammer
>>>>>>>> there.
>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>> reverts
>>>>>>>> but
>>>>>>>> instead
>>>>>>>> an arch agnostic wrapper, with a generic name, would be
>>>>>>>> the
>>>>>>>> way to
>>>>>>>> go.
>>>>>>>
>>>>>>> But my impression was that ARM would not need the range-
>>>>>>> based
>>>>>>> interface
>>>>>>> either, because ARM is only for discrete and with discrete
>>>>>>> we're
>>>>>>> always
>>>>>>> coherent.
>>>>>>
>>>>>> Not sure what you mean here - what about flushing system
>>>>>> memory
>>>>>> objects
>>>>>> on discrete? Those still need flushing on paths like suspend
>>>>>> which this
>>>>>> series touches. Am I missing something?
>>>>>
>>>>> System bos on discrete should always have
>>>>>
>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>
>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>> system
>>>>> write-combined). Hence no need for cache clflushes or wbinvd()
>>>>> for
>>>>> incoherent IO.
>>>>
>>>> Hmm so you are talking about the shmem ttm backend. It ends up
>>>> depending on the result of i915_ttm_cache_level, yes? It cannot
>>>> end
>>>> up with I915_CACHE_NONE from that function?
>>>
>>> If the object is allocated with allowable placement in either LMEM
>>> or
>>> SYSTEM, and it ends in system, it gets allocated with
>>> I915_CACHE_NONE,
>>> but then the shmem ttm backend isn't used but TTM's wc pools, and
>>> the
>>> object should *always* be mapped wc. Even in system.
>>
>> I am not familiar with neither TTM backend or wc pools so maybe a
>> missed
>> question - if obj->cache_level can be set to none, and
>> obj->cache_coherency to zero, then during object lifetime helpers
>> which
>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>> __start_cpu_write, etc) are giving out incorrect answers? That is, it
>> is
>> irrelevant that they would say flushes are required, since in
>> actuality
>> those objects can never ever and from anywhere be mapped other than
>> WC
>> so flushes aren't actually required?
> 
> If we map other than WC somewhere in these situations, that should be a
> bug needing a fix. It might be that some of these helpers that you
> mention might still flag that a clflush is needed, and in that case
> that's an oversight that also needs fixing.
> 
>>
>>>> I also found in i915_drm.h:
>>>>
>>>>            * As caching mode when specifying
>>>> `I915_MMAP_OFFSET_FIXED`,
>>>> WC or WB will
>>>>            * be used, depending on the object placement on
>>>> creation. WB
>>>> will be used
>>>>            * when the object can only exist in system memory, WC
>>>> otherwise.
>>>>
>>>> If what you say is true, that on discrete it is _always_ WC, then
>>>> that needs updating as well.
>>>
>>> If an object is allocated as system only, then it is mapped WB, and
>>> we're relying on the gpu being cache coherent to avoid clflushes.
>>> Same
>>> is actually currently true if the object happens to be accessed by
>>> the
>>> cpu while evicted. Might need an update for that.
>>
>> Hmm okay, I think I actually misunderstood something here. I think
>> the
>> reason for difference bbtween smem+lmem object which happens to be in
>> smem and smem only object is eluding me.
>>
>>>>>
>>>>> That's adhering to Linus'
>>>>>
>>>>> "And I sincerely hope to the gods that no cache-incoherent i915
>>>>> mess
>>>>> ever makes it out of the x86 world. Incoherent IO was always a
>>>>> historical mistake and should never ever happen again, so we
>>>>> should
>>>>> not spread that horrific pattern around."
>>>>
>>>> Sure, but I was not talking about IO - just the CPU side access
>>>> to
>>>> CPU side objects.
>>>
>>> OK, I was under the impression that clflushes() and wbinvd()s in
>>> i915
>>> was only ever used to make data visible to non-snooping GPUs.
>>>
>>> Do you mean that there are other uses as well? Agreed the wb cache
>>> flush on on suspend only if gpu is
>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>> looks to not fit this pattern completely.
>>
>> Don't know, I was first trying to understand handling of the
>> obj->cache_coherent as discussed in the first quote block. Are the
>> flags
>> consistently set and how the Arm low level code will look.
>>
>>> Otherwise, for architectures where memory isn't always fully
>>> coherent
>>> with the cpu cache, I'd expect them to use the apis in
>>> asm/cacheflush.h, like flush_cache_range() and similar, which are
>>> nops
>>> on x86.
>>
>> Hm do you know why there are no-ops? Like why wouldn't they map to
>> clflush?
> 
> I think it mostly boils down to the PIPT caches on x86. Everything is
> assumed to be coherent. Whereas some architextures keep different cache
> entries for different virtual addresses even if the physical page is
> the same...
> 
> clflushes and wbinvds on x86 are for odd arch-specific situations
> where, for example where we change caching attributes of the linear
> kernel map mappings.

So in summary we have flush_cache_range which is generic, not implemented on x86 and works with virtual addresses so not directly usable even if x86 implementation was added.

There is also x86 specific clflush_cache_range which works with virtual addresses as well so no good for drm_clflush_sg.

Question you implicitly raise, correct me if I got it wrong, is whether we should even be trying to extend drm_clflush_sg for Arm, given how most (all?) call sites are not needed on discrete, is that right?

Would that mean we could leave most of the code as is and just replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches, which would then legitimately do nothing, at least on Arm if not also on discrete in general?

If that would work it would make a small and easy to review series. I don't think it would collide with what Linus asked since it is not propagating undesirable things further - given how if there is no actual need to flush then there is no need to make it range based either.

Exception would be the dmabuf get pages patch which needs a proper implementation of a new drm flush helper.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 11:20                       ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 11:20 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 22/03/2022 10:26, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>
>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>> Hi,
>>>>>
>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>
>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>
>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>> To align with the discussion in [1][2], this patch
>>>>>>>>>>> series
>>>>>>>>>>> drops
>>>>>>>>>>> all
>>>>>>>>>>> usage of
>>>>>>>>>>> wbvind_on_all_cpus within i915 by either replacing
>>>>>>>>>>> the
>>>>>>>>>>> call
>>>>>>>>>>> with certain
>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>> logic.
>>>>>>>>>>
>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>> provide
>>>>>>>>>> non
>>>>>>>>>> x86
>>>>>>>>>> implementations under the wbinvd_on_all_cpus name.
>>>>>>>>>> Instead an
>>>>>>>>>> arch
>>>>>>>>>> agnostic helper which achieves the same effect could
>>>>>>>>>> be
>>>>>>>>>> created.
>>>>>>>>>> Does
>>>>>>>>>> Arm have such concept?
>>>>>>>>>
>>>>>>>>> I also understand Linus' email like we shouldn't leak
>>>>>>>>> incoherent
>>>>>>>>> IO
>>>>>>>>> to
>>>>>>>>> other architectures, meaning any remaining wbinvd()s
>>>>>>>>> should
>>>>>>>>> be
>>>>>>>>> X86
>>>>>>>>> only.
>>>>>>>>
>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>> instruction
>>>>>>>> name.
>>>>>>>
>>>>>>> Yeah, I meant the function implementing wbinvd() semantics.
>>>>>>>
>>>>>>>>
>>>>>>>> But I think we can't pick a solution until we know how
>>>>>>>> the
>>>>>>>> concept
>>>>>>>> maps
>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>> drm_clflush_sg for
>>>>>>>> Arm
>>>>>>>> would look. Is there a range based solution, or just a
>>>>>>>> big
>>>>>>>> hammer
>>>>>>>> there.
>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>> reverts
>>>>>>>> but
>>>>>>>> instead
>>>>>>>> an arch agnostic wrapper, with a generic name, would be
>>>>>>>> the
>>>>>>>> way to
>>>>>>>> go.
>>>>>>>
>>>>>>> But my impression was that ARM would not need the range-
>>>>>>> based
>>>>>>> interface
>>>>>>> either, because ARM is only for discrete and with discrete
>>>>>>> we're
>>>>>>> always
>>>>>>> coherent.
>>>>>>
>>>>>> Not sure what you mean here - what about flushing system
>>>>>> memory
>>>>>> objects
>>>>>> on discrete? Those still need flushing on paths like suspend
>>>>>> which this
>>>>>> series touches. Am I missing something?
>>>>>
>>>>> System bos on discrete should always have
>>>>>
>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>
>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>> system
>>>>> write-combined). Hence no need for cache clflushes or wbinvd()
>>>>> for
>>>>> incoherent IO.
>>>>
>>>> Hmm so you are talking about the shmem ttm backend. It ends up
>>>> depending on the result of i915_ttm_cache_level, yes? It cannot
>>>> end
>>>> up with I915_CACHE_NONE from that function?
>>>
>>> If the object is allocated with allowable placement in either LMEM
>>> or
>>> SYSTEM, and it ends in system, it gets allocated with
>>> I915_CACHE_NONE,
>>> but then the shmem ttm backend isn't used but TTM's wc pools, and
>>> the
>>> object should *always* be mapped wc. Even in system.
>>
>> I am not familiar with neither TTM backend or wc pools so maybe a
>> missed
>> question - if obj->cache_level can be set to none, and
>> obj->cache_coherency to zero, then during object lifetime helpers
>> which
>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>> __start_cpu_write, etc) are giving out incorrect answers? That is, it
>> is
>> irrelevant that they would say flushes are required, since in
>> actuality
>> those objects can never ever and from anywhere be mapped other than
>> WC
>> so flushes aren't actually required?
> 
> If we map other than WC somewhere in these situations, that should be a
> bug needing a fix. It might be that some of these helpers that you
> mention might still flag that a clflush is needed, and in that case
> that's an oversight that also needs fixing.
> 
>>
>>>> I also found in i915_drm.h:
>>>>
>>>>            * As caching mode when specifying
>>>> `I915_MMAP_OFFSET_FIXED`,
>>>> WC or WB will
>>>>            * be used, depending on the object placement on
>>>> creation. WB
>>>> will be used
>>>>            * when the object can only exist in system memory, WC
>>>> otherwise.
>>>>
>>>> If what you say is true, that on discrete it is _always_ WC, then
>>>> that needs updating as well.
>>>
>>> If an object is allocated as system only, then it is mapped WB, and
>>> we're relying on the gpu being cache coherent to avoid clflushes.
>>> Same
>>> is actually currently true if the object happens to be accessed by
>>> the
>>> cpu while evicted. Might need an update for that.
>>
>> Hmm okay, I think I actually misunderstood something here. I think
>> the
>> reason for difference bbtween smem+lmem object which happens to be in
>> smem and smem only object is eluding me.
>>
>>>>>
>>>>> That's adhering to Linus'
>>>>>
>>>>> "And I sincerely hope to the gods that no cache-incoherent i915
>>>>> mess
>>>>> ever makes it out of the x86 world. Incoherent IO was always a
>>>>> historical mistake and should never ever happen again, so we
>>>>> should
>>>>> not spread that horrific pattern around."
>>>>
>>>> Sure, but I was not talking about IO - just the CPU side access
>>>> to
>>>> CPU side objects.
>>>
>>> OK, I was under the impression that clflushes() and wbinvd()s in
>>> i915
>>> was only ever used to make data visible to non-snooping GPUs.
>>>
>>> Do you mean that there are other uses as well? Agreed the wb cache
>>> flush on on suspend only if gpu is
>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>> looks to not fit this pattern completely.
>>
>> Don't know, I was first trying to understand handling of the
>> obj->cache_coherent as discussed in the first quote block. Are the
>> flags
>> consistently set and how the Arm low level code will look.
>>
>>> Otherwise, for architectures where memory isn't always fully
>>> coherent
>>> with the cpu cache, I'd expect them to use the apis in
>>> asm/cacheflush.h, like flush_cache_range() and similar, which are
>>> nops
>>> on x86.
>>
>> Hm do you know why there are no-ops? Like why wouldn't they map to
>> clflush?
> 
> I think it mostly boils down to the PIPT caches on x86. Everything is
> assumed to be coherent. Whereas some architextures keep different cache
> entries for different virtual addresses even if the physical page is
> the same...
> 
> clflushes and wbinvds on x86 are for odd arch-specific situations
> where, for example where we change caching attributes of the linear
> kernel map mappings.

So in summary we have flush_cache_range which is generic, not implemented on x86 and works with virtual addresses so not directly usable even if x86 implementation was added.

There is also x86 specific clflush_cache_range which works with virtual addresses as well so no good for drm_clflush_sg.

Question you implicitly raise, correct me if I got it wrong, is whether we should even be trying to extend drm_clflush_sg for Arm, given how most (all?) call sites are not needed on discrete, is that right?

Would that mean we could leave most of the code as is and just replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches, which would then legitimately do nothing, at least on Arm if not also on discrete in general?

If that would work it would make a small and easy to review series. I don't think it would collide with what Linus asked since it is not propagating undesirable things further - given how if there is no actual need to flush then there is no need to make it range based either.

Exception would be the dmabuf get pages patch which needs a proper implementation of a new drm flush helper.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 11:20                       ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-22 11:37                         ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 11:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
> 
> On 22/03/2022 10:26, Thomas Hellström wrote:
> > On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 15:15, Thomas Hellström wrote:
> > > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > > > Hi, Tvrtko.
> > > > > > > > > > 
> > > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > > > To align with the discussion in [1][2], this
> > > > > > > > > > > > patch
> > > > > > > > > > > > series
> > > > > > > > > > > > drops
> > > > > > > > > > > > all
> > > > > > > > > > > > usage of
> > > > > > > > > > > > wbvind_on_all_cpus within i915 by either
> > > > > > > > > > > > replacing
> > > > > > > > > > > > the
> > > > > > > > > > > > call
> > > > > > > > > > > > with certain
> > > > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > > > logic.
> > > > > > > > > > > 
> > > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > > > provide
> > > > > > > > > > > non
> > > > > > > > > > > x86
> > > > > > > > > > > implementations under the wbinvd_on_all_cpus
> > > > > > > > > > > name.
> > > > > > > > > > > Instead an
> > > > > > > > > > > arch
> > > > > > > > > > > agnostic helper which achieves the same effect
> > > > > > > > > > > could
> > > > > > > > > > > be
> > > > > > > > > > > created.
> > > > > > > > > > > Does
> > > > > > > > > > > Arm have such concept?
> > > > > > > > > > 
> > > > > > > > > > I also understand Linus' email like we shouldn't
> > > > > > > > > > leak
> > > > > > > > > > incoherent
> > > > > > > > > > IO
> > > > > > > > > > to
> > > > > > > > > > other architectures, meaning any remaining
> > > > > > > > > > wbinvd()s
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > X86
> > > > > > > > > > only.
> > > > > > > > > 
> > > > > > > > > The last part is completely obvious since it is a x86
> > > > > > > > > instruction
> > > > > > > > > name.
> > > > > > > > 
> > > > > > > > Yeah, I meant the function implementing wbinvd()
> > > > > > > > semantics.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > But I think we can't pick a solution until we know
> > > > > > > > > how
> > > > > > > > > the
> > > > > > > > > concept
> > > > > > > > > maps
> > > > > > > > > to Arm and that will also include seeing how the
> > > > > > > > > drm_clflush_sg for
> > > > > > > > > Arm
> > > > > > > > > would look. Is there a range based solution, or just
> > > > > > > > > a
> > > > > > > > > big
> > > > > > > > > hammer
> > > > > > > > > there.
> > > > > > > > > If the latter, then it is no good to churn all these
> > > > > > > > > reverts
> > > > > > > > > but
> > > > > > > > > instead
> > > > > > > > > an arch agnostic wrapper, with a generic name, would
> > > > > > > > > be
> > > > > > > > > the
> > > > > > > > > way to
> > > > > > > > > go.
> > > > > > > > 
> > > > > > > > But my impression was that ARM would not need the
> > > > > > > > range-
> > > > > > > > based
> > > > > > > > interface
> > > > > > > > either, because ARM is only for discrete and with
> > > > > > > > discrete
> > > > > > > > we're
> > > > > > > > always
> > > > > > > > coherent.
> > > > > > > 
> > > > > > > Not sure what you mean here - what about flushing system
> > > > > > > memory
> > > > > > > objects
> > > > > > > on discrete? Those still need flushing on paths like
> > > > > > > suspend
> > > > > > > which this
> > > > > > > series touches. Am I missing something?
> > > > > > 
> > > > > > System bos on discrete should always have
> > > > > > 
> > > > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > > > 
> > > > > > either by the gpu being fully cache coherent (or us mapping
> > > > > > system
> > > > > > write-combined). Hence no need for cache clflushes or
> > > > > > wbinvd()
> > > > > > for
> > > > > > incoherent IO.
> > > > > 
> > > > > Hmm so you are talking about the shmem ttm backend. It ends
> > > > > up
> > > > > depending on the result of i915_ttm_cache_level, yes? It
> > > > > cannot
> > > > > end
> > > > > up with I915_CACHE_NONE from that function?
> > > > 
> > > > If the object is allocated with allowable placement in either
> > > > LMEM
> > > > or
> > > > SYSTEM, and it ends in system, it gets allocated with
> > > > I915_CACHE_NONE,
> > > > but then the shmem ttm backend isn't used but TTM's wc pools,
> > > > and
> > > > the
> > > > object should *always* be mapped wc. Even in system.
> > > 
> > > I am not familiar with neither TTM backend or wc pools so maybe a
> > > missed
> > > question - if obj->cache_level can be set to none, and
> > > obj->cache_coherency to zero, then during object lifetime helpers
> > > which
> > > consult those fields (like i915_gem_cpu_write_needs_clflush,
> > > __start_cpu_write, etc) are giving out incorrect answers? That
> > > is, it
> > > is
> > > irrelevant that they would say flushes are required, since in
> > > actuality
> > > those objects can never ever and from anywhere be mapped other
> > > than
> > > WC
> > > so flushes aren't actually required?
> > 
> > If we map other than WC somewhere in these situations, that should
> > be a
> > bug needing a fix. It might be that some of these helpers that you
> > mention might still flag that a clflush is needed, and in that case
> > that's an oversight that also needs fixing.
> > 
> > > 
> > > > > I also found in i915_drm.h:
> > > > > 
> > > > >            * As caching mode when specifying
> > > > > `I915_MMAP_OFFSET_FIXED`,
> > > > > WC or WB will
> > > > >            * be used, depending on the object placement on
> > > > > creation. WB
> > > > > will be used
> > > > >            * when the object can only exist in system memory,
> > > > > WC
> > > > > otherwise.
> > > > > 
> > > > > If what you say is true, that on discrete it is _always_ WC,
> > > > > then
> > > > > that needs updating as well.
> > > > 
> > > > If an object is allocated as system only, then it is mapped WB,
> > > > and
> > > > we're relying on the gpu being cache coherent to avoid
> > > > clflushes.
> > > > Same
> > > > is actually currently true if the object happens to be accessed
> > > > by
> > > > the
> > > > cpu while evicted. Might need an update for that.
> > > 
> > > Hmm okay, I think I actually misunderstood something here. I
> > > think
> > > the
> > > reason for difference bbtween smem+lmem object which happens to
> > > be in
> > > smem and smem only object is eluding me.
> > > 
> > > > > > 
> > > > > > That's adhering to Linus'
> > > > > > 
> > > > > > "And I sincerely hope to the gods that no cache-incoherent
> > > > > > i915
> > > > > > mess
> > > > > > ever makes it out of the x86 world. Incoherent IO was
> > > > > > always a
> > > > > > historical mistake and should never ever happen again, so
> > > > > > we
> > > > > > should
> > > > > > not spread that horrific pattern around."
> > > > > 
> > > > > Sure, but I was not talking about IO - just the CPU side
> > > > > access
> > > > > to
> > > > > CPU side objects.
> > > > 
> > > > OK, I was under the impression that clflushes() and wbinvd()s
> > > > in
> > > > i915
> > > > was only ever used to make data visible to non-snooping GPUs.
> > > > 
> > > > Do you mean that there are other uses as well? Agreed the wb
> > > > cache
> > > > flush on on suspend only if gpu is
> > > > !I915_BO_CACHE_COHERENT_FOR_READ?
> > > > looks to not fit this pattern completely.
> > > 
> > > Don't know, I was first trying to understand handling of the
> > > obj->cache_coherent as discussed in the first quote block. Are
> > > the
> > > flags
> > > consistently set and how the Arm low level code will look.
> > > 
> > > > Otherwise, for architectures where memory isn't always fully
> > > > coherent
> > > > with the cpu cache, I'd expect them to use the apis in
> > > > asm/cacheflush.h, like flush_cache_range() and similar, which
> > > > are
> > > > nops
> > > > on x86.
> > > 
> > > Hm do you know why there are no-ops? Like why wouldn't they map
> > > to
> > > clflush?
> > 
> > I think it mostly boils down to the PIPT caches on x86. Everything
> > is
> > assumed to be coherent. Whereas some architextures keep different
> > cache
> > entries for different virtual addresses even if the physical page
> > is
> > the same...
> > 
> > clflushes and wbinvds on x86 are for odd arch-specific situations
> > where, for example where we change caching attributes of the linear
> > kernel map mappings.
> 
> So in summary we have flush_cache_range which is generic, not
> implemented on x86 and works with virtual addresses so not directly
> usable even if x86 implementation was added.

I think for the intended flush_cache_range() semantics: "Make this
range visible to all vms on all cpus", I think the x86 implementation
is actually a nop, and correctly implemented.

> 
> There is also x86 specific clflush_cache_range which works with
> virtual addresses as well so no good for drm_clflush_sg.
> 
> Question you implicitly raise, correct me if I got it wrong, is
> whether we should even be trying to extend drm_clflush_sg for Arm,
> given how most (all?) call sites are not needed on discrete, is that
> right?

Yes exactly. No need to bother figuring this out for ARM, as we don't
do any incoherent IO.

> 
> Would that mean we could leave most of the code as is and just
> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
> which would then legitimately do nothing, at least on Arm if not also
> on discrete in general?

Yes, with the caveat that we should, at least as a second step, make
i915_flush_cpu_caches() range-based if possible from a performance
point of view.

> 
> If that would work it would make a small and easy to review series. I
> don't think it would collide with what Linus asked since it is not
> propagating undesirable things further - given how if there is no
> actual need to flush then there is no need to make it range based
> either.
> 
> Exception would be the dmabuf get pages patch which needs a proper
> implementation of a new drm flush helper.

I think the dmabuf get_pages (note that that's also only for integrated
I915_CACHE_NONE x86-only situations), can be done with

dma_buf_vmap(dma_buf, &virtual);
drm_clflush_virt_range(virtual, length);
dma_buf_vunmap(&virtual);

/Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 11:37                         ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 11:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld

On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
> 
> On 22/03/2022 10:26, Thomas Hellström wrote:
> > On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
> > > 
> > > On 21/03/2022 15:15, Thomas Hellström wrote:
> > > > On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 21/03/2022 13:40, Thomas Hellström wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 21/03/2022 12:33, Thomas Hellström wrote:
> > > > > > > > On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > On 21/03/2022 11:03, Thomas Hellström wrote:
> > > > > > > > > > Hi, Tvrtko.
> > > > > > > > > > 
> > > > > > > > > > On 3/21/22 11:27, Tvrtko Ursulin wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > > > > > > > > To align with the discussion in [1][2], this
> > > > > > > > > > > > patch
> > > > > > > > > > > > series
> > > > > > > > > > > > drops
> > > > > > > > > > > > all
> > > > > > > > > > > > usage of
> > > > > > > > > > > > wbvind_on_all_cpus within i915 by either
> > > > > > > > > > > > replacing
> > > > > > > > > > > > the
> > > > > > > > > > > > call
> > > > > > > > > > > > with certain
> > > > > > > > > > > > drm clflush helpers, or reverting to a previous
> > > > > > > > > > > > logic.
> > > > > > > > > > > 
> > > > > > > > > > > AFAIU, complaint from [1] was that it is wrong to
> > > > > > > > > > > provide
> > > > > > > > > > > non
> > > > > > > > > > > x86
> > > > > > > > > > > implementations under the wbinvd_on_all_cpus
> > > > > > > > > > > name.
> > > > > > > > > > > Instead an
> > > > > > > > > > > arch
> > > > > > > > > > > agnostic helper which achieves the same effect
> > > > > > > > > > > could
> > > > > > > > > > > be
> > > > > > > > > > > created.
> > > > > > > > > > > Does
> > > > > > > > > > > Arm have such concept?
> > > > > > > > > > 
> > > > > > > > > > I also understand Linus' email like we shouldn't
> > > > > > > > > > leak
> > > > > > > > > > incoherent
> > > > > > > > > > IO
> > > > > > > > > > to
> > > > > > > > > > other architectures, meaning any remaining
> > > > > > > > > > wbinvd()s
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > X86
> > > > > > > > > > only.
> > > > > > > > > 
> > > > > > > > > The last part is completely obvious since it is a x86
> > > > > > > > > instruction
> > > > > > > > > name.
> > > > > > > > 
> > > > > > > > Yeah, I meant the function implementing wbinvd()
> > > > > > > > semantics.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > But I think we can't pick a solution until we know
> > > > > > > > > how
> > > > > > > > > the
> > > > > > > > > concept
> > > > > > > > > maps
> > > > > > > > > to Arm and that will also include seeing how the
> > > > > > > > > drm_clflush_sg for
> > > > > > > > > Arm
> > > > > > > > > would look. Is there a range based solution, or just
> > > > > > > > > a
> > > > > > > > > big
> > > > > > > > > hammer
> > > > > > > > > there.
> > > > > > > > > If the latter, then it is no good to churn all these
> > > > > > > > > reverts
> > > > > > > > > but
> > > > > > > > > instead
> > > > > > > > > an arch agnostic wrapper, with a generic name, would
> > > > > > > > > be
> > > > > > > > > the
> > > > > > > > > way to
> > > > > > > > > go.
> > > > > > > > 
> > > > > > > > But my impression was that ARM would not need the
> > > > > > > > range-
> > > > > > > > based
> > > > > > > > interface
> > > > > > > > either, because ARM is only for discrete and with
> > > > > > > > discrete
> > > > > > > > we're
> > > > > > > > always
> > > > > > > > coherent.
> > > > > > > 
> > > > > > > Not sure what you mean here - what about flushing system
> > > > > > > memory
> > > > > > > objects
> > > > > > > on discrete? Those still need flushing on paths like
> > > > > > > suspend
> > > > > > > which this
> > > > > > > series touches. Am I missing something?
> > > > > > 
> > > > > > System bos on discrete should always have
> > > > > > 
> > > > > > I915_BO_CACHE_COHERENT_FOR_READ |
> > > > > > I915_BO_CACHE_COHERENT_FOR_WRITE
> > > > > > 
> > > > > > either by the gpu being fully cache coherent (or us mapping
> > > > > > system
> > > > > > write-combined). Hence no need for cache clflushes or
> > > > > > wbinvd()
> > > > > > for
> > > > > > incoherent IO.
> > > > > 
> > > > > Hmm so you are talking about the shmem ttm backend. It ends
> > > > > up
> > > > > depending on the result of i915_ttm_cache_level, yes? It
> > > > > cannot
> > > > > end
> > > > > up with I915_CACHE_NONE from that function?
> > > > 
> > > > If the object is allocated with allowable placement in either
> > > > LMEM
> > > > or
> > > > SYSTEM, and it ends in system, it gets allocated with
> > > > I915_CACHE_NONE,
> > > > but then the shmem ttm backend isn't used but TTM's wc pools,
> > > > and
> > > > the
> > > > object should *always* be mapped wc. Even in system.
> > > 
> > > I am not familiar with neither TTM backend or wc pools so maybe a
> > > missed
> > > question - if obj->cache_level can be set to none, and
> > > obj->cache_coherency to zero, then during object lifetime helpers
> > > which
> > > consult those fields (like i915_gem_cpu_write_needs_clflush,
> > > __start_cpu_write, etc) are giving out incorrect answers? That
> > > is, it
> > > is
> > > irrelevant that they would say flushes are required, since in
> > > actuality
> > > those objects can never ever and from anywhere be mapped other
> > > than
> > > WC
> > > so flushes aren't actually required?
> > 
> > If we map other than WC somewhere in these situations, that should
> > be a
> > bug needing a fix. It might be that some of these helpers that you
> > mention might still flag that a clflush is needed, and in that case
> > that's an oversight that also needs fixing.
> > 
> > > 
> > > > > I also found in i915_drm.h:
> > > > > 
> > > > >            * As caching mode when specifying
> > > > > `I915_MMAP_OFFSET_FIXED`,
> > > > > WC or WB will
> > > > >            * be used, depending on the object placement on
> > > > > creation. WB
> > > > > will be used
> > > > >            * when the object can only exist in system memory,
> > > > > WC
> > > > > otherwise.
> > > > > 
> > > > > If what you say is true, that on discrete it is _always_ WC,
> > > > > then
> > > > > that needs updating as well.
> > > > 
> > > > If an object is allocated as system only, then it is mapped WB,
> > > > and
> > > > we're relying on the gpu being cache coherent to avoid
> > > > clflushes.
> > > > Same
> > > > is actually currently true if the object happens to be accessed
> > > > by
> > > > the
> > > > cpu while evicted. Might need an update for that.
> > > 
> > > Hmm okay, I think I actually misunderstood something here. I
> > > think
> > > the
> > > reason for difference bbtween smem+lmem object which happens to
> > > be in
> > > smem and smem only object is eluding me.
> > > 
> > > > > > 
> > > > > > That's adhering to Linus'
> > > > > > 
> > > > > > "And I sincerely hope to the gods that no cache-incoherent
> > > > > > i915
> > > > > > mess
> > > > > > ever makes it out of the x86 world. Incoherent IO was
> > > > > > always a
> > > > > > historical mistake and should never ever happen again, so
> > > > > > we
> > > > > > should
> > > > > > not spread that horrific pattern around."
> > > > > 
> > > > > Sure, but I was not talking about IO - just the CPU side
> > > > > access
> > > > > to
> > > > > CPU side objects.
> > > > 
> > > > OK, I was under the impression that clflushes() and wbinvd()s
> > > > in
> > > > i915
> > > > was only ever used to make data visible to non-snooping GPUs.
> > > > 
> > > > Do you mean that there are other uses as well? Agreed the wb
> > > > cache
> > > > flush on on suspend only if gpu is
> > > > !I915_BO_CACHE_COHERENT_FOR_READ?
> > > > looks to not fit this pattern completely.
> > > 
> > > Don't know, I was first trying to understand handling of the
> > > obj->cache_coherent as discussed in the first quote block. Are
> > > the
> > > flags
> > > consistently set and how the Arm low level code will look.
> > > 
> > > > Otherwise, for architectures where memory isn't always fully
> > > > coherent
> > > > with the cpu cache, I'd expect them to use the apis in
> > > > asm/cacheflush.h, like flush_cache_range() and similar, which
> > > > are
> > > > nops
> > > > on x86.
> > > 
> > > Hm do you know why there are no-ops? Like why wouldn't they map
> > > to
> > > clflush?
> > 
> > I think it mostly boils down to the PIPT caches on x86. Everything
> > is
> > assumed to be coherent. Whereas some architextures keep different
> > cache
> > entries for different virtual addresses even if the physical page
> > is
> > the same...
> > 
> > clflushes and wbinvds on x86 are for odd arch-specific situations
> > where, for example where we change caching attributes of the linear
> > kernel map mappings.
> 
> So in summary we have flush_cache_range which is generic, not
> implemented on x86 and works with virtual addresses so not directly
> usable even if x86 implementation was added.

I think for the intended flush_cache_range() semantics: "Make this
range visible to all vms on all cpus", I think the x86 implementation
is actually a nop, and correctly implemented.

> 
> There is also x86 specific clflush_cache_range which works with
> virtual addresses as well so no good for drm_clflush_sg.
> 
> Question you implicitly raise, correct me if I got it wrong, is
> whether we should even be trying to extend drm_clflush_sg for Arm,
> given how most (all?) call sites are not needed on discrete, is that
> right?

Yes exactly. No need to bother figuring this out for ARM, as we don't
do any incoherent IO.

> 
> Would that mean we could leave most of the code as is and just
> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
> which would then legitimately do nothing, at least on Arm if not also
> on discrete in general?

Yes, with the caveat that we should, at least as a second step, make
i915_flush_cpu_caches() range-based if possible from a performance
point of view.

> 
> If that would work it would make a small and easy to review series. I
> don't think it would collide with what Linus asked since it is not
> propagating undesirable things further - given how if there is no
> actual need to flush then there is no need to make it range based
> either.
> 
> Exception would be the dmabuf get pages patch which needs a proper
> implementation of a new drm flush helper.

I think the dmabuf get_pages (note that that's also only for integrated
I915_CACHE_NONE x86-only situations), can be done with

dma_buf_vmap(dma_buf, &virtual);
drm_clflush_virt_range(virtual, length);
dma_buf_vunmap(&virtual);

/Thomas


> 
> Regards,
> 
> Tvrtko



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 11:37                         ` [Intel-gfx] " Thomas Hellström
@ 2022-03-22 12:53                           ` Tvrtko Ursulin
  -1 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 12:53 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 22/03/2022 11:37, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
>>
>> On 22/03/2022 10:26, Thomas Hellström wrote:
>>> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>>>
>>>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>>>> To align with the discussion in [1][2], this
>>>>>>>>>>>>> patch
>>>>>>>>>>>>> series
>>>>>>>>>>>>> drops
>>>>>>>>>>>>> all
>>>>>>>>>>>>> usage of
>>>>>>>>>>>>> wbvind_on_all_cpus within i915 by either
>>>>>>>>>>>>> replacing
>>>>>>>>>>>>> the
>>>>>>>>>>>>> call
>>>>>>>>>>>>> with certain
>>>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>>>> logic.
>>>>>>>>>>>>
>>>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>>>> provide
>>>>>>>>>>>> non
>>>>>>>>>>>> x86
>>>>>>>>>>>> implementations under the wbinvd_on_all_cpus
>>>>>>>>>>>> name.
>>>>>>>>>>>> Instead an
>>>>>>>>>>>> arch
>>>>>>>>>>>> agnostic helper which achieves the same effect
>>>>>>>>>>>> could
>>>>>>>>>>>> be
>>>>>>>>>>>> created.
>>>>>>>>>>>> Does
>>>>>>>>>>>> Arm have such concept?
>>>>>>>>>>>
>>>>>>>>>>> I also understand Linus' email like we shouldn't
>>>>>>>>>>> leak
>>>>>>>>>>> incoherent
>>>>>>>>>>> IO
>>>>>>>>>>> to
>>>>>>>>>>> other architectures, meaning any remaining
>>>>>>>>>>> wbinvd()s
>>>>>>>>>>> should
>>>>>>>>>>> be
>>>>>>>>>>> X86
>>>>>>>>>>> only.
>>>>>>>>>>
>>>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>>>> instruction
>>>>>>>>>> name.
>>>>>>>>>
>>>>>>>>> Yeah, I meant the function implementing wbinvd()
>>>>>>>>> semantics.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But I think we can't pick a solution until we know
>>>>>>>>>> how
>>>>>>>>>> the
>>>>>>>>>> concept
>>>>>>>>>> maps
>>>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>>>> drm_clflush_sg for
>>>>>>>>>> Arm
>>>>>>>>>> would look. Is there a range based solution, or just
>>>>>>>>>> a
>>>>>>>>>> big
>>>>>>>>>> hammer
>>>>>>>>>> there.
>>>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>>>> reverts
>>>>>>>>>> but
>>>>>>>>>> instead
>>>>>>>>>> an arch agnostic wrapper, with a generic name, would
>>>>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>> way to
>>>>>>>>>> go.
>>>>>>>>>
>>>>>>>>> But my impression was that ARM would not need the
>>>>>>>>> range-
>>>>>>>>> based
>>>>>>>>> interface
>>>>>>>>> either, because ARM is only for discrete and with
>>>>>>>>> discrete
>>>>>>>>> we're
>>>>>>>>> always
>>>>>>>>> coherent.
>>>>>>>>
>>>>>>>> Not sure what you mean here - what about flushing system
>>>>>>>> memory
>>>>>>>> objects
>>>>>>>> on discrete? Those still need flushing on paths like
>>>>>>>> suspend
>>>>>>>> which this
>>>>>>>> series touches. Am I missing something?
>>>>>>>
>>>>>>> System bos on discrete should always have
>>>>>>>
>>>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>>>
>>>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>>>> system
>>>>>>> write-combined). Hence no need for cache clflushes or
>>>>>>> wbinvd()
>>>>>>> for
>>>>>>> incoherent IO.
>>>>>>
>>>>>> Hmm so you are talking about the shmem ttm backend. It ends
>>>>>> up
>>>>>> depending on the result of i915_ttm_cache_level, yes? It
>>>>>> cannot
>>>>>> end
>>>>>> up with I915_CACHE_NONE from that function?
>>>>>
>>>>> If the object is allocated with allowable placement in either
>>>>> LMEM
>>>>> or
>>>>> SYSTEM, and it ends in system, it gets allocated with
>>>>> I915_CACHE_NONE,
>>>>> but then the shmem ttm backend isn't used but TTM's wc pools,
>>>>> and
>>>>> the
>>>>> object should *always* be mapped wc. Even in system.
>>>>
>>>> I am not familiar with neither TTM backend or wc pools so maybe a
>>>> missed
>>>> question - if obj->cache_level can be set to none, and
>>>> obj->cache_coherency to zero, then during object lifetime helpers
>>>> which
>>>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>>>> __start_cpu_write, etc) are giving out incorrect answers? That
>>>> is, it
>>>> is
>>>> irrelevant that they would say flushes are required, since in
>>>> actuality
>>>> those objects can never ever and from anywhere be mapped other
>>>> than
>>>> WC
>>>> so flushes aren't actually required?
>>>
>>> If we map other than WC somewhere in these situations, that should
>>> be a
>>> bug needing a fix. It might be that some of these helpers that you
>>> mention might still flag that a clflush is needed, and in that case
>>> that's an oversight that also needs fixing.
>>>
>>>>
>>>>>> I also found in i915_drm.h:
>>>>>>
>>>>>>             * As caching mode when specifying
>>>>>> `I915_MMAP_OFFSET_FIXED`,
>>>>>> WC or WB will
>>>>>>             * be used, depending on the object placement on
>>>>>> creation. WB
>>>>>> will be used
>>>>>>             * when the object can only exist in system memory,
>>>>>> WC
>>>>>> otherwise.
>>>>>>
>>>>>> If what you say is true, that on discrete it is _always_ WC,
>>>>>> then
>>>>>> that needs updating as well.
>>>>>
>>>>> If an object is allocated as system only, then it is mapped WB,
>>>>> and
>>>>> we're relying on the gpu being cache coherent to avoid
>>>>> clflushes.
>>>>> Same
>>>>> is actually currently true if the object happens to be accessed
>>>>> by
>>>>> the
>>>>> cpu while evicted. Might need an update for that.
>>>>
>>>> Hmm okay, I think I actually misunderstood something here. I
>>>> think
>>>> the
>>>> reason for difference bbtween smem+lmem object which happens to
>>>> be in
>>>> smem and smem only object is eluding me.
>>>>
>>>>>>>
>>>>>>> That's adhering to Linus'
>>>>>>>
>>>>>>> "And I sincerely hope to the gods that no cache-incoherent
>>>>>>> i915
>>>>>>> mess
>>>>>>> ever makes it out of the x86 world. Incoherent IO was
>>>>>>> always a
>>>>>>> historical mistake and should never ever happen again, so
>>>>>>> we
>>>>>>> should
>>>>>>> not spread that horrific pattern around."
>>>>>>
>>>>>> Sure, but I was not talking about IO - just the CPU side
>>>>>> access
>>>>>> to
>>>>>> CPU side objects.
>>>>>
>>>>> OK, I was under the impression that clflushes() and wbinvd()s
>>>>> in
>>>>> i915
>>>>> was only ever used to make data visible to non-snooping GPUs.
>>>>>
>>>>> Do you mean that there are other uses as well? Agreed the wb
>>>>> cache
>>>>> flush on on suspend only if gpu is
>>>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>>>> looks to not fit this pattern completely.
>>>>
>>>> Don't know, I was first trying to understand handling of the
>>>> obj->cache_coherent as discussed in the first quote block. Are
>>>> the
>>>> flags
>>>> consistently set and how the Arm low level code will look.
>>>>
>>>>> Otherwise, for architectures where memory isn't always fully
>>>>> coherent
>>>>> with the cpu cache, I'd expect them to use the apis in
>>>>> asm/cacheflush.h, like flush_cache_range() and similar, which
>>>>> are
>>>>> nops
>>>>> on x86.
>>>>
>>>> Hm do you know why there are no-ops? Like why wouldn't they map
>>>> to
>>>> clflush?
>>>
>>> I think it mostly boils down to the PIPT caches on x86. Everything
>>> is
>>> assumed to be coherent. Whereas some architextures keep different
>>> cache
>>> entries for different virtual addresses even if the physical page
>>> is
>>> the same...
>>>
>>> clflushes and wbinvds on x86 are for odd arch-specific situations
>>> where, for example where we change caching attributes of the linear
>>> kernel map mappings.
>>
>> So in summary we have flush_cache_range which is generic, not
>> implemented on x86 and works with virtual addresses so not directly
>> usable even if x86 implementation was added.
> 
> I think for the intended flush_cache_range() semantics: "Make this
> range visible to all vms on all cpus", I think the x86 implementation
> is actually a nop, and correctly implemented.

If that is so then I agree. (I did not spend much time looking for 
desired semantics, just noticed there was no kerneldoc next to the 
function and stopped there.)

>> There is also x86 specific clflush_cache_range which works with
>> virtual addresses as well so no good for drm_clflush_sg.
>>
>> Question you implicitly raise, correct me if I got it wrong, is
>> whether we should even be trying to extend drm_clflush_sg for Arm,
>> given how most (all?) call sites are not needed on discrete, is that
>> right?
> 
> Yes exactly. No need to bother figuring this out for ARM, as we don't
> do any incoherent IO.
> 
>>
>> Would that mean we could leave most of the code as is and just
>> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
>> which would then legitimately do nothing, at least on Arm if not also
>> on discrete in general?
> 
> Yes, with the caveat that we should, at least as a second step, make
> i915_flush_cpu_caches() range-based if possible from a performance
> point of view.

Sounds like a plan, and I am counting on the second step part to be 
really second step. Because that one will need to actually figure out 
and elaborate sufficiently all three proposed reverts, which was missing 
in this posting. So first step unblocks Arm builds very cheaply and 
non-controversially, second step tries going the range route.

>> If that would work it would make a small and easy to review series. I
>> don't think it would collide with what Linus asked since it is not
>> propagating undesirable things further - given how if there is no
>> actual need to flush then there is no need to make it range based
>> either.
>>
>> Exception would be the dmabuf get pages patch which needs a proper
>> implementation of a new drm flush helper.
> 
> I think the dmabuf get_pages (note that that's also only for integrated
> I915_CACHE_NONE x86-only situations), can be done with
> 
> dma_buf_vmap(dma_buf, &virtual);
> drm_clflush_virt_range(virtual, length);
> dma_buf_vunmap(&virtual);

Looks plausible to me. Downside being it vmaps the whole object at once 
so may regress, at least on 32-bit (!) builds. Would it work in theory 
to fall back to page by page but would it be worth it just for 32-bit I 
am not sure.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 12:53                           ` Tvrtko Ursulin
  0 siblings, 0 replies; 60+ messages in thread
From: Tvrtko Ursulin @ 2022-03-22 12:53 UTC (permalink / raw)
  To: Thomas Hellström, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 22/03/2022 11:37, Thomas Hellström wrote:
> On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
>>
>> On 22/03/2022 10:26, Thomas Hellström wrote:
>>> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>>>
>>>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>>>
>>>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>>>> To align with the discussion in [1][2], this
>>>>>>>>>>>>> patch
>>>>>>>>>>>>> series
>>>>>>>>>>>>> drops
>>>>>>>>>>>>> all
>>>>>>>>>>>>> usage of
>>>>>>>>>>>>> wbvind_on_all_cpus within i915 by either
>>>>>>>>>>>>> replacing
>>>>>>>>>>>>> the
>>>>>>>>>>>>> call
>>>>>>>>>>>>> with certain
>>>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>>>> logic.
>>>>>>>>>>>>
>>>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>>>> provide
>>>>>>>>>>>> non
>>>>>>>>>>>> x86
>>>>>>>>>>>> implementations under the wbinvd_on_all_cpus
>>>>>>>>>>>> name.
>>>>>>>>>>>> Instead an
>>>>>>>>>>>> arch
>>>>>>>>>>>> agnostic helper which achieves the same effect
>>>>>>>>>>>> could
>>>>>>>>>>>> be
>>>>>>>>>>>> created.
>>>>>>>>>>>> Does
>>>>>>>>>>>> Arm have such concept?
>>>>>>>>>>>
>>>>>>>>>>> I also understand Linus' email like we shouldn't
>>>>>>>>>>> leak
>>>>>>>>>>> incoherent
>>>>>>>>>>> IO
>>>>>>>>>>> to
>>>>>>>>>>> other architectures, meaning any remaining
>>>>>>>>>>> wbinvd()s
>>>>>>>>>>> should
>>>>>>>>>>> be
>>>>>>>>>>> X86
>>>>>>>>>>> only.
>>>>>>>>>>
>>>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>>>> instruction
>>>>>>>>>> name.
>>>>>>>>>
>>>>>>>>> Yeah, I meant the function implementing wbinvd()
>>>>>>>>> semantics.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But I think we can't pick a solution until we know
>>>>>>>>>> how
>>>>>>>>>> the
>>>>>>>>>> concept
>>>>>>>>>> maps
>>>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>>>> drm_clflush_sg for
>>>>>>>>>> Arm
>>>>>>>>>> would look. Is there a range based solution, or just
>>>>>>>>>> a
>>>>>>>>>> big
>>>>>>>>>> hammer
>>>>>>>>>> there.
>>>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>>>> reverts
>>>>>>>>>> but
>>>>>>>>>> instead
>>>>>>>>>> an arch agnostic wrapper, with a generic name, would
>>>>>>>>>> be
>>>>>>>>>> the
>>>>>>>>>> way to
>>>>>>>>>> go.
>>>>>>>>>
>>>>>>>>> But my impression was that ARM would not need the
>>>>>>>>> range-
>>>>>>>>> based
>>>>>>>>> interface
>>>>>>>>> either, because ARM is only for discrete and with
>>>>>>>>> discrete
>>>>>>>>> we're
>>>>>>>>> always
>>>>>>>>> coherent.
>>>>>>>>
>>>>>>>> Not sure what you mean here - what about flushing system
>>>>>>>> memory
>>>>>>>> objects
>>>>>>>> on discrete? Those still need flushing on paths like
>>>>>>>> suspend
>>>>>>>> which this
>>>>>>>> series touches. Am I missing something?
>>>>>>>
>>>>>>> System bos on discrete should always have
>>>>>>>
>>>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>>>
>>>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>>>> system
>>>>>>> write-combined). Hence no need for cache clflushes or
>>>>>>> wbinvd()
>>>>>>> for
>>>>>>> incoherent IO.
>>>>>>
>>>>>> Hmm so you are talking about the shmem ttm backend. It ends
>>>>>> up
>>>>>> depending on the result of i915_ttm_cache_level, yes? It
>>>>>> cannot
>>>>>> end
>>>>>> up with I915_CACHE_NONE from that function?
>>>>>
>>>>> If the object is allocated with allowable placement in either
>>>>> LMEM
>>>>> or
>>>>> SYSTEM, and it ends in system, it gets allocated with
>>>>> I915_CACHE_NONE,
>>>>> but then the shmem ttm backend isn't used but TTM's wc pools,
>>>>> and
>>>>> the
>>>>> object should *always* be mapped wc. Even in system.
>>>>
>>>> I am not familiar with neither TTM backend or wc pools so maybe a
>>>> missed
>>>> question - if obj->cache_level can be set to none, and
>>>> obj->cache_coherency to zero, then during object lifetime helpers
>>>> which
>>>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>>>> __start_cpu_write, etc) are giving out incorrect answers? That
>>>> is, it
>>>> is
>>>> irrelevant that they would say flushes are required, since in
>>>> actuality
>>>> those objects can never ever and from anywhere be mapped other
>>>> than
>>>> WC
>>>> so flushes aren't actually required?
>>>
>>> If we map other than WC somewhere in these situations, that should
>>> be a
>>> bug needing a fix. It might be that some of these helpers that you
>>> mention might still flag that a clflush is needed, and in that case
>>> that's an oversight that also needs fixing.
>>>
>>>>
>>>>>> I also found in i915_drm.h:
>>>>>>
>>>>>>             * As caching mode when specifying
>>>>>> `I915_MMAP_OFFSET_FIXED`,
>>>>>> WC or WB will
>>>>>>             * be used, depending on the object placement on
>>>>>> creation. WB
>>>>>> will be used
>>>>>>             * when the object can only exist in system memory,
>>>>>> WC
>>>>>> otherwise.
>>>>>>
>>>>>> If what you say is true, that on discrete it is _always_ WC,
>>>>>> then
>>>>>> that needs updating as well.
>>>>>
>>>>> If an object is allocated as system only, then it is mapped WB,
>>>>> and
>>>>> we're relying on the gpu being cache coherent to avoid
>>>>> clflushes.
>>>>> Same
>>>>> is actually currently true if the object happens to be accessed
>>>>> by
>>>>> the
>>>>> cpu while evicted. Might need an update for that.
>>>>
>>>> Hmm okay, I think I actually misunderstood something here. I
>>>> think
>>>> the
>>>> reason for difference bbtween smem+lmem object which happens to
>>>> be in
>>>> smem and smem only object is eluding me.
>>>>
>>>>>>>
>>>>>>> That's adhering to Linus'
>>>>>>>
>>>>>>> "And I sincerely hope to the gods that no cache-incoherent
>>>>>>> i915
>>>>>>> mess
>>>>>>> ever makes it out of the x86 world. Incoherent IO was
>>>>>>> always a
>>>>>>> historical mistake and should never ever happen again, so
>>>>>>> we
>>>>>>> should
>>>>>>> not spread that horrific pattern around."
>>>>>>
>>>>>> Sure, but I was not talking about IO - just the CPU side
>>>>>> access
>>>>>> to
>>>>>> CPU side objects.
>>>>>
>>>>> OK, I was under the impression that clflushes() and wbinvd()s
>>>>> in
>>>>> i915
>>>>> was only ever used to make data visible to non-snooping GPUs.
>>>>>
>>>>> Do you mean that there are other uses as well? Agreed the wb
>>>>> cache
>>>>> flush on on suspend only if gpu is
>>>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>>>> looks to not fit this pattern completely.
>>>>
>>>> Don't know, I was first trying to understand handling of the
>>>> obj->cache_coherent as discussed in the first quote block. Are
>>>> the
>>>> flags
>>>> consistently set and how the Arm low level code will look.
>>>>
>>>>> Otherwise, for architectures where memory isn't always fully
>>>>> coherent
>>>>> with the cpu cache, I'd expect them to use the apis in
>>>>> asm/cacheflush.h, like flush_cache_range() and similar, which
>>>>> are
>>>>> nops
>>>>> on x86.
>>>>
>>>> Hm do you know why there are no-ops? Like why wouldn't they map
>>>> to
>>>> clflush?
>>>
>>> I think it mostly boils down to the PIPT caches on x86. Everything
>>> is
>>> assumed to be coherent. Whereas some architextures keep different
>>> cache
>>> entries for different virtual addresses even if the physical page
>>> is
>>> the same...
>>>
>>> clflushes and wbinvds on x86 are for odd arch-specific situations
>>> where, for example where we change caching attributes of the linear
>>> kernel map mappings.
>>
>> So in summary we have flush_cache_range which is generic, not
>> implemented on x86 and works with virtual addresses so not directly
>> usable even if x86 implementation was added.
> 
> I think for the intended flush_cache_range() semantics: "Make this
> range visible to all vms on all cpus", I think the x86 implementation
> is actually a nop, and correctly implemented.

If that is so then I agree. (I did not spend much time looking for 
desired semantics, just noticed there was no kerneldoc next to the 
function and stopped there.)

>> There is also x86 specific clflush_cache_range which works with
>> virtual addresses as well so no good for drm_clflush_sg.
>>
>> Question you implicitly raise, correct me if I got it wrong, is
>> whether we should even be trying to extend drm_clflush_sg for Arm,
>> given how most (all?) call sites are not needed on discrete, is that
>> right?
> 
> Yes exactly. No need to bother figuring this out for ARM, as we don't
> do any incoherent IO.
> 
>>
>> Would that mean we could leave most of the code as is and just
>> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
>> which would then legitimately do nothing, at least on Arm if not also
>> on discrete in general?
> 
> Yes, with the caveat that we should, at least as a second step, make
> i915_flush_cpu_caches() range-based if possible from a performance
> point of view.

Sounds like a plan, and I am counting on the second step part to be 
really second step. Because that one will need to actually figure out 
and elaborate sufficiently all three proposed reverts, which was missing 
in this posting. So first step unblocks Arm builds very cheaply and 
non-controversially, second step tries going the range route.

>> If that would work it would make a small and easy to review series. I
>> don't think it would collide with what Linus asked since it is not
>> propagating undesirable things further - given how if there is no
>> actual need to flush then there is no need to make it range based
>> either.
>>
>> Exception would be the dmabuf get pages patch which needs a proper
>> implementation of a new drm flush helper.
> 
> I think the dmabuf get_pages (note that that's also only for integrated
> I915_CACHE_NONE x86-only situations), can be done with
> 
> dma_buf_vmap(dma_buf, &virtual);
> drm_clflush_virt_range(virtual, length);
> dma_buf_vunmap(&virtual);

Looks plausible to me. Downside being it vmaps the whole object at once 
so may regress, at least on 32-bit (!) builds. Would it work in theory 
to fall back to page by page but would it be worth it just for 32-bit I 
am not sure.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
  2022-03-21 17:42           ` [Intel-gfx] " Michael Cheng
@ 2022-03-22 14:35             ` Daniel Vetter
  -1 siblings, 0 replies; 60+ messages in thread
From: Daniel Vetter @ 2022-03-22 14:35 UTC (permalink / raw)
  To: Michael Cheng
  Cc: Tvrtko Ursulin, thomas.hellstrom, daniel.vetter, intel-gfx,
	lucas.demarchi, dri-devel, chris, Daniel Vetter

On Mon, Mar 21, 2022 at 10:42:03AM -0700, Michael Cheng wrote:
> 
> On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 16:31, Michael Cheng wrote:
> > > On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> > > 
> > > > 
> > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > Previous concern with using drm_clflush_sg was that we don't
> > > > > know what the
> > > > > sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> > > > > everything at once to avoid paranoia.
> > > > 
> > > > And now we know, or we know it is not a concern?
> > > > 
> > > > > To make i915 more architecture-neutral and be less paranoid,
> > > > > lets attempt to
> > > > 
> > > > "Lets attempt" as we don't know if this will work and/or what
> > > > can/will break?
> > > 
> > > Yes, but it seems like there's no regression with IGT .
> > > 
> > > If there's a big hit in performance, or if this solution gets
> > > accepted and the bug reports come flying in, we can explore other
> > > solutions. But speaking to Dan Vetter, ideal solution would be to
> > > avoid any calls directly to wbinvd, and use drm helpers in place.
> > > 
> > > +Daniel for any extra input.
> > > 
> > > > > use drm_clflush_sg to flush the pages for when the GPU wants to read
> > > > > from main memory.
> > > > > 
> > > > > Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> > > > > ---
> > > > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
> > > > >   1 file changed, 2 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > index f5062d0c6333..b0a5baaebc43 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > @@ -8,6 +8,7 @@
> > > > >   #include <linux/highmem.h>
> > > > >   #include <linux/dma-resv.h>
> > > > >   #include <linux/module.h>
> > > > > +#include <drm/drm_cache.h>
> > > > >     #include <asm/smp.h>
> > > > >   @@ -250,16 +251,10 @@ static int
> > > > > i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object
> > > > > *obj)
> > > > >        * DG1 is special here since it still snoops
> > > > > transactions even with
> > > > >        * CACHE_NONE. This is not the case with other
> > > > > HAS_SNOOP platforms. We
> > > > >        * might need to revisit this as we add new discrete platforms.
> > > > > -     *
> > > > > -     * XXX: Consider doing a vmap flush or something, where possible.
> > > > > -     * Currently we just do a heavy handed
> > > > > wbinvd_on_all_cpus() here since
> > > > > -     * the underlying sg_table might not even point to
> > > > > struct pages, so we
> > > > > -     * can't just call drm_clflush_sg or similar, like we
> > > > > do elsewhere in
> > > > > -     * the driver.
> > > > >        */
> > > > >       if (i915_gem_object_can_bypass_llc(obj) ||
> > > > >           (!HAS_LLC(i915) && !IS_DG1(i915)))
> > > > > -        wbinvd_on_all_cpus();
> > > > > +        drm_clflush_sg(pages);
> > > > 
> > > > And as noticed before, drm_clfush_sg still can call
> > > > wbinvd_on_all_cpus so are you just punting the issue somewhere
> > > > else? How will it be solved there?
> > > > 
> > > Instead of calling an x86 asm directly, we are using what's
> > > available to use to make the driver more architecture neutral.
> > > Agreeing with Thomas, this solution falls within the "prefer
> > > range-aware clflush apis", and since some other generation platform
> > > doesn't support clflushopt, it will fall back to using wbinvd.
> > 
> > Right, I was trying to get the information on what will drm_clflush_sg
> > do on Arm. Is it range based or global there, or if the latter exists.
> > 
> I am not too sure about the ARM side. We are currently working that out with
> the ARM folks in a different thread.

It won't do anything useful on arm. The _only_ way to get special memory
on arm is by specifying what you want at allocation time. Anything else is
busted, more or less. Which is why none of these code paths should run on
anything else than x86.

And even on x86 they're at best questionable, but some of these are
mistakes encoded into uapi and we're stuck.

We should still try to use drm_clflush_sg() imo to make the entire ordeal
less horrible, and if that turns out to be problematic, we need to bite
the bullet and fix the uapi architecture instead of trying to
retroshoehorn performance fixes into uapi that just can't do it properly.

In this case here this would mean fixing allocation flags with
GEM_CREATE_EXT and fixing userspace to use that when needed (it should
know already since pretty much all drivers have this issue in some form or
another).

Cheers, Daniel


> > Regards,
> > 
> > Tvrtko

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 1/4] i915/gem: drop wbinvd_on_all_cpus usage
@ 2022-03-22 14:35             ` Daniel Vetter
  0 siblings, 0 replies; 60+ messages in thread
From: Daniel Vetter @ 2022-03-22 14:35 UTC (permalink / raw)
  To: Michael Cheng
  Cc: thomas.hellstrom, daniel.vetter, intel-gfx, lucas.demarchi,
	dri-devel, chris, Daniel Vetter

On Mon, Mar 21, 2022 at 10:42:03AM -0700, Michael Cheng wrote:
> 
> On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 16:31, Michael Cheng wrote:
> > > On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> > > 
> > > > 
> > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > Previous concern with using drm_clflush_sg was that we don't
> > > > > know what the
> > > > > sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> > > > > everything at once to avoid paranoia.
> > > > 
> > > > And now we know, or we know it is not a concern?
> > > > 
> > > > > To make i915 more architecture-neutral and be less paranoid,
> > > > > lets attempt to
> > > > 
> > > > "Lets attempt" as we don't know if this will work and/or what
> > > > can/will break?
> > > 
> > > Yes, but it seems like there's no regression with IGT .
> > > 
> > > If there's a big hit in performance, or if this solution gets
> > > accepted and the bug reports come flying in, we can explore other
> > > solutions. But speaking to Dan Vetter, ideal solution would be to
> > > avoid any calls directly to wbinvd, and use drm helpers in place.
> > > 
> > > +Daniel for any extra input.
> > > 
> > > > > use drm_clflush_sg to flush the pages for when the GPU wants to read
> > > > > from main memory.
> > > > > 
> > > > > Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> > > > > ---
> > > > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
> > > > >   1 file changed, 2 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > index f5062d0c6333..b0a5baaebc43 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > @@ -8,6 +8,7 @@
> > > > >   #include <linux/highmem.h>
> > > > >   #include <linux/dma-resv.h>
> > > > >   #include <linux/module.h>
> > > > > +#include <drm/drm_cache.h>
> > > > >     #include <asm/smp.h>
> > > > >   @@ -250,16 +251,10 @@ static int
> > > > > i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object
> > > > > *obj)
> > > > >        * DG1 is special here since it still snoops
> > > > > transactions even with
> > > > >        * CACHE_NONE. This is not the case with other
> > > > > HAS_SNOOP platforms. We
> > > > >        * might need to revisit this as we add new discrete platforms.
> > > > > -     *
> > > > > -     * XXX: Consider doing a vmap flush or something, where possible.
> > > > > -     * Currently we just do a heavy handed
> > > > > wbinvd_on_all_cpus() here since
> > > > > -     * the underlying sg_table might not even point to
> > > > > struct pages, so we
> > > > > -     * can't just call drm_clflush_sg or similar, like we
> > > > > do elsewhere in
> > > > > -     * the driver.
> > > > >        */
> > > > >       if (i915_gem_object_can_bypass_llc(obj) ||
> > > > >           (!HAS_LLC(i915) && !IS_DG1(i915)))
> > > > > -        wbinvd_on_all_cpus();
> > > > > +        drm_clflush_sg(pages);
> > > > 
> > > > And as noticed before, drm_clfush_sg still can call
> > > > wbinvd_on_all_cpus so are you just punting the issue somewhere
> > > > else? How will it be solved there?
> > > > 
> > > Instead of calling an x86 asm directly, we are using what's
> > > available to use to make the driver more architecture neutral.
> > > Agreeing with Thomas, this solution falls within the "prefer
> > > range-aware clflush apis", and since some other generation platform
> > > doesn't support clflushopt, it will fall back to using wbinvd.
> > 
> > Right, I was trying to get the information on what will drm_clflush_sg
> > do on Arm. Is it range based or global there, or if the latter exists.
> > 
> I am not too sure about the ARM side. We are currently working that out with
> the ARM folks in a different thread.

It won't do anything useful on arm. The _only_ way to get special memory
on arm is by specifying what you want at allocation time. Anything else is
busted, more or less. Which is why none of these code paths should run on
anything else than x86.

And even on x86 they're at best questionable, but some of these are
mistakes encoded into uapi and we're stuck.

We should still try to use drm_clflush_sg() imo to make the entire ordeal
less horrible, and if that turns out to be problematic, we need to bite
the bullet and fix the uapi architecture instead of trying to
retroshoehorn performance fixes into uapi that just can't do it properly.

In this case here this would mean fixing allocation flags with
GEM_CREATE_EXT and fixing userspace to use that when needed (it should
know already since pretty much all drivers have this issue in some form or
another).

Cheers, Daniel


> > Regards,
> > 
> > Tvrtko

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH 0/4] Drop wbinvd_on_all_cpus usage
  2022-03-22 12:53                           ` [Intel-gfx] " Tvrtko Ursulin
@ 2022-03-22 15:07                             ` Thomas Hellström
  -1 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 15:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: wayne.boyer, daniel.vetter, casey.g.bowman, lucas.demarchi,
	dri-devel, chris, Matthew Auld


On 3/22/22 13:53, Tvrtko Ursulin wrote:
>
> On 22/03/2022 11:37, Thomas Hellström wrote:
>> On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
>>>
>>> On 22/03/2022 10:26, Thomas Hellström wrote:
>>>> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>>>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>>>>> To align with the discussion in [1][2], this
>>>>>>>>>>>>>> patch
>>>>>>>>>>>>>> series
>>>>>>>>>>>>>> drops
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> usage of
>>>>>>>>>>>>>> wbvind_on_all_cpus within i915 by either
>>>>>>>>>>>>>> replacing
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> call
>>>>>>>>>>>>>> with certain
>>>>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>>>>> logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>>>>> provide
>>>>>>>>>>>>> non
>>>>>>>>>>>>> x86
>>>>>>>>>>>>> implementations under the wbinvd_on_all_cpus
>>>>>>>>>>>>> name.
>>>>>>>>>>>>> Instead an
>>>>>>>>>>>>> arch
>>>>>>>>>>>>> agnostic helper which achieves the same effect
>>>>>>>>>>>>> could
>>>>>>>>>>>>> be
>>>>>>>>>>>>> created.
>>>>>>>>>>>>> Does
>>>>>>>>>>>>> Arm have such concept?
>>>>>>>>>>>>
>>>>>>>>>>>> I also understand Linus' email like we shouldn't
>>>>>>>>>>>> leak
>>>>>>>>>>>> incoherent
>>>>>>>>>>>> IO
>>>>>>>>>>>> to
>>>>>>>>>>>> other architectures, meaning any remaining
>>>>>>>>>>>> wbinvd()s
>>>>>>>>>>>> should
>>>>>>>>>>>> be
>>>>>>>>>>>> X86
>>>>>>>>>>>> only.
>>>>>>>>>>>
>>>>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>>>>> instruction
>>>>>>>>>>> name.
>>>>>>>>>>
>>>>>>>>>> Yeah, I meant the function implementing wbinvd()
>>>>>>>>>> semantics.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But I think we can't pick a solution until we know
>>>>>>>>>>> how
>>>>>>>>>>> the
>>>>>>>>>>> concept
>>>>>>>>>>> maps
>>>>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>>>>> drm_clflush_sg for
>>>>>>>>>>> Arm
>>>>>>>>>>> would look. Is there a range based solution, or just
>>>>>>>>>>> a
>>>>>>>>>>> big
>>>>>>>>>>> hammer
>>>>>>>>>>> there.
>>>>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>>>>> reverts
>>>>>>>>>>> but
>>>>>>>>>>> instead
>>>>>>>>>>> an arch agnostic wrapper, with a generic name, would
>>>>>>>>>>> be
>>>>>>>>>>> the
>>>>>>>>>>> way to
>>>>>>>>>>> go.
>>>>>>>>>>
>>>>>>>>>> But my impression was that ARM would not need the
>>>>>>>>>> range-
>>>>>>>>>> based
>>>>>>>>>> interface
>>>>>>>>>> either, because ARM is only for discrete and with
>>>>>>>>>> discrete
>>>>>>>>>> we're
>>>>>>>>>> always
>>>>>>>>>> coherent.
>>>>>>>>>
>>>>>>>>> Not sure what you mean here - what about flushing system
>>>>>>>>> memory
>>>>>>>>> objects
>>>>>>>>> on discrete? Those still need flushing on paths like
>>>>>>>>> suspend
>>>>>>>>> which this
>>>>>>>>> series touches. Am I missing something?
>>>>>>>>
>>>>>>>> System bos on discrete should always have
>>>>>>>>
>>>>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>>>>
>>>>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>>>>> system
>>>>>>>> write-combined). Hence no need for cache clflushes or
>>>>>>>> wbinvd()
>>>>>>>> for
>>>>>>>> incoherent IO.
>>>>>>>
>>>>>>> Hmm so you are talking about the shmem ttm backend. It ends
>>>>>>> up
>>>>>>> depending on the result of i915_ttm_cache_level, yes? It
>>>>>>> cannot
>>>>>>> end
>>>>>>> up with I915_CACHE_NONE from that function?
>>>>>>
>>>>>> If the object is allocated with allowable placement in either
>>>>>> LMEM
>>>>>> or
>>>>>> SYSTEM, and it ends in system, it gets allocated with
>>>>>> I915_CACHE_NONE,
>>>>>> but then the shmem ttm backend isn't used but TTM's wc pools,
>>>>>> and
>>>>>> the
>>>>>> object should *always* be mapped wc. Even in system.
>>>>>
>>>>> I am not familiar with neither TTM backend or wc pools so maybe a
>>>>> missed
>>>>> question - if obj->cache_level can be set to none, and
>>>>> obj->cache_coherency to zero, then during object lifetime helpers
>>>>> which
>>>>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>>>>> __start_cpu_write, etc) are giving out incorrect answers? That
>>>>> is, it
>>>>> is
>>>>> irrelevant that they would say flushes are required, since in
>>>>> actuality
>>>>> those objects can never ever and from anywhere be mapped other
>>>>> than
>>>>> WC
>>>>> so flushes aren't actually required?
>>>>
>>>> If we map other than WC somewhere in these situations, that should
>>>> be a
>>>> bug needing a fix. It might be that some of these helpers that you
>>>> mention might still flag that a clflush is needed, and in that case
>>>> that's an oversight that also needs fixing.
>>>>
>>>>>
>>>>>>> I also found in i915_drm.h:
>>>>>>>
>>>>>>>             * As caching mode when specifying
>>>>>>> `I915_MMAP_OFFSET_FIXED`,
>>>>>>> WC or WB will
>>>>>>>             * be used, depending on the object placement on
>>>>>>> creation. WB
>>>>>>> will be used
>>>>>>>             * when the object can only exist in system memory,
>>>>>>> WC
>>>>>>> otherwise.
>>>>>>>
>>>>>>> If what you say is true, that on discrete it is _always_ WC,
>>>>>>> then
>>>>>>> that needs updating as well.
>>>>>>
>>>>>> If an object is allocated as system only, then it is mapped WB,
>>>>>> and
>>>>>> we're relying on the gpu being cache coherent to avoid
>>>>>> clflushes.
>>>>>> Same
>>>>>> is actually currently true if the object happens to be accessed
>>>>>> by
>>>>>> the
>>>>>> cpu while evicted. Might need an update for that.
>>>>>
>>>>> Hmm okay, I think I actually misunderstood something here. I
>>>>> think
>>>>> the
>>>>> reason for difference bbtween smem+lmem object which happens to
>>>>> be in
>>>>> smem and smem only object is eluding me.
>>>>>
>>>>>>>>
>>>>>>>> That's adhering to Linus'
>>>>>>>>
>>>>>>>> "And I sincerely hope to the gods that no cache-incoherent
>>>>>>>> i915
>>>>>>>> mess
>>>>>>>> ever makes it out of the x86 world. Incoherent IO was
>>>>>>>> always a
>>>>>>>> historical mistake and should never ever happen again, so
>>>>>>>> we
>>>>>>>> should
>>>>>>>> not spread that horrific pattern around."
>>>>>>>
>>>>>>> Sure, but I was not talking about IO - just the CPU side
>>>>>>> access
>>>>>>> to
>>>>>>> CPU side objects.
>>>>>>
>>>>>> OK, I was under the impression that clflushes() and wbinvd()s
>>>>>> in
>>>>>> i915
>>>>>> was only ever used to make data visible to non-snooping GPUs.
>>>>>>
>>>>>> Do you mean that there are other uses as well? Agreed the wb
>>>>>> cache
>>>>>> flush on on suspend only if gpu is
>>>>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>>>>> looks to not fit this pattern completely.
>>>>>
>>>>> Don't know, I was first trying to understand handling of the
>>>>> obj->cache_coherent as discussed in the first quote block. Are
>>>>> the
>>>>> flags
>>>>> consistently set and how the Arm low level code will look.
>>>>>
>>>>>> Otherwise, for architectures where memory isn't always fully
>>>>>> coherent
>>>>>> with the cpu cache, I'd expect them to use the apis in
>>>>>> asm/cacheflush.h, like flush_cache_range() and similar, which
>>>>>> are
>>>>>> nops
>>>>>> on x86.
>>>>>
>>>>> Hm do you know why there are no-ops? Like why wouldn't they map
>>>>> to
>>>>> clflush?
>>>>
>>>> I think it mostly boils down to the PIPT caches on x86. Everything
>>>> is
>>>> assumed to be coherent. Whereas some architextures keep different
>>>> cache
>>>> entries for different virtual addresses even if the physical page
>>>> is
>>>> the same...
>>>>
>>>> clflushes and wbinvds on x86 are for odd arch-specific situations
>>>> where, for example where we change caching attributes of the linear
>>>> kernel map mappings.
>>>
>>> So in summary we have flush_cache_range which is generic, not
>>> implemented on x86 and works with virtual addresses so not directly
>>> usable even if x86 implementation was added.
>>
>> I think for the intended flush_cache_range() semantics: "Make this
>> range visible to all vms on all cpus", I think the x86 implementation
>> is actually a nop, and correctly implemented.
>
> If that is so then I agree. (I did not spend much time looking for 
> desired semantics, just noticed there was no kerneldoc next to the 
> function and stopped there.)
>
>>> There is also x86 specific clflush_cache_range which works with
>>> virtual addresses as well so no good for drm_clflush_sg.
>>>
>>> Question you implicitly raise, correct me if I got it wrong, is
>>> whether we should even be trying to extend drm_clflush_sg for Arm,
>>> given how most (all?) call sites are not needed on discrete, is that
>>> right?
>>
>> Yes exactly. No need to bother figuring this out for ARM, as we don't
>> do any incoherent IO.
>>
>>>
>>> Would that mean we could leave most of the code as is and just
>>> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
>>> which would then legitimately do nothing, at least on Arm if not also
>>> on discrete in general?
>>
>> Yes, with the caveat that we should, at least as a second step, make
>> i915_flush_cpu_caches() range-based if possible from a performance
>> point of view.
>
> Sounds like a plan, and I am counting on the second step part to be 
> really second step. Because that one will need to actually figure out 
> and elaborate sufficiently all three proposed reverts, which was 
> missing in this posting. So first step unblocks Arm builds very 
> cheaply and non-controversially, second step tries going the range route.
>
>>> If that would work it would make a small and easy to review series. I
>>> don't think it would collide with what Linus asked since it is not
>>> propagating undesirable things further - given how if there is no
>>> actual need to flush then there is no need to make it range based
>>> either.
>>>
>>> Exception would be the dmabuf get pages patch which needs a proper
>>> implementation of a new drm flush helper.
>>
>> I think the dmabuf get_pages (note that that's also only for integrated
>> I915_CACHE_NONE x86-only situations), can be done with
>>
>> dma_buf_vmap(dma_buf, &virtual);
>> drm_clflush_virt_range(virtual, length);
>> dma_buf_vunmap(&virtual);
>
> Looks plausible to me. Downside being it vmaps the whole object at 
> once so may regress, at least on 32-bit (!) builds. Would it work in 
> theory to fall back to page by page but would it be worth it just for 
> 32-bit I am not sure.

Back in the days IIRC there was a kmap() api also for dma-buf. But 
nobody used it, and yes, vmap is not ideal but a simple fallback to 
page-based (or even wbinvd on the rare occasion of vmap error) might be ok.

/Thomas


>
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Intel-gfx] [PATCH 0/4] Drop wbinvd_on_all_cpus usage
@ 2022-03-22 15:07                             ` Thomas Hellström
  0 siblings, 0 replies; 60+ messages in thread
From: Thomas Hellström @ 2022-03-22 15:07 UTC (permalink / raw)
  To: Tvrtko Ursulin, Michael Cheng, intel-gfx
  Cc: daniel.vetter, lucas.demarchi, dri-devel, chris, Matthew Auld


On 3/22/22 13:53, Tvrtko Ursulin wrote:
>
> On 22/03/2022 11:37, Thomas Hellström wrote:
>> On Tue, 2022-03-22 at 11:20 +0000, Tvrtko Ursulin wrote:
>>>
>>> On 22/03/2022 10:26, Thomas Hellström wrote:
>>>> On Tue, 2022-03-22 at 10:13 +0000, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 21/03/2022 15:15, Thomas Hellström wrote:
>>>>>> On Mon, 2022-03-21 at 14:43 +0000, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 21/03/2022 13:40, Thomas Hellström wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Mon, 2022-03-21 at 13:12 +0000, Tvrtko Ursulin wrote:
>>>>>>>>>
>>>>>>>>> On 21/03/2022 12:33, Thomas Hellström wrote:
>>>>>>>>>> On Mon, 2022-03-21 at 12:22 +0000, Tvrtko Ursulin
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 21/03/2022 11:03, Thomas Hellström wrote:
>>>>>>>>>>>> Hi, Tvrtko.
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/21/22 11:27, Tvrtko Ursulin wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>>>>>>>>>>>> To align with the discussion in [1][2], this
>>>>>>>>>>>>>> patch
>>>>>>>>>>>>>> series
>>>>>>>>>>>>>> drops
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>> usage of
>>>>>>>>>>>>>> wbvind_on_all_cpus within i915 by either
>>>>>>>>>>>>>> replacing
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> call
>>>>>>>>>>>>>> with certain
>>>>>>>>>>>>>> drm clflush helpers, or reverting to a previous
>>>>>>>>>>>>>> logic.
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAIU, complaint from [1] was that it is wrong to
>>>>>>>>>>>>> provide
>>>>>>>>>>>>> non
>>>>>>>>>>>>> x86
>>>>>>>>>>>>> implementations under the wbinvd_on_all_cpus
>>>>>>>>>>>>> name.
>>>>>>>>>>>>> Instead an
>>>>>>>>>>>>> arch
>>>>>>>>>>>>> agnostic helper which achieves the same effect
>>>>>>>>>>>>> could
>>>>>>>>>>>>> be
>>>>>>>>>>>>> created.
>>>>>>>>>>>>> Does
>>>>>>>>>>>>> Arm have such concept?
>>>>>>>>>>>>
>>>>>>>>>>>> I also understand Linus' email like we shouldn't
>>>>>>>>>>>> leak
>>>>>>>>>>>> incoherent
>>>>>>>>>>>> IO
>>>>>>>>>>>> to
>>>>>>>>>>>> other architectures, meaning any remaining
>>>>>>>>>>>> wbinvd()s
>>>>>>>>>>>> should
>>>>>>>>>>>> be
>>>>>>>>>>>> X86
>>>>>>>>>>>> only.
>>>>>>>>>>>
>>>>>>>>>>> The last part is completely obvious since it is a x86
>>>>>>>>>>> instruction
>>>>>>>>>>> name.
>>>>>>>>>>
>>>>>>>>>> Yeah, I meant the function implementing wbinvd()
>>>>>>>>>> semantics.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> But I think we can't pick a solution until we know
>>>>>>>>>>> how
>>>>>>>>>>> the
>>>>>>>>>>> concept
>>>>>>>>>>> maps
>>>>>>>>>>> to Arm and that will also include seeing how the
>>>>>>>>>>> drm_clflush_sg for
>>>>>>>>>>> Arm
>>>>>>>>>>> would look. Is there a range based solution, or just
>>>>>>>>>>> a
>>>>>>>>>>> big
>>>>>>>>>>> hammer
>>>>>>>>>>> there.
>>>>>>>>>>> If the latter, then it is no good to churn all these
>>>>>>>>>>> reverts
>>>>>>>>>>> but
>>>>>>>>>>> instead
>>>>>>>>>>> an arch agnostic wrapper, with a generic name, would
>>>>>>>>>>> be
>>>>>>>>>>> the
>>>>>>>>>>> way to
>>>>>>>>>>> go.
>>>>>>>>>>
>>>>>>>>>> But my impression was that ARM would not need the
>>>>>>>>>> range-
>>>>>>>>>> based
>>>>>>>>>> interface
>>>>>>>>>> either, because ARM is only for discrete and with
>>>>>>>>>> discrete
>>>>>>>>>> we're
>>>>>>>>>> always
>>>>>>>>>> coherent.
>>>>>>>>>
>>>>>>>>> Not sure what you mean here - what about flushing system
>>>>>>>>> memory
>>>>>>>>> objects
>>>>>>>>> on discrete? Those still need flushing on paths like
>>>>>>>>> suspend
>>>>>>>>> which this
>>>>>>>>> series touches. Am I missing something?
>>>>>>>>
>>>>>>>> System bos on discrete should always have
>>>>>>>>
>>>>>>>> I915_BO_CACHE_COHERENT_FOR_READ |
>>>>>>>> I915_BO_CACHE_COHERENT_FOR_WRITE
>>>>>>>>
>>>>>>>> either by the gpu being fully cache coherent (or us mapping
>>>>>>>> system
>>>>>>>> write-combined). Hence no need for cache clflushes or
>>>>>>>> wbinvd()
>>>>>>>> for
>>>>>>>> incoherent IO.
>>>>>>>
>>>>>>> Hmm so you are talking about the shmem ttm backend. It ends
>>>>>>> up
>>>>>>> depending on the result of i915_ttm_cache_level, yes? It
>>>>>>> cannot
>>>>>>> end
>>>>>>> up with I915_CACHE_NONE from that function?
>>>>>>
>>>>>> If the object is allocated with allowable placement in either
>>>>>> LMEM
>>>>>> or
>>>>>> SYSTEM, and it ends in system, it gets allocated with
>>>>>> I915_CACHE_NONE,
>>>>>> but then the shmem ttm backend isn't used but TTM's wc pools,
>>>>>> and
>>>>>> the
>>>>>> object should *always* be mapped wc. Even in system.
>>>>>
>>>>> I am not familiar with neither TTM backend or wc pools so maybe a
>>>>> missed
>>>>> question - if obj->cache_level can be set to none, and
>>>>> obj->cache_coherency to zero, then during object lifetime helpers
>>>>> which
>>>>> consult those fields (like i915_gem_cpu_write_needs_clflush,
>>>>> __start_cpu_write, etc) are giving out incorrect answers? That
>>>>> is, it
>>>>> is
>>>>> irrelevant that they would say flushes are required, since in
>>>>> actuality
>>>>> those objects can never ever and from anywhere be mapped other
>>>>> than
>>>>> WC
>>>>> so flushes aren't actually required?
>>>>
>>>> If we map other than WC somewhere in these situations, that should
>>>> be a
>>>> bug needing a fix. It might be that some of these helpers that you
>>>> mention might still flag that a clflush is needed, and in that case
>>>> that's an oversight that also needs fixing.
>>>>
>>>>>
>>>>>>> I also found in i915_drm.h:
>>>>>>>
>>>>>>>             * As caching mode when specifying
>>>>>>> `I915_MMAP_OFFSET_FIXED`,
>>>>>>> WC or WB will
>>>>>>>             * be used, depending on the object placement on
>>>>>>> creation. WB
>>>>>>> will be used
>>>>>>>             * when the object can only exist in system memory,
>>>>>>> WC
>>>>>>> otherwise.
>>>>>>>
>>>>>>> If what you say is true, that on discrete it is _always_ WC,
>>>>>>> then
>>>>>>> that needs updating as well.
>>>>>>
>>>>>> If an object is allocated as system only, then it is mapped WB,
>>>>>> and
>>>>>> we're relying on the gpu being cache coherent to avoid
>>>>>> clflushes.
>>>>>> Same
>>>>>> is actually currently true if the object happens to be accessed
>>>>>> by
>>>>>> the
>>>>>> cpu while evicted. Might need an update for that.
>>>>>
>>>>> Hmm okay, I think I actually misunderstood something here. I
>>>>> think
>>>>> the
>>>>> reason for difference bbtween smem+lmem object which happens to
>>>>> be in
>>>>> smem and smem only object is eluding me.
>>>>>
>>>>>>>>
>>>>>>>> That's adhering to Linus'
>>>>>>>>
>>>>>>>> "And I sincerely hope to the gods that no cache-incoherent
>>>>>>>> i915
>>>>>>>> mess
>>>>>>>> ever makes it out of the x86 world. Incoherent IO was
>>>>>>>> always a
>>>>>>>> historical mistake and should never ever happen again, so
>>>>>>>> we
>>>>>>>> should
>>>>>>>> not spread that horrific pattern around."
>>>>>>>
>>>>>>> Sure, but I was not talking about IO - just the CPU side
>>>>>>> access
>>>>>>> to
>>>>>>> CPU side objects.
>>>>>>
>>>>>> OK, I was under the impression that clflushes() and wbinvd()s
>>>>>> in
>>>>>> i915
>>>>>> was only ever used to make data visible to non-snooping GPUs.
>>>>>>
>>>>>> Do you mean that there are other uses as well? Agreed the wb
>>>>>> cache
>>>>>> flush on on suspend only if gpu is
>>>>>> !I915_BO_CACHE_COHERENT_FOR_READ?
>>>>>> looks to not fit this pattern completely.
>>>>>
>>>>> Don't know, I was first trying to understand handling of the
>>>>> obj->cache_coherent as discussed in the first quote block. Are
>>>>> the
>>>>> flags
>>>>> consistently set and how the Arm low level code will look.
>>>>>
>>>>>> Otherwise, for architectures where memory isn't always fully
>>>>>> coherent
>>>>>> with the cpu cache, I'd expect them to use the apis in
>>>>>> asm/cacheflush.h, like flush_cache_range() and similar, which
>>>>>> are
>>>>>> nops
>>>>>> on x86.
>>>>>
>>>>> Hm do you know why there are no-ops? Like why wouldn't they map
>>>>> to
>>>>> clflush?
>>>>
>>>> I think it mostly boils down to the PIPT caches on x86. Everything
>>>> is
>>>> assumed to be coherent. Whereas some architextures keep different
>>>> cache
>>>> entries for different virtual addresses even if the physical page
>>>> is
>>>> the same...
>>>>
>>>> clflushes and wbinvds on x86 are for odd arch-specific situations
>>>> where, for example where we change caching attributes of the linear
>>>> kernel map mappings.
>>>
>>> So in summary we have flush_cache_range which is generic, not
>>> implemented on x86 and works with virtual addresses so not directly
>>> usable even if x86 implementation was added.
>>
>> I think for the intended flush_cache_range() semantics: "Make this
>> range visible to all vms on all cpus", I think the x86 implementation
>> is actually a nop, and correctly implemented.
>
> If that is so then I agree. (I did not spend much time looking for 
> desired semantics, just noticed there was no kerneldoc next to the 
> function and stopped there.)
>
>>> There is also x86 specific clflush_cache_range which works with
>>> virtual addresses as well so no good for drm_clflush_sg.
>>>
>>> Question you implicitly raise, correct me if I got it wrong, is
>>> whether we should even be trying to extend drm_clflush_sg for Arm,
>>> given how most (all?) call sites are not needed on discrete, is that
>>> right?
>>
>> Yes exactly. No need to bother figuring this out for ARM, as we don't
>> do any incoherent IO.
>>
>>>
>>> Would that mean we could leave most of the code as is and just
>>> replace wbinvd_on_all_cpus with something like i915_flush_cpu_caches,
>>> which would then legitimately do nothing, at least on Arm if not also
>>> on discrete in general?
>>
>> Yes, with the caveat that we should, at least as a second step, make
>> i915_flush_cpu_caches() range-based if possible from a performance
>> point of view.
>
> Sounds like a plan, and I am counting on the second step part to be 
> really second step. Because that one will need to actually figure out 
> and elaborate sufficiently all three proposed reverts, which was 
> missing in this posting. So first step unblocks Arm builds very 
> cheaply and non-controversially, second step tries going the range route.
>
>>> If that would work it would make a small and easy to review series. I
>>> don't think it would collide with what Linus asked since it is not
>>> propagating undesirable things further - given how if there is no
>>> actual need to flush then there is no need to make it range based
>>> either.
>>>
>>> Exception would be the dmabuf get pages patch which needs a proper
>>> implementation of a new drm flush helper.
>>
>> I think the dmabuf get_pages (note that that's also only for integrated
>> I915_CACHE_NONE x86-only situations), can be done with
>>
>> dma_buf_vmap(dma_buf, &virtual);
>> drm_clflush_virt_range(virtual, length);
>> dma_buf_vunmap(&virtual);
>
> Looks plausible to me. Downside being it vmaps the whole object at 
> once so may regress, at least on 32-bit (!) builds. Would it work in 
> theory to fall back to page by page but would it be worth it just for 
> 32-bit I am not sure.

Back in the days IIRC there was a kmap() api also for dma-buf. But 
nobody used it, and yes, vmap is not ideal but a simple fallback to 
page-based (or even wbinvd on the rare occasion of vmap error) might be ok.

/Thomas


>
> Regards,
>
> Tvrtko

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2022-03-22 15:07 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-19 19:42 [PATCH 0/4] Drop wbinvd_on_all_cpus usage Michael Cheng
2022-03-19 19:42 ` [Intel-gfx] " Michael Cheng
2022-03-19 19:42 ` [PATCH 1/4] i915/gem: drop " Michael Cheng
2022-03-19 19:42   ` [Intel-gfx] " Michael Cheng
2022-03-21 10:30   ` Tvrtko Ursulin
2022-03-21 10:30     ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 11:07     ` Thomas Hellström
2022-03-21 11:07       ` [Intel-gfx] " Thomas Hellström
2022-03-21 18:51       ` Michael Cheng
2022-03-21 18:51         ` [Intel-gfx] " Michael Cheng
2022-03-21 16:31     ` Michael Cheng
2022-03-21 16:31       ` [Intel-gfx] " Michael Cheng
2022-03-21 17:28       ` Tvrtko Ursulin
2022-03-21 17:28         ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 17:42         ` Michael Cheng
2022-03-21 17:42           ` [Intel-gfx] " Michael Cheng
2022-03-22 14:35           ` Daniel Vetter
2022-03-22 14:35             ` Daniel Vetter
2022-03-21 17:51         ` Michael Cheng
2022-03-21 17:51           ` [Intel-gfx] " Michael Cheng
2022-03-19 19:42 ` [PATCH 2/4] Revert "drm/i915/gem: Almagamate clflushes on suspend" Michael Cheng
2022-03-19 19:42   ` [Intel-gfx] " Michael Cheng
2022-03-19 19:42 ` [PATCH 3/4] i915/gem: Revert i915_gem_freeze to previous logic Michael Cheng
2022-03-19 19:42   ` [Intel-gfx] " Michael Cheng
2022-03-19 19:42 ` [PATCH 4/4] drm/i915/gt: Revert ggtt_resume " Michael Cheng
2022-03-19 19:42   ` [Intel-gfx] " Michael Cheng
2022-03-19 20:15 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Drop wbinvd_on_all_cpus usage Patchwork
2022-03-19 20:16 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2022-03-19 20:45 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2022-03-19 22:04 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2022-03-21 10:27 ` [PATCH 0/4] " Tvrtko Ursulin
2022-03-21 10:27   ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 11:03   ` Thomas Hellström
2022-03-21 11:03     ` [Intel-gfx] " Thomas Hellström
2022-03-21 12:22     ` Tvrtko Ursulin
2022-03-21 12:22       ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 12:33       ` Thomas Hellström
2022-03-21 12:33         ` [Intel-gfx] " Thomas Hellström
2022-03-21 13:12         ` Tvrtko Ursulin
2022-03-21 13:12           ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 13:40           ` Thomas Hellström
2022-03-21 13:40             ` [Intel-gfx] " Thomas Hellström
2022-03-21 14:43             ` Tvrtko Ursulin
2022-03-21 14:43               ` [Intel-gfx] " Tvrtko Ursulin
2022-03-21 15:15               ` Thomas Hellström
2022-03-21 15:15                 ` [Intel-gfx] " Thomas Hellström
2022-03-22 10:13                 ` Tvrtko Ursulin
2022-03-22 10:13                   ` [Intel-gfx] " Tvrtko Ursulin
2022-03-22 10:26                   ` Thomas Hellström
2022-03-22 10:26                     ` [Intel-gfx] " Thomas Hellström
2022-03-22 10:41                     ` Thomas Hellström
2022-03-22 10:41                       ` [Intel-gfx] " Thomas Hellström
2022-03-22 11:20                     ` Tvrtko Ursulin
2022-03-22 11:20                       ` [Intel-gfx] " Tvrtko Ursulin
2022-03-22 11:37                       ` Thomas Hellström
2022-03-22 11:37                         ` [Intel-gfx] " Thomas Hellström
2022-03-22 12:53                         ` Tvrtko Ursulin
2022-03-22 12:53                           ` [Intel-gfx] " Tvrtko Ursulin
2022-03-22 15:07                           ` Thomas Hellström
2022-03-22 15:07                             ` [Intel-gfx] " Thomas Hellström

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.