intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] gtt cache coherency checker for i830 class hw
@ 2010-05-09 11:41 Daniel Vetter
  2010-05-09 11:41 ` [PATCH 1/4] agp/intel-gtt: steal the last gtt page Daniel Vetter
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Daniel Vetter @ 2010-05-09 11:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Hi all,

This is part of my try-to-fix-i855-cache-coherency patch. It's essentially
everything save the actual hacks to fix the gtt chipset flush (that simply not
yet ready). The most important part is the cache coherency checker. I think
merging this without any other fixes is worth it for a few reasons:

- easier i8xx bug triaging: "random" crashes without any other applicable
  known issues but with some WARN_ONs due to failed chipset flushes
  suddenly get a decent explanation.
- better assessment of how widespread this problem is: Given that my
  Thinkpad X40 happily spits out these traces when under use but is
  otherwise rather stable, we'll likely see a surge of bug reports. Perhaps
  even a top-ten spot on kerneloops.org :( ...
- and one purely selfish reason: The patch I'm carrying around gets smaller
  ;)

I've thought about whether to include a bz link in the dmesg output, but
couldn't really decide one way or another. I fear that bz entry will get
swamped under by totally useless me-too-and-all-intel-devs-suck reports.
But on the other hand it's not really nice to waste testers time (creating
nice bug reports) by not telling them that this is a know issue and where
progress is tracked.

Comments highly welcome. And when testing, don't get scared - at least on
i855 getting the first failed flush before gdm finished drawing the login
screen is kinda expected ;)

Patches rebased against my two outstanding agp/gtt split-up patches, but should
work on top of latest drm-intel-next, too.

Yours, Daniel

Daniel Vetter (4):
  agp/intel-gtt: steal the last gtt page
  drm/i915: add locking around chipset flush
  agp/intel-gtt: check cache-coherency on i830 class chipsets
  agp/intel-gtt: extract mch buffer flush in i830 chipset flush

 drivers/char/agp/intel-gtt.c    |  163 +++++++++++++++++++++++++++++++--------
 drivers/gpu/drm/i915/i915_dma.c |    4 +-
 drivers/gpu/drm/i915/i915_gem.c |    3 +
 include/drm/intel-gtt.h         |    8 ++
 4 files changed, 146 insertions(+), 32 deletions(-)
 create mode 100644 include/drm/intel-gtt.h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] agp/intel-gtt: steal the last gtt page
  2010-05-09 11:41 [PATCH 0/4] gtt cache coherency checker for i830 class hw Daniel Vetter
@ 2010-05-09 11:41 ` Daniel Vetter
  2010-05-09 11:41 ` [PATCH 2/4] drm/i915: add locking around chipset flush Daniel Vetter
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2010-05-09 11:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This page will be used to check cache coherency on i8xx chips.

Furthermore gem in drm/i915 doesn't use the last page in the gtt
already to prevent pagefaults due to the gpu prefetcher crossing
into unmapped memory. So this page is useless, anyway.

This introduces include/drm/intel-gtt.h. Atm it only contains this
single define. But I've already noticed quite some code duplication
between the intel-agp and the i915 drm module. The idea is that this
new header file can be used to share some code between these two
modules.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c    |   13 +++++++------
 drivers/gpu/drm/i915/i915_dma.c |    4 +++-
 include/drm/intel-gtt.h         |    6 ++++++
 3 files changed, 16 insertions(+), 7 deletions(-)
 create mode 100644 include/drm/intel-gtt.h

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index f7793ec..eec043b 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -24,6 +24,7 @@
 #include <asm/smp.h>
 #include "agp.h"
 #include "intel-agp.h"
+#include <drm/intel-gtt.h>
 
 /*
  * If we have Intel graphics, we're not going to have anything other than
@@ -37,9 +38,9 @@
 
 static const struct aper_size_info_fixed intel_i810_sizes[] =
 {
-	{64, 16384, 4},
+	{64, 16384 - I830_CC_DANCE_PAGES, 4},
 	/* The 32M mode still requires a 64k gatt */
-	{32, 8192, 4}
+	{32, 8192 - I830_CC_DANCE_PAGES, 4}
 };
 
 #define AGP_DCACHE_MEMORY	1
@@ -489,11 +490,11 @@ static unsigned long intel_i810_mask_memory(struct agp_bridge_data *bridge,
 
 static struct aper_size_info_fixed intel_i830_sizes[] =
 {
-	{128, 32768, 5},
+	{128, 32768 - I830_CC_DANCE_PAGES, 5},
 	/* The 64M mode still requires a 128k gatt */
-	{64, 16384, 5},
-	{256, 65536, 6},
-	{512, 131072, 7},
+	{64, 16384 - I830_CC_DANCE_PAGES, 5},
+	{256, 65536 - I830_CC_DANCE_PAGES, 6},
+	{512, 131072 - I830_CC_DANCE_PAGES, 7},
 };
 
 static void intel_i830_init_gtt_entries(void)
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 851a2f8..3e16938 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -39,6 +39,7 @@
 #include <linux/pnp.h>
 #include <linux/vga_switcheroo.h>
 #include <linux/slab.h>
+#include <drm/intel-gtt.h>
 
 /* Really want an OS-independent resettable timer.  Would like to have
  * this loop run for (eg) 3 sec, but have the timer reset every time
@@ -1448,7 +1449,8 @@ static int i915_load_modeset_init(struct drm_device *dev,
 	 * at the last page of the aperture.  One page should be enough to
 	 * keep any prefetching inside of the aperture.
 	 */
-	i915_gem_do_init(dev, prealloc_size, agp_size - 4096);
+	i915_gem_do_init(dev, prealloc_size,
+			 agp_size - I830_CC_DANCE_PAGES*4096);
 
 	mutex_lock(&dev->struct_mutex);
 	ret = i915_gem_init_ringbuffer(dev);
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
new file mode 100644
index 0000000..6cec6d2
--- /dev/null
+++ b/include/drm/intel-gtt.h
@@ -0,0 +1,6 @@
+/* Header file to share declarations between the intel-agp module and the i915
+ * drm module
+ */
+
+/* This denotes how many pages intel-gtt steals at the end of the gart. */
+#define I830_CC_DANCE_PAGES 1
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] drm/i915: add locking around chipset flush
  2010-05-09 11:41 [PATCH 0/4] gtt cache coherency checker for i830 class hw Daniel Vetter
  2010-05-09 11:41 ` [PATCH 1/4] agp/intel-gtt: steal the last gtt page Daniel Vetter
@ 2010-05-09 11:41 ` Daniel Vetter
  2010-05-09 11:41 ` [PATCH 3/4] agp/intel-gtt: check cache-coherency on i830 class chipsets Daniel Vetter
  2010-05-09 11:41 ` [PATCH 4/4] agp/intel-gtt: extract mch buffer flush in i830 chipset flush Daniel Vetter
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2010-05-09 11:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

My cache coherency checker for i8xx chipsets will make cache flushes
stateful. Therefore add some locking around the only caller that had
none. This is not a fast-path, anyway, so it won't hurt for the other
chipsets.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f04612f..04fe7f5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5234,7 +5234,10 @@ i915_gem_phys_pwrite(struct drm_device *dev, struct drm_gem_object *obj,
 	if (ret)
 		return -EFAULT;
 
+	mutex_lock(&dev->struct_mutex);
 	drm_agp_chipset_flush(dev);
+	mutex_unlock(&dev->struct_mutex);
+
 	return 0;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] agp/intel-gtt: check cache-coherency on i830 class chipsets
  2010-05-09 11:41 [PATCH 0/4] gtt cache coherency checker for i830 class hw Daniel Vetter
  2010-05-09 11:41 ` [PATCH 1/4] agp/intel-gtt: steal the last gtt page Daniel Vetter
  2010-05-09 11:41 ` [PATCH 2/4] drm/i915: add locking around chipset flush Daniel Vetter
@ 2010-05-09 11:41 ` Daniel Vetter
  2010-05-09 11:41 ` [PATCH 4/4] agp/intel-gtt: extract mch buffer flush in i830 chipset flush Daniel Vetter
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2010-05-09 11:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

To quote the intel documentation (on i815, but judging by the code,
the gtt works the same on all i8xx chipsets):

"Note that the 4KB page of physical main memory (that have been mapped
to the graphics aperture through the GTT) must be accesssed strictly
through the aperture. The GMCH does not guarentee data coherency if any
of these pages are accessed directly using their physical addresses.

...

There is _no_ hardware support for ensuring coherency between these
two access paths."

In other words: We're trying to accomplish something explicitly verboten.

Therefore be paranoid and check coherency on every chipset flush.
This reveals that the current code is totally non-working. On my i855GM
cache coherency for cpu to gtt transfers fails 1% of the time. The other
direction is even worse and fails about every second time under load.

This check uses the last page of the gtt, which has been reserved in the
previous patch.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c |  140 ++++++++++++++++++++++++++++++++++-------
 include/drm/intel-gtt.h      |    6 +-
 2 files changed, 120 insertions(+), 26 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index eec043b..bed0ed6 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -21,6 +21,7 @@
 #include <linux/kernel.h>
 #include <linux/pagemap.h>
 #include <linux/agp_backend.h>
+#include <linux/ratelimit.h>
 #include <asm/smp.h>
 #include "agp.h"
 #include "intel-agp.h"
@@ -68,11 +69,12 @@ static struct _intel_private {
 	 */
 	int gtt_entries;			/* i830+ */
 	int gtt_total_size;
-	union {
-		void __iomem *i9xx_flush_page;
-		void *i8xx_flush_page;
-	};
-	struct page *i8xx_page;
+	void __iomem *i9xx_flush_page;
+	void *i8xx_cpu_flush_page;
+	void *i8xx_cpu_check_page;
+	void __iomem *i8xx_gtt_cc_pages;
+	unsigned int i8xx_cache_flush_num;
+	struct page *i8xx_pages[I830_CC_DANCE_PAGES + 1];
 	struct resource ifp_resource;
 	int resource_valid;
 } intel_private;
@@ -736,27 +738,74 @@ static void intel_i830_init_gtt_entries(void)
 
 static void intel_i830_fini_flush(void)
 {
-	kunmap(intel_private.i8xx_page);
-	intel_private.i8xx_flush_page = NULL;
-	unmap_page_from_agp(intel_private.i8xx_page);
+	int i;
+
+	kunmap(intel_private.i8xx_pages[0]);
+	intel_private.i8xx_cpu_flush_page = NULL;
+	kunmap(intel_private.i8xx_pages[1]);
+	intel_private.i8xx_cpu_check_page = NULL;
+
+	if (intel_private.i8xx_gtt_cc_pages)
+		iounmap(intel_private.i8xx_gtt_cc_pages);
 
-	__free_page(intel_private.i8xx_page);
-	intel_private.i8xx_page = NULL;
+	for (i = 0; i < I830_CC_DANCE_PAGES + 1; i++) {
+		__free_page(intel_private.i8xx_pages[i]);
+		intel_private.i8xx_pages[i] = NULL;
+	}
 }
 
 static void intel_i830_setup_flush(void)
 {
+	int num_entries = A_SIZE_FIX(agp_bridge->current_size)->num_entries;
+	int i;
+
 	/* return if we've already set the flush mechanism up */
-	if (intel_private.i8xx_page)
-		return;
+	if (intel_private.i8xx_pages[0])
+		goto setup;
+
+	for (i = 0; i < I830_CC_DANCE_PAGES + 1; i++) {
+		intel_private.i8xx_pages[i]
+			= alloc_page(GFP_KERNEL | __GFP_ZERO | GFP_DMA32);
+		if (!intel_private.i8xx_pages[i]) {
+			intel_i830_fini_flush();
+			return;
+		}
+	}
 
-	intel_private.i8xx_page = alloc_page(GFP_KERNEL | __GFP_ZERO | GFP_DMA32);
-	if (!intel_private.i8xx_page)
+	intel_private.i8xx_cpu_check_page = kmap(intel_private.i8xx_pages[1]);
+	if (!intel_private.i8xx_cpu_check_page) {
+		WARN_ON(1);
+		intel_i830_fini_flush();
 		return;
-
-	intel_private.i8xx_flush_page = kmap(intel_private.i8xx_page);
-	if (!intel_private.i8xx_flush_page)
+	}
+	intel_private.i8xx_cpu_flush_page = kmap(intel_private.i8xx_pages[0]);
+	if (!intel_private.i8xx_cpu_flush_page) {
+		WARN_ON(1);
 		intel_i830_fini_flush();
+		return;
+	}
+
+	/* Map the flushing pages into the gtt as the last entries. The last
+	 * page can't be used by the gpu, anyway (prefetch might walk over the
+	 * end of the last page). */
+	intel_private.i8xx_gtt_cc_pages
+		= ioremap_wc(agp_bridge->gart_bus_addr
+				+ num_entries*4096,
+			     I830_CC_DANCE_PAGES*4096);
+
+	if (!intel_private.i8xx_gtt_cc_pages)
+		dev_info(&intel_private.pcidev->dev, "can't ioremap flush page - no chipset flushing");
+
+setup:
+	/* Don't map the first page, we only write via its physical address
+	 * into it. */
+	for (i = 0; i < I830_CC_DANCE_PAGES; i++) {
+		writel(agp_bridge->driver->mask_memory(agp_bridge,
+				page_to_phys(intel_private.i8xx_pages[i+1]), 0),
+		       intel_private.registers+I810_PTE_BASE+((num_entries+i)*4));
+	}
+
+	intel_private.i8xx_cache_flush_num = 0;
 }
 
 /* The chipset_flush interface needs to get data that has already been
@@ -769,16 +818,58 @@ static void intel_i830_setup_flush(void)
  * that buffer out, we just fill 1KB and clflush it out, on the assumption
  * that it'll push whatever was in there out.  It appears to work.
  */
-static void intel_i830_chipset_flush(struct agp_bridge_data *bridge)
-{
-	unsigned int *pg = intel_private.i8xx_flush_page;
 
-	memset(pg, 0, 1024);
+/* Complaining once a minute about cache incoherency is enough! */
+DEFINE_RATELIMIT_STATE(i8xx_chipset_flush_ratelimit_cpu, 60*HZ, 1);
+DEFINE_RATELIMIT_STATE(i8xx_chipset_flush_ratelimit_gtt, 60*HZ, 1);
 
-	if (cpu_has_clflush)
-		clflush_cache_range(pg, 1024);
-	else if (wbinvd_on_all_cpus() != 0)
+static void intel_i830_chipset_flush(struct agp_bridge_data *bridge)
+{
+	unsigned int offset1
+		= (intel_private.i8xx_cache_flush_num * sizeof(int)) % 4096;
+	unsigned int offset2
+		= (intel_private.i8xx_cache_flush_num * sizeof(int)
+				+ 2048) % 4096;
+	unsigned int *p_cpu_read = intel_private.i8xx_cpu_check_page + offset1;
+	unsigned int *p_cpu_write = intel_private.i8xx_cpu_check_page + offset2;
+	unsigned int gtt_read, cpu_read;
+
+	/* write check values */
+	*p_cpu_write = intel_private.i8xx_cache_flush_num;
+	mb();
+	if (cpu_has_clflush) {
+		clflush(p_cpu_write);
+		clflush(p_cpu_read);
+	} else
+		wbinvd_on_all_cpus();
+	writel(intel_private.i8xx_cache_flush_num,
+			intel_private.i8xx_gtt_cc_pages + offset1);
+	mb();
+
+	memset(intel_private.i8xx_cpu_flush_page, 0,
+	       I830_MCH_WRITE_BUFFER_SIZE);
+
+	if (cpu_has_clflush) {
+		clflush_cache_range(intel_private.i8xx_cpu_flush_page,
+				    I830_MCH_WRITE_BUFFER_SIZE);
+	} else if (wbinvd_on_all_cpus() != 0)
 		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+
+	/* read check values */
+	mb();
+	gtt_read = readl(intel_private.i8xx_gtt_cc_pages + offset2);
+	cpu_read = *p_cpu_read;
+
+	WARN(cpu_read != intel_private.i8xx_cache_flush_num
+			&& __ratelimit(&i8xx_chipset_flush_ratelimit_cpu),
+		"i8xx chipset flush failed, expected: %u, cpu_read: %u\n",
+		intel_private.i8xx_cache_flush_num, cpu_read);
+	WARN(gtt_read != intel_private.i8xx_cache_flush_num
+			&& __ratelimit(&i8xx_chipset_flush_ratelimit_gtt),
+		"i8xx chipset flush failed, expected: %u, gtt_read: %u\n",
+		intel_private.i8xx_cache_flush_num, gtt_read);
+
+	intel_private.i8xx_cache_flush_num++;
 }
 
 /* The intel i830 automatically initializes the agp aperture during POST.
@@ -883,6 +974,7 @@ static int intel_i830_configure(void)
 	global_cache_flush();
 
 	intel_i830_setup_flush();
+
 	return 0;
 }
 
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
index 6cec6d2..ab4da67 100644
--- a/include/drm/intel-gtt.h
+++ b/include/drm/intel-gtt.h
@@ -2,5 +2,7 @@
  * drm module
  */
 
-/* This denotes how many pages intel-gtt steals at the end of the gart. */
-#define I830_CC_DANCE_PAGES 1
+/* This denotes how many pages intel-gtt steals at the end of the gart:
+ * one page to check cache coherency on i830 */
+#define I830_CC_DANCE_PAGES		1
+#define I830_MCH_WRITE_BUFFER_SIZE	1024
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] agp/intel-gtt: extract mch buffer flush in i830 chipset flush
  2010-05-09 11:41 [PATCH 0/4] gtt cache coherency checker for i830 class hw Daniel Vetter
                   ` (2 preceding siblings ...)
  2010-05-09 11:41 ` [PATCH 3/4] agp/intel-gtt: check cache-coherency on i830 class chipsets Daniel Vetter
@ 2010-05-09 11:41 ` Daniel Vetter
  3 siblings, 0 replies; 5+ messages in thread
From: Daniel Vetter @ 2010-05-09 11:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Just a small clean up. The real fix will add tons of code here,
so it's nice to shrink the function a tad bit, first.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c |   24 ++++++++++++++++--------
 1 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index bed0ed6..0277314 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -808,6 +808,20 @@ setup:
 	intel_private.i8xx_cache_flush_num = 0;
 }
 
+static void intel_flush_mch_write_buffer(void)
+{
+	memset(intel_private.i8xx_cpu_flush_page, 0,
+	       I830_MCH_WRITE_BUFFER_SIZE);
+
+	mb();
+	if (cpu_has_clflush) {
+		clflush_cache_range(intel_private.i8xx_cpu_flush_page,
+				    I830_MCH_WRITE_BUFFER_SIZE);
+	} else if (wbinvd_on_all_cpus() != 0)
+		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+	mb();
+}
+
 /* The chipset_flush interface needs to get data that has already been
  * flushed out of the CPU all the way out to main memory, because the GPU
  * doesn't snoop those buffers.
@@ -846,14 +860,8 @@ static void intel_i830_chipset_flush(struct agp_bridge_data *bridge)
 			intel_private.i8xx_gtt_cc_pages + offset1);
 	mb();
 
-	memset(intel_private.i8xx_cpu_flush_page, 0,
-	       I830_MCH_WRITE_BUFFER_SIZE);
-
-	if (cpu_has_clflush) {
-		clflush_cache_range(intel_private.i8xx_cpu_flush_page,
-				    I830_MCH_WRITE_BUFFER_SIZE);
-	} else if (wbinvd_on_all_cpus() != 0)
-		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+	/* start chipset flush */
+	intel_flush_mch_write_buffer();
 
 	/* read check values */
 	mb();
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-05-09 11:40 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-09 11:41 [PATCH 0/4] gtt cache coherency checker for i830 class hw Daniel Vetter
2010-05-09 11:41 ` [PATCH 1/4] agp/intel-gtt: steal the last gtt page Daniel Vetter
2010-05-09 11:41 ` [PATCH 2/4] drm/i915: add locking around chipset flush Daniel Vetter
2010-05-09 11:41 ` [PATCH 3/4] agp/intel-gtt: check cache-coherency on i830 class chipsets Daniel Vetter
2010-05-09 11:41 ` [PATCH 4/4] agp/intel-gtt: extract mch buffer flush in i830 chipset flush Daniel Vetter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).