All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] aliasing ppgtt support v2
@ 2011-11-28 20:35 Daniel Vetter
  2011-11-28 20:35 ` [PATCH 01/11] agp/intel-gtt: export the scratch page dma address Daniel Vetter
                   ` (8 more replies)
  0 siblings, 9 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Hi all,

Changes since the last submission:
- fixed issues pointed out by Chris Wilson on irc.
- fixed an oops on pre-snb, shame on me for that one.
- added two new patches to only bind objects to the global gtt when required.
- added a new patch so that userspace can find out whether ppgtt is on. This is
  required to use MI_STORE/LOAD commands correctly from userspace batchbuffers.
  Luckily no currently released userspace code depends on this, it's just prep
  work.

Comments, test reports, reviews and flames highly welcome.

Yours, Daniel

Daniel Vetter (11):
  agp/intel-gtt: export the scratch page dma address
  agp/intel-gtt: export the gtt pagetable iomapping
  drm/i915: initialization/teardown for the aliasing ppgtt
  drm/i915: ppgtt binding/unbinding support
  drm/i915: ppgtt register definitions
  drm/i915: ppgtt debugfs info
  drm/i915: per-ring fault reg
  drm/i915: enable ppgtt
  drm/i915: split out dma mapping from global gtt bind/unbind functions
  drm/i915: bind objects to the global gtt only when needed
  drm/i915: add HAS_ALIASING_PPGTT parameter for userspace

 drivers/char/agp/intel-gtt.c               |   10 +-
 drivers/gpu/drm/i915/i915_debugfs.c        |   46 ++++-
 drivers/gpu/drm/i915/i915_dma.c            |   17 ++-
 drivers/gpu/drm/i915/i915_drv.c            |    2 +
 drivers/gpu/drm/i915/i915_drv.h            |   34 +++-
 drivers/gpu/drm/i915/i915_gem.c            |   68 ++++++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   19 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  316 +++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c            |   13 +-
 drivers/gpu/drm/i915/i915_reg.h            |   39 ++++
 include/drm/i915_drm.h                     |    1 +
 include/drm/intel-gtt.h                    |    4 +
 12 files changed, 521 insertions(+), 48 deletions(-)

-- 
1.7.7.3

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 01/11] agp/intel-gtt: export the scratch page dma address
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-29 23:52   ` Ben Widawsky
  2011-11-28 20:35 ` [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping Daniel Vetter
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

To implement a PPGTT for drm/i915 that fully aliases the GTT, we also
need to properly alias the scratch page.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c |    9 ++++-----
 include/drm/intel-gtt.h      |    2 ++
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index c92424c..0a305ac 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -76,7 +76,6 @@ static struct _intel_private {
 	struct resource ifp_resource;
 	int resource_valid;
 	struct page *scratch_page;
-	dma_addr_t scratch_page_dma;
 } intel_private;
 
 #define INTEL_GTT_GEN	intel_private.driver->gen
@@ -306,9 +305,9 @@ static int intel_gtt_setup_scratch_page(void)
 		if (pci_dma_mapping_error(intel_private.pcidev, dma_addr))
 			return -EINVAL;
 
-		intel_private.scratch_page_dma = dma_addr;
+		intel_private.base.scratch_page_dma = dma_addr;
 	} else
-		intel_private.scratch_page_dma = page_to_phys(page);
+		intel_private.base.scratch_page_dma = page_to_phys(page);
 
 	intel_private.scratch_page = page;
 
@@ -631,7 +630,7 @@ static unsigned int intel_gtt_mappable_entries(void)
 static void intel_gtt_teardown_scratch_page(void)
 {
 	set_pages_wb(intel_private.scratch_page, 1);
-	pci_unmap_page(intel_private.pcidev, intel_private.scratch_page_dma,
+	pci_unmap_page(intel_private.pcidev, intel_private.base.scratch_page_dma,
 		       PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
 	put_page(intel_private.scratch_page);
 	__free_page(intel_private.scratch_page);
@@ -975,7 +974,7 @@ void intel_gtt_clear_range(unsigned int first_entry, unsigned int num_entries)
 	unsigned int i;
 
 	for (i = first_entry; i < (first_entry + num_entries); i++) {
-		intel_private.driver->write_entry(intel_private.scratch_page_dma,
+		intel_private.driver->write_entry(intel_private.base.scratch_page_dma,
 						  i, 0);
 	}
 	readl(intel_private.gtt+i-1);
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
index b174620..6d4c77a 100644
--- a/include/drm/intel-gtt.h
+++ b/include/drm/intel-gtt.h
@@ -15,6 +15,8 @@ const struct intel_gtt {
 	unsigned int needs_dmar : 1;
 	/* Whether we idle the gpu before mapping/unmapping */
 	unsigned int do_idle_maps : 1;
+	/* Share the scratch page dma with ppgtts. */
+	dma_addr_t scratch_page_dma;
 } *intel_gtt_get(void);
 
 void intel_gtt_chipset_flush(void);
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
  2011-11-28 20:35 ` [PATCH 01/11] agp/intel-gtt: export the scratch page dma address Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-29 23:53   ` Ben Widawsky
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

We need this because ppgtt page directory entries need to be in the
global gtt pagetable.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/char/agp/intel-gtt.c |    1 +
 include/drm/intel-gtt.h      |    2 ++
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 0a305ac..5cf47ac 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -680,6 +680,7 @@ static int intel_gtt_init(void)
 		iounmap(intel_private.registers);
 		return -ENOMEM;
 	}
+	intel_private.base.gtt = intel_private.gtt;
 
 	global_cache_flush();   /* FIXME: ? */
 
diff --git a/include/drm/intel-gtt.h b/include/drm/intel-gtt.h
index 6d4c77a..0a0001b 100644
--- a/include/drm/intel-gtt.h
+++ b/include/drm/intel-gtt.h
@@ -17,6 +17,8 @@ const struct intel_gtt {
 	unsigned int do_idle_maps : 1;
 	/* Share the scratch page dma with ppgtts. */
 	dma_addr_t scratch_page_dma;
+	/* for ppgtt PDE access */
+	u32 __iomem *gtt;
 } *intel_gtt_get(void);
 
 void intel_gtt_chipset_flush(void);
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
  2011-11-28 20:35 ` [PATCH 01/11] agp/intel-gtt: export the scratch page dma address Daniel Vetter
  2011-11-28 20:35 ` [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-29 18:50   ` Ben Widawsky
                     ` (3 more replies)
  2011-11-28 20:35 ` [PATCH 04/11] drm/i915: ppgtt binding/unbinding support Daniel Vetter
                   ` (5 subsequent siblings)
  8 siblings, 4 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This just adds the setup and teardown code for the ppgtt PDE and the
last-level pagetables, which are fixed for the entire lifetime, at
least for the moment.

v2: Kill the stray debug printk noted by and improve the pte
definitions as suggested by Chris Wilson.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c     |   14 +++-
 drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c |  133 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
 4 files changed, 181 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index fd617fb..9d9a92c 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1199,13 +1199,21 @@ static int i915_load_gem_init(struct drm_device *dev)
 	 * at the last page of the aperture.  One page should be enough to
 	 * keep any prefetching inside of the aperture.
 	 */
-	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE - 512*PAGE_SIZE);
+
+	if (HAS_ALIASING_PPGTT(dev)) {
+		ret = i915_gem_init_aliasing_ppgtt(dev);
+		if (ret)
+			return ret;
+	}
 
 	mutex_lock(&dev->struct_mutex);
 	ret = i915_gem_init_hw(dev);
 	mutex_unlock(&dev->struct_mutex);
-	if (ret)
+	if (ret) {
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		return ret;
+	}
 
 	/* Try to set up FBC with a reasonable compressed buffer size */
 	if (I915_HAS_FBC(dev) && i915_powersave) {
@@ -1292,6 +1300,7 @@ cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
 	mutex_unlock(&dev->struct_mutex);
+	i915_gem_cleanup_aliasing_ppgtt(dev);
 cleanup_vga_switcheroo:
 	vga_switcheroo_unregister_client(dev->pdev);
 cleanup_vga_client:
@@ -2179,6 +2188,7 @@ int i915_driver_unload(struct drm_device *dev)
 		i915_gem_free_all_phys_object(dev);
 		i915_gem_cleanup_ringbuffer(dev);
 		mutex_unlock(&dev->struct_mutex);
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		if (I915_HAS_FBC(dev) && i915_powersave)
 			i915_cleanup_compression(dev);
 		drm_mm_takedown(&dev_priv->mm.stolen);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6f1830f..90865d2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -254,6 +254,16 @@ struct intel_device_info {
 	u8 has_blt_ring:1;
 };
 
+#define I915_PPGTT_PD_ENTRIES 512
+#define I915_PPGTT_PT_ENTRIES 1024
+struct i915_hw_ppgtt {
+	unsigned num_pd_entries;
+	struct page **pt_pages;
+	uint32_t pd_offset;
+	dma_addr_t *pt_dma_addr;
+	dma_addr_t scratch_page_dma_addr;
+};
+
 enum no_fbc_reason {
 	FBC_NO_OUTPUT, /* no outputs enabled to compress */
 	FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */
@@ -584,6 +594,9 @@ typedef struct drm_i915_private {
 		struct io_mapping *gtt_mapping;
 		int gtt_mtrr;
 
+		/** PPGTT used for aliasing the PPGTT with the GTT */
+		struct i915_hw_ppgtt *aliasing_ppgtt;
+
 		struct shrinker inactive_shrinker;
 
 		/**
@@ -976,6 +989,8 @@ struct drm_i915_file_private {
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
+#define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >=6)
+
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
 
@@ -1247,6 +1262,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level);
 
 /* i915_gem_gtt.c */
+int __must_check i915_gem_init_aliasing_ppgtt(struct drm_device *dev);
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
+
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_rebind_object(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6042c5e..bd9b520 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -29,6 +29,139 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+/* PPGTT support for Sandybdrige/Gen6 and later */
+static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
+				   unsigned first_entry,
+				   unsigned num_entries)
+{
+	int i, j;
+	uint32_t *pt_vaddr;
+	uint32_t scratch_pte;
+
+	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
+	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
+
+		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
+			pt_vaddr[j] = scratch_pte;
+
+		kunmap_atomic(pt_vaddr);
+	}
+
+}
+
+int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt;
+	uint32_t pd_entry;
+	uint32_t __iomem *pd_addr;
+	int i;
+	int ret = -ENOMEM;
+
+	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
+	if (!ppgtt)
+		return ret;
+
+	ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES;
+	ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries,
+				  GFP_KERNEL);
+	if (!ppgtt->pt_pages)
+		goto err_ppgtt;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pt_pages[i])
+			goto err_pt_alloc;
+	}
+
+	if (dev_priv->mm.gtt->needs_dmar) {
+		ppgtt->pt_dma_addr = kzalloc(sizeof(dma_addr_t)
+						*ppgtt->num_pd_entries,
+					     GFP_KERNEL);
+		if (!ppgtt->pt_dma_addr)
+			goto err_pt_alloc;
+	}
+
+	pd_addr = dev_priv->mm.gtt->gtt + 512*1024 - 512;
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		dma_addr_t pt_addr;
+		if (dev_priv->mm.gtt->needs_dmar) {
+			pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i],
+					       0, 4096,
+					       PCI_DMA_BIDIRECTIONAL);
+
+			if (pci_dma_mapping_error(dev->pdev,
+						  pt_addr)) {
+				ret = -EIO;
+				goto err_pd_pin;
+
+			}
+			ppgtt->pt_dma_addr[i] = pt_addr;
+		} else
+			pt_addr = page_to_phys(ppgtt->pt_pages[i]);
+
+		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
+		pd_entry |= GEN6_PDE_VALID;
+
+		writel(pd_entry, pd_addr + i);
+	}
+	readl(pd_addr);
+
+	ppgtt->scratch_page_dma_addr = dev_priv->mm.gtt->scratch_page_dma;
+
+	i915_ppgtt_clear_range(ppgtt, 0,
+			       ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES);
+
+	ppgtt->pd_offset = (512*1024 - 512) * 4;
+
+	dev_priv->mm.aliasing_ppgtt = ppgtt;
+
+	return 0;
+
+err_pd_pin:
+	if (ppgtt->pt_dma_addr) {
+		for (i--; i >= 0; i--)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+err_pt_alloc:
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (ppgtt->pt_pages[i])
+			__free_page(ppgtt->pt_pages[i]);
+	}
+	kfree(ppgtt->pt_pages);
+err_ppgtt:
+	kfree(ppgtt);
+
+	return ret;
+}
+
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+	int i;
+
+	if (!ppgtt)
+		return;
+
+	if (ppgtt->pt_dma_addr) {
+		for (i = 0; i < ppgtt->num_pd_entries; i++)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		__free_page(ppgtt->pt_pages[i]);
+	kfree(ppgtt->pt_pages);
+	kfree(ppgtt);
+}
+
 /* XXX kill agp_type! */
 static unsigned int cache_level_to_agp_type(struct drm_device *dev,
 					    enum i915_cache_level cache_level)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e810723..e9f5dc8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,24 @@
 #define  GEN6_GRDOM_MEDIA		(1 << 2)
 #define  GEN6_GRDOM_BLT			(1 << 3)
 
+/* PPGTT stuff */
+#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
+
+#define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_LARGE_PAGE		(2 << 0) /* use 32kb pages */
+/* gen6+ has bit 11-4 for physical addr bit 39-32 */
+#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTE_VALID			(1 << 0)
+#define GEN6_PTE_UNCACHED		(1 << 1)
+#define GEN6_PTE_CACHE_LLC		(2 << 1)
+#define GEN6_PTE_CACHE_LLC_MLC		(3 << 1)
+#define GEN6_PTE_CACHE_BITS		(3 << 1)
+#define GEN6_PTE_GFDT			(1 << 3)
+#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTES_PER_PD		1024
+
 /* VGA stuff */
 
 #define VGA_ST01_MDA 0x3ba
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 04/11] drm/i915: ppgtt binding/unbinding support
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (2 preceding siblings ...)
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-29 23:46   ` Ben Widawsky
  2011-11-28 20:35 ` [PATCH 05/11] drm/i915: ppgtt register definitions Daniel Vetter
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This adds support to bind/unbind objects and wires it up. Objects are
only put into the ppgtt when necessary, i.e. at execbuf time.

Objects are still unconditionally put into the global gtt.

v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
like for the global gtt function. Noticed by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h            |    7 ++
 drivers/gpu/drm/i915/i915_gem.c            |   11 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    9 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c        |  146 ++++++++++++++++++++++++++-
 4 files changed, 167 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 90865d2..a6916e7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -854,6 +854,8 @@ struct drm_i915_gem_object {
 
 	unsigned int cache_level:2;
 
+	unsigned int has_aliasing_ppgtt_mapping:1;
+
 	struct page **pages;
 
 	/**
@@ -1264,6 +1266,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 /* i915_gem_gtt.c */
 int __must_check i915_gem_init_aliasing_ppgtt(struct drm_device *dev);
 void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
+void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt,
+			    struct drm_i915_gem_object *obj,
+			    enum i915_cache_level cache_level);
+void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt,
+			      struct drm_i915_gem_object *obj);
 
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 39459d2..593aa60 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2089,6 +2089,7 @@ static void i915_gem_object_finish_gtt(struct drm_i915_gem_object *obj)
 int
 i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 {
+	drm_i915_private_t *dev_priv = obj->base.dev->dev_private;
 	int ret = 0;
 
 	if (obj->gtt_space == NULL)
@@ -2133,6 +2134,11 @@ i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 	trace_i915_gem_object_unbind(obj);
 
 	i915_gem_gtt_unbind_object(obj);
+	if (obj->has_aliasing_ppgtt_mapping) {
+		i915_ppgtt_unbind_object(dev_priv->mm.aliasing_ppgtt, obj);
+		obj->has_aliasing_ppgtt_mapping = 0;
+	}
+
 	i915_gem_object_put_pages_gtt(obj);
 
 	list_del_init(&obj->gtt_list);
@@ -2949,6 +2955,8 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level)
 {
+	struct drm_device *dev = obj->base.dev;
+	drm_i915_private_t *dev_priv = dev->dev_private;
 	int ret;
 
 	if (obj->cache_level == cache_level)
@@ -2977,6 +2985,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		}
 
 		i915_gem_gtt_rebind_object(obj, cache_level);
+		if (obj->has_aliasing_ppgtt_mapping)
+			i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
+					       obj, cache_level);
 	}
 
 	if (cache_level == I915_CACHE_NONE) {
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index c918124..9c81cda 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -513,6 +513,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			    struct drm_file *file,
 			    struct list_head *objects)
 {
+	drm_i915_private_t *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_object *obj;
 	int ret, retry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
@@ -621,6 +622,14 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 			}
 
 			i915_gem_object_unpin(obj);
+
+			/* ... and ensure ppgtt mapping exist if needed. */
+			if (dev_priv->mm.aliasing_ppgtt && !obj->has_aliasing_ppgtt_mapping) {
+				i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
+						       obj, obj->cache_level);
+
+				obj->has_aliasing_ppgtt_mapping = 1;
+			}
 		}
 
 		if (ret != -ENOSPC || retry > 1)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd9b520..061ae12 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -34,22 +34,31 @@ static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
 				   unsigned first_entry,
 				   unsigned num_entries)
 {
-	int i, j;
 	uint32_t *pt_vaddr;
 	uint32_t scratch_pte;
+	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
+	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned last_pte, i;
 
 	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
 	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
 
-	for (i = 0; i < ppgtt->num_pd_entries; i++) {
-		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
+	while (num_entries) {
+		last_pte = first_pte + num_entries;
+		if (last_pte > I915_PPGTT_PT_ENTRIES)
+			last_pte = I915_PPGTT_PT_ENTRIES;
+
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
 
-		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
-			pt_vaddr[j] = scratch_pte;
+		for (i = first_pte; i < last_pte; i++)
+			pt_vaddr[i] = scratch_pte;
 
 		kunmap_atomic(pt_vaddr);
-	}
 
+		num_entries -= last_pte - first_pte;
+		first_pte = 0;
+		act_pd++;
+	}
 }
 
 int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
@@ -162,6 +171,131 @@ void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
 	kfree(ppgtt);
 }
 
+static void i915_ppgtt_insert_sg_entries(struct i915_hw_ppgtt *ppgtt,
+					 struct scatterlist *sg_list,
+					 unsigned sg_len,
+					 unsigned first_entry,
+					 uint32_t pte_flags)
+{
+	uint32_t *pt_vaddr, pte;
+	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
+	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned i, j, m, segment_len;
+	dma_addr_t page_addr;
+	struct scatterlist *sg;
+
+	/* init sg walking */
+	sg = sg_list;
+	i = 0;
+	segment_len = sg_dma_len(sg) >> PAGE_SHIFT;
+	m = 0;
+
+	while (i < sg_len) {
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
+
+		for (j = first_pte; j < I915_PPGTT_PT_ENTRIES; j++) {
+			page_addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
+			pte = GEN6_PTE_ADDR_ENCODE(page_addr);
+			pt_vaddr[j] = pte | pte_flags;
+
+			/* grab the next page */
+			m++;
+			if (m == segment_len) {
+				sg = sg_next(sg);
+				i++;
+				if (i == sg_len)
+					break;
+
+				segment_len = sg_dma_len(sg) >> PAGE_SHIFT;
+				m = 0;
+			}
+		}
+
+		kunmap_atomic(pt_vaddr);
+
+		first_pte = 0;
+		act_pd++;
+	}
+}
+
+static void i915_ppgtt_insert_pages(struct i915_hw_ppgtt *ppgtt,
+				    unsigned first_entry, unsigned num_entries,
+				    struct page **pages, uint32_t pte_flags)
+{
+	uint32_t *pt_vaddr, pte;
+	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
+	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
+	unsigned last_pte, i;
+	dma_addr_t page_addr;
+
+	while (num_entries) {
+		last_pte = first_pte + num_entries;
+		last_pte = min_t(unsigned, last_pte, I915_PPGTT_PT_ENTRIES);
+
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
+
+		for (i = first_pte; i < last_pte; i++) {
+			page_addr = page_to_phys(*pages);
+			pte = GEN6_PTE_ADDR_ENCODE(page_addr);
+			pt_vaddr[i] = pte | pte_flags;
+
+			pages++;
+		}
+
+		kunmap_atomic(pt_vaddr);
+
+		num_entries -= last_pte - first_pte;
+		first_pte = 0;
+		act_pd++;
+	}
+}
+
+void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt,
+			    struct drm_i915_gem_object *obj,
+			    enum i915_cache_level cache_level)
+{
+	struct drm_device *dev = obj->base.dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	uint32_t pte_flags = GEN6_PTE_VALID;
+
+	switch (cache_level) {
+	case I915_CACHE_LLC_MLC:
+		pte_flags |= GEN6_PTE_CACHE_LLC_MLC;
+		break;
+	case I915_CACHE_LLC:
+		pte_flags |= GEN6_PTE_CACHE_LLC;
+		break;
+	case I915_CACHE_NONE:
+		pte_flags |= GEN6_PTE_UNCACHED;
+		break;
+	default:
+		BUG();
+	}
+
+	if (dev_priv->mm.gtt->needs_dmar) {
+		BUG_ON(!obj->sg_list);
+
+		i915_ppgtt_insert_sg_entries(ppgtt,
+					     obj->sg_list,
+					     obj->num_sg,
+					     obj->gtt_space->start >> PAGE_SHIFT,
+					     pte_flags);
+	} else
+		i915_ppgtt_insert_pages(ppgtt,
+					obj->gtt_space->start >> PAGE_SHIFT,
+					obj->base.size >> PAGE_SHIFT,
+					obj->pages,
+					pte_flags);
+}
+
+void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt,
+			      struct drm_i915_gem_object *obj)
+{
+	i915_ppgtt_clear_range(ppgtt,
+			       obj->gtt_space->start >> PAGE_SHIFT,
+			       obj->base.size >> PAGE_SHIFT);
+}
+
 /* XXX kill agp_type! */
 static unsigned int cache_level_to_agp_type(struct drm_device *dev,
 					    enum i915_cache_level cache_level)
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 05/11] drm/i915: ppgtt register definitions
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (3 preceding siblings ...)
  2011-11-28 20:35 ` [PATCH 04/11] drm/i915: ppgtt binding/unbinding support Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-29 23:57   ` Ben Widawsky
  2011-11-28 20:35 ` [PATCH 06/11] drm/i915: ppgtt debugfs info Daniel Vetter
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_reg.h |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e9f5dc8..7227446 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -86,6 +86,13 @@
 #define   GEN6_MBC_SNPCR_LOW	(2<<21)
 #define   GEN6_MBC_SNPCR_MIN	(3<<21) /* only 1/16th of the cache is shared */
 
+#define GEN6_MBCTL		0x0907c
+#define   GEN6_MBCTL_ENABLE_BOOT_FETCH	(1 << 4)
+#define   GEN6_MBCTL_CTX_FETCH_NEEDED	(1 << 3)
+#define   GEN6_MBCTL_BME_UPDATE_ENABLE	(1 << 2)
+#define   GEN6_MBCTL_MAE_UPDATE_ENABLE	(1 << 1)
+#define   GEN6_MBCTL_BOOT_FETCH_MECH	(1 << 0)
+
 #define GEN6_GDRST	0x941c
 #define  GEN6_GRDOM_FULL		(1 << 0)
 #define  GEN6_GRDOM_RENDER		(1 << 1)
@@ -110,6 +117,16 @@
 
 #define GEN6_PTES_PER_PD		1024
 
+#define RING_PP_DIR_BASE(ring)		((ring)->mmio_base+0x228)
+#define RING_PP_DIR_BASE_READ(ring)	((ring)->mmio_base+0x518)
+#define RING_PP_DIR_DCLV(ring)		((ring)->mmio_base+0x220)
+#define   PP_DIR_DCLV_2G		0xffffffff
+
+#define GAM_ECOCHK			0x4090
+#define   ECOCHK_SNB_BIT		(1<<10)
+#define   ECOCHK_PPGTT_CACHE64B		(0x3<<3)
+#define   ECOCHK_PPGTT_CACHE4B		(0x0<<3)
+
 /* VGA stuff */
 
 #define VGA_ST01_MDA 0x3ba
@@ -422,6 +439,7 @@
 
 #define GFX_MODE	0x02520
 #define GFX_MODE_GEN7	0x0229c
+#define RING_MODE_GEN7(ring)	((ring)->mmio_base+0x29c)
 #define   GFX_RUN_LIST_ENABLE		(1<<15)
 #define   GFX_TLB_INVALIDATE_ALWAYS	(1<<13)
 #define   GFX_SURFACE_FAULT_ENABLE	(1<<12)
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 06/11] drm/i915: ppgtt debugfs info
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (4 preceding siblings ...)
  2011-11-28 20:35 ` [PATCH 05/11] drm/i915: ppgtt register definitions Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-30  0:02   ` Ben Widawsky
  2011-11-28 20:35 ` [PATCH 07/11] drm/i915: per-ring fault reg Daniel Vetter
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c |   38 +++++++++++++++++++++++++++++++++++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e0bb93d..e3da225 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1397,6 +1397,43 @@ static int i915_swizzle_info(struct seq_file *m, void *data)
 	return 0;
 }
 
+static int i915_ppgtt_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_ring_buffer *ring;
+	int i, ret;
+
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		return ret;
+	if (INTEL_INFO(dev)->gen == 6)
+		seq_printf(m, "GFX_MODE: 0x%08x\n", I915_READ(GFX_MODE));
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+		ring = &dev_priv->ring[i];
+
+		seq_printf(m, "%s\n", ring->name);
+		if (INTEL_INFO(dev)->gen == 7)
+			seq_printf(m, "GFX_MODE: 0x%08x\n", I915_READ(RING_MODE_GEN7(ring)));
+		seq_printf(m, "PP_DIR_BASE: 0x%08x\n", I915_READ(RING_PP_DIR_BASE(ring)));
+		seq_printf(m, "PP_DIR_BASE_READ: 0x%08x\n", I915_READ(RING_PP_DIR_BASE_READ(ring)));
+		seq_printf(m, "PP_DIR_DCLV: 0x%08x\n", I915_READ(RING_PP_DIR_DCLV(ring)));
+	}
+	if (dev_priv->mm.aliasing_ppgtt) {
+		struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+
+		seq_printf(m, "aliasing PPGTT:\n");
+		seq_printf(m, "pd gtt offset: 0x%08x\n", ppgtt->pd_offset);
+	}
+	seq_printf(m, "ECOCHK: 0x%08x\n", I915_READ(GAM_ECOCHK));
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
 static int
 i915_debugfs_common_open(struct inode *inode,
 			 struct file *filp)
@@ -1799,6 +1836,7 @@ static struct drm_info_list i915_debugfs_list[] = {
 	{"i915_context_status", i915_context_status, 0},
 	{"i915_gen6_forcewake_count", i915_gen6_forcewake_count_info, 0},
 	{"i915_swizzle_info", i915_swizzle_info, 0},
+	{"i915_ppgtt_info", i915_ppgtt_info, 0},
 };
 #define I915_DEBUGFS_ENTRIES ARRAY_SIZE(i915_debugfs_list)
 
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 07/11] drm/i915: per-ring fault reg
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (5 preceding siblings ...)
  2011-11-28 20:35 ` [PATCH 06/11] drm/i915: ppgtt debugfs info Daniel Vetter
@ 2011-11-28 20:35 ` Daniel Vetter
  2011-11-30  0:13   ` Eugeni Dodonov
  2011-11-29  9:42 ` [PATCH 00/11] aliasing ppgtt support v2 Chris Wilson
  2011-11-30  0:11 ` Eugeni Dodonov
  8 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-28 20:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

v2: Chris Wilson suggested to allocate the error_state with kzalloc
for better paranioa. Also kill existing spurious clears of the
error_state while at it.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c |    8 ++++++--
 drivers/gpu/drm/i915/i915_drv.h     |    2 ++
 drivers/gpu/drm/i915/i915_irq.c     |   13 +++++++------
 drivers/gpu/drm/i915/i915_reg.h     |    3 +++
 4 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e3da225..919f072 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -755,8 +755,10 @@ static void i915_ring_error_state(struct seq_file *m,
 	if (INTEL_INFO(dev)->gen >= 4)
 		seq_printf(m, "  INSTPS: 0x%08x\n", error->instps[ring]);
 	seq_printf(m, "  INSTPM: 0x%08x\n", error->instpm[ring]);
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6) {
 		seq_printf(m, "  FADDR: 0x%08x\n", error->faddr[ring]);
+		seq_printf(m, "  FAULT_REG: 0x%08x\n", error->fault_reg[ring]);
+	}
 	seq_printf(m, "  seqno: 0x%08x\n", error->seqno[ring]);
 }
 
@@ -790,8 +792,10 @@ static int i915_error_state(struct seq_file *m, void *unused)
 	for (i = 0; i < dev_priv->num_fence_regs; i++)
 		seq_printf(m, "  fence[%d] = %08llx\n", i, error->fence[i]);
 
-	if (INTEL_INFO(dev)->gen >= 6) 
+	if (INTEL_INFO(dev)->gen >= 6) {
 		seq_printf(m, "ERROR: 0x%08x\n", error->error);
+		seq_printf(m, "DONE_REG: 0x%08x\n", error->done_reg);
+	}
 
 	i915_ring_error_state(m, dev, error, RCS);
 	if (HAS_BLT(dev))
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a6916e7..5db3b84 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -167,6 +167,8 @@ struct drm_i915_error_state {
 	u32 instdone1;
 	u32 seqno[I915_NUM_RINGS];
 	u64 bbaddr;
+	u32 fault_reg[I915_NUM_RINGS];
+	u32 done_reg;
 	u32 faddr[I915_NUM_RINGS];
 	u64 fence[I915_MAX_NUM_FENCES];
 	struct timeval time;
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 56f2b2f..26ac3bf 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -883,8 +883,10 @@ static void i915_record_ring_state(struct drm_device *dev,
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6) {
 		error->faddr[ring->id] = I915_READ(RING_DMA_FADD(ring->mmio_base));
+		error->fault_reg[ring->id] = I915_READ(RING_FAULT_REG(ring));
+	}
 
 	if (INTEL_INFO(dev)->gen >= 4) {
 		error->ipeir[ring->id] = I915_READ(RING_IPEIR(ring->mmio_base));
@@ -899,7 +901,6 @@ static void i915_record_ring_state(struct drm_device *dev,
 		error->ipeir[ring->id] = I915_READ(IPEIR);
 		error->ipehr[ring->id] = I915_READ(IPEHR);
 		error->instdone[ring->id] = I915_READ(INSTDONE);
-		error->bbaddr = 0;
 	}
 
 	error->instpm[ring->id] = I915_READ(RING_INSTPM(ring->mmio_base));
@@ -933,7 +934,7 @@ static void i915_capture_error_state(struct drm_device *dev)
 		return;
 
 	/* Account for pipe specific data like PIPE*STAT */
-	error = kmalloc(sizeof(*error), GFP_ATOMIC);
+	error = kzalloc(sizeof(*error), GFP_ATOMIC);
 	if (!error) {
 		DRM_DEBUG_DRIVER("out of memory, not capturing error state\n");
 		return;
@@ -948,10 +949,10 @@ static void i915_capture_error_state(struct drm_device *dev)
 	for_each_pipe(pipe)
 		error->pipestat[pipe] = I915_READ(PIPESTAT(pipe));
 
-	if (INTEL_INFO(dev)->gen >= 6)
+	if (INTEL_INFO(dev)->gen >= 6) {
 		error->error = I915_READ(ERROR_GEN6);
-	else
-		error->error = 0;
+		error->done_reg = I915_READ(DONE_REG);
+	}
 
 	i915_record_ring_state(dev, error, &dev_priv->ring[RCS]);
 	if (HAS_BLT(dev))
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 7227446..ab9873b 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -365,6 +365,9 @@
 #define   ARB_MODE_ENABLE(x)	GFX_MODE_ENABLE(x)
 #define   ARB_MODE_DISABLE(x)	GFX_MODE_DISABLE(x)
 #define RENDER_HWS_PGA_GEN7	(0x04080)
+#define RING_FAULT_REG(ring)	(0x4094 + 0x100*(ring)->id)
+#define ERROR			0x40a0
+#define DONE_REG		0x40b0
 #define BSD_HWS_PGA_GEN7	(0x04180)
 #define BLT_HWS_PGA_GEN7	(0x04280)
 #define RING_ACTHD(base)	((base)+0x74)
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] aliasing ppgtt support v2
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (6 preceding siblings ...)
  2011-11-28 20:35 ` [PATCH 07/11] drm/i915: per-ring fault reg Daniel Vetter
@ 2011-11-29  9:42 ` Chris Wilson
  2011-11-30  0:11 ` Eugeni Dodonov
  8 siblings, 0 replies; 30+ messages in thread
From: Chris Wilson @ 2011-11-29  9:42 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

On Mon, 28 Nov 2011 21:35:27 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> Hi all,
> 
> Changes since the last submission:
> - fixed issues pointed out by Chris Wilson on irc.
> - fixed an oops on pre-snb, shame on me for that one.
> - added two new patches to only bind objects to the global gtt when required.
> - added a new patch so that userspace can find out whether ppgtt is on. This is
>   required to use MI_STORE/LOAD commands correctly from userspace batchbuffers.
>   Luckily no currently released userspace code depends on this, it's just prep
>   work.
> 
> Comments, test reports, reviews and flames highly welcome.

The lazy-gtt is just missing a guard to ensure the buffer is bound in
the global gtt before reading through those PTE (impacts other code to
avoid allocating mappable GTT space). The beauty is that it forced me to
grok the rest of the lazy-gtt, it's deceptively simple.

Aside from the lazy-gtt sufficiently speeding up command submission and
reopening an old wound that prevents me from usefully analysing
performance, the series is tested-by myself. I've so far reviewed
everything but the actual mechanics of the PDE, which is not saying much
;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
@ 2011-11-29 18:50   ` Ben Widawsky
  2011-11-29 19:06     ` Daniel Vetter
  2011-11-29 23:29   ` [PATCH 03/11] " Ben Widawsky
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 18:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, 28 Nov 2011 21:35:30 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> This just adds the setup and teardown code for the ppgtt PDE and the
> last-level pagetables, which are fixed for the entire lifetime, at
> least for the moment.
> 
> v2: Kill the stray debug printk noted by and improve the pte
> definitions as suggested by Chris Wilson.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_dma.c     |   14 +++-
>  drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
>  drivers/gpu/drm/i915/i915_gem_gtt.c |  133 +++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
>  4 files changed, 181 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index fd617fb..9d9a92c 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1199,13 +1199,21 @@ static int i915_load_gem_init(struct drm_device *dev)
>  	 * at the last page of the aperture.  One page should be enough to
>  	 * keep any prefetching inside of the aperture.
>  	 */
> -	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
> +	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE - 512*PAGE_SIZE);
> +
> +	if (HAS_ALIASING_PPGTT(dev)) {
> +		ret = i915_gem_init_aliasing_ppgtt(dev);
> +		if (ret)
> +			return ret;
> +	}

Not sure if you fixed this based on our IRC conversation. The size
belongs in the HAS_ALIASING block. Also, I'd prefer NUM_PDEs or
something similar. Finally, if you put the PDEs at the top, couldn't we
get rid of the scratch page?

Ben

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 18:50   ` Ben Widawsky
@ 2011-11-29 19:06     ` Daniel Vetter
  2011-11-29 19:09       ` [PATCH] " Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-29 19:06 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 10:50:56AM -0800, Ben Widawsky wrote:
> On Mon, 28 Nov 2011 21:35:30 +0100
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > This just adds the setup and teardown code for the ppgtt PDE and the
> > last-level pagetables, which are fixed for the entire lifetime, at
> > least for the moment.
> > 
> > v2: Kill the stray debug printk noted by and improve the pte
> > definitions as suggested by Chris Wilson.
> > 
> > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_dma.c     |   14 +++-
> >  drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
> >  drivers/gpu/drm/i915/i915_gem_gtt.c |  133 +++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
> >  4 files changed, 181 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index fd617fb..9d9a92c 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -1199,13 +1199,21 @@ static int i915_load_gem_init(struct drm_device *dev)
> >  	 * at the last page of the aperture.  One page should be enough to
> >  	 * keep any prefetching inside of the aperture.
> >  	 */
> > -	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
> > +	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE - 512*PAGE_SIZE);
> > +
> > +	if (HAS_ALIASING_PPGTT(dev)) {
> > +		ret = i915_gem_init_aliasing_ppgtt(dev);
> > +		if (ret)
> > +			return ret;
> > +	}
> 
> Not sure if you fixed this based on our IRC conversation. The size
> belongs in the HAS_ALIASING block. Also, I'd prefer NUM_PDEs or
> something similar. Finally, if you put the PDEs at the top, couldn't we
> get rid of the scratch page?

Well, yet another reason this is about the wost patch series I've
submitted in the last few months. You're totally right, I'll move this in.
I'll keep the guard page though even for the ppgtt case, I feel safer with
the more paranoid choice and it's just 4kb of almost 2GB of gtt.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 19:06     ` Daniel Vetter
@ 2011-11-29 19:09       ` Daniel Vetter
  2011-11-29 19:54         ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-29 19:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky

This just adds the setup and teardown code for the ppgtt PDE and the
last-level pagetables, which are fixed for the entire lifetime, at
least for the moment.

v2: Kill the stray debug printk noted by and improve the pte
definitions as suggested by Chris Wilson.

v3: Clean up the aperture stealing code as noted by Ben Widawsky.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c     |   40 ++++++++---
 drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c |  133 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
 4 files changed, 198 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index fd617fb..9c56284 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1190,22 +1190,38 @@ static int i915_load_gem_init(struct drm_device *dev)
 	/* Basic memrange allocator for stolen space */
 	drm_mm_init(&dev_priv->mm.stolen, 0, prealloc_size);
 
-	/* Let GEM Manage all of the aperture.
-	 *
-	 * However, leave one page at the end still bound to the scratch page.
-	 * There are a number of places where the hardware apparently
-	 * prefetches past the end of the object, and we've seen multiple
-	 * hangs with the GPU head pointer stuck in a batchbuffer bound
-	 * at the last page of the aperture.  One page should be enough to
-	 * keep any prefetching inside of the aperture.
-	 */
-	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	if (HAS_ALIASING_PPGTT(dev)) {
+		/* PPGTT pdes are stolen from global gtt ptes, so shrink the
+		 * aperture accordingly when using aliasing ppgtt. For paranoia
+		 * keep the guard page in between. */
+		i915_gem_do_init(dev, 0, mappable_size,
+				 gtt_size - PAGE_SIZE
+				     - I915_PPGTT_PD_ENTRIES*PAGE_SIZE);
+
+		ret = i915_gem_init_aliasing_ppgtt(dev);
+		if (ret)
+			return ret;
+	} else {
+		/* Let GEM Manage all of the aperture.
+		 *
+		 * However, leave one page at the end still bound to the scratch
+		 * page.  There are a number of places where the hardware
+		 * apparently prefetches past the end of the object, and we've
+		 * seen multiple hangs with the GPU head pointer stuck in a
+		 * batchbuffer bound at the last page of the aperture.  One page
+		 * should be enough to keep any prefetching inside of the
+		 * aperture.
+		 */
+		i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	}
 
 	mutex_lock(&dev->struct_mutex);
 	ret = i915_gem_init_hw(dev);
 	mutex_unlock(&dev->struct_mutex);
-	if (ret)
+	if (ret) {
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		return ret;
+	}
 
 	/* Try to set up FBC with a reasonable compressed buffer size */
 	if (I915_HAS_FBC(dev) && i915_powersave) {
@@ -1292,6 +1308,7 @@ cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
 	mutex_unlock(&dev->struct_mutex);
+	i915_gem_cleanup_aliasing_ppgtt(dev);
 cleanup_vga_switcheroo:
 	vga_switcheroo_unregister_client(dev->pdev);
 cleanup_vga_client:
@@ -2179,6 +2196,7 @@ int i915_driver_unload(struct drm_device *dev)
 		i915_gem_free_all_phys_object(dev);
 		i915_gem_cleanup_ringbuffer(dev);
 		mutex_unlock(&dev->struct_mutex);
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		if (I915_HAS_FBC(dev) && i915_powersave)
 			i915_cleanup_compression(dev);
 		drm_mm_takedown(&dev_priv->mm.stolen);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bdbd6d8..8b6f1b7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -254,6 +254,16 @@ struct intel_device_info {
 	u8 has_blt_ring:1;
 };
 
+#define I915_PPGTT_PD_ENTRIES 512
+#define I915_PPGTT_PT_ENTRIES 1024
+struct i915_hw_ppgtt {
+	unsigned num_pd_entries;
+	struct page **pt_pages;
+	uint32_t pd_offset;
+	dma_addr_t *pt_dma_addr;
+	dma_addr_t scratch_page_dma_addr;
+};
+
 enum no_fbc_reason {
 	FBC_NO_OUTPUT, /* no outputs enabled to compress */
 	FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */
@@ -584,6 +594,9 @@ typedef struct drm_i915_private {
 		struct io_mapping *gtt_mapping;
 		int gtt_mtrr;
 
+		/** PPGTT used for aliasing the PPGTT with the GTT */
+		struct i915_hw_ppgtt *aliasing_ppgtt;
+
 		struct shrinker inactive_shrinker;
 
 		/**
@@ -976,6 +989,8 @@ struct drm_i915_file_private {
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
+#define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >=6)
+
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
 
@@ -1247,6 +1262,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level);
 
 /* i915_gem_gtt.c */
+int __must_check i915_gem_init_aliasing_ppgtt(struct drm_device *dev);
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
+
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_rebind_object(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6042c5e..bd9b520 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -29,6 +29,139 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+/* PPGTT support for Sandybdrige/Gen6 and later */
+static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
+				   unsigned first_entry,
+				   unsigned num_entries)
+{
+	int i, j;
+	uint32_t *pt_vaddr;
+	uint32_t scratch_pte;
+
+	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
+	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
+
+		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
+			pt_vaddr[j] = scratch_pte;
+
+		kunmap_atomic(pt_vaddr);
+	}
+
+}
+
+int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt;
+	uint32_t pd_entry;
+	uint32_t __iomem *pd_addr;
+	int i;
+	int ret = -ENOMEM;
+
+	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
+	if (!ppgtt)
+		return ret;
+
+	ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES;
+	ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries,
+				  GFP_KERNEL);
+	if (!ppgtt->pt_pages)
+		goto err_ppgtt;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pt_pages[i])
+			goto err_pt_alloc;
+	}
+
+	if (dev_priv->mm.gtt->needs_dmar) {
+		ppgtt->pt_dma_addr = kzalloc(sizeof(dma_addr_t)
+						*ppgtt->num_pd_entries,
+					     GFP_KERNEL);
+		if (!ppgtt->pt_dma_addr)
+			goto err_pt_alloc;
+	}
+
+	pd_addr = dev_priv->mm.gtt->gtt + 512*1024 - 512;
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		dma_addr_t pt_addr;
+		if (dev_priv->mm.gtt->needs_dmar) {
+			pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i],
+					       0, 4096,
+					       PCI_DMA_BIDIRECTIONAL);
+
+			if (pci_dma_mapping_error(dev->pdev,
+						  pt_addr)) {
+				ret = -EIO;
+				goto err_pd_pin;
+
+			}
+			ppgtt->pt_dma_addr[i] = pt_addr;
+		} else
+			pt_addr = page_to_phys(ppgtt->pt_pages[i]);
+
+		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
+		pd_entry |= GEN6_PDE_VALID;
+
+		writel(pd_entry, pd_addr + i);
+	}
+	readl(pd_addr);
+
+	ppgtt->scratch_page_dma_addr = dev_priv->mm.gtt->scratch_page_dma;
+
+	i915_ppgtt_clear_range(ppgtt, 0,
+			       ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES);
+
+	ppgtt->pd_offset = (512*1024 - 512) * 4;
+
+	dev_priv->mm.aliasing_ppgtt = ppgtt;
+
+	return 0;
+
+err_pd_pin:
+	if (ppgtt->pt_dma_addr) {
+		for (i--; i >= 0; i--)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+err_pt_alloc:
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (ppgtt->pt_pages[i])
+			__free_page(ppgtt->pt_pages[i]);
+	}
+	kfree(ppgtt->pt_pages);
+err_ppgtt:
+	kfree(ppgtt);
+
+	return ret;
+}
+
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+	int i;
+
+	if (!ppgtt)
+		return;
+
+	if (ppgtt->pt_dma_addr) {
+		for (i = 0; i < ppgtt->num_pd_entries; i++)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		__free_page(ppgtt->pt_pages[i]);
+	kfree(ppgtt->pt_pages);
+	kfree(ppgtt);
+}
+
 /* XXX kill agp_type! */
 static unsigned int cache_level_to_agp_type(struct drm_device *dev,
 					    enum i915_cache_level cache_level)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e810723..e9f5dc8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,24 @@
 #define  GEN6_GRDOM_MEDIA		(1 << 2)
 #define  GEN6_GRDOM_BLT			(1 << 3)
 
+/* PPGTT stuff */
+#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
+
+#define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_LARGE_PAGE		(2 << 0) /* use 32kb pages */
+/* gen6+ has bit 11-4 for physical addr bit 39-32 */
+#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTE_VALID			(1 << 0)
+#define GEN6_PTE_UNCACHED		(1 << 1)
+#define GEN6_PTE_CACHE_LLC		(2 << 1)
+#define GEN6_PTE_CACHE_LLC_MLC		(3 << 1)
+#define GEN6_PTE_CACHE_BITS		(3 << 1)
+#define GEN6_PTE_GFDT			(1 << 3)
+#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTES_PER_PD		1024
+
 /* VGA stuff */
 
 #define VGA_ST01_MDA 0x3ba
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 19:09       ` [PATCH] " Daniel Vetter
@ 2011-11-29 19:54         ` Daniel Vetter
  2011-11-29 20:55           ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-29 19:54 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This just adds the setup and teardown code for the ppgtt PDE and the
last-level pagetables, which are fixed for the entire lifetime, at
least for the moment.

v2: Kill the stray debug printk noted by and improve the pte
definitions as suggested by Chris Wilson.

v3: Clean up the aperture stealing code as noted by Ben Widawsky.

v4: Paint the init code in a more pleasing colour as suggest by Chris
Wilson.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c     |   41 ++++++++---
 drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c |  133 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
 4 files changed, 199 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index fd617fb..2cc0ba4 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1190,22 +1190,39 @@ static int i915_load_gem_init(struct drm_device *dev)
 	/* Basic memrange allocator for stolen space */
 	drm_mm_init(&dev_priv->mm.stolen, 0, prealloc_size);
 
-	/* Let GEM Manage all of the aperture.
-	 *
-	 * However, leave one page at the end still bound to the scratch page.
-	 * There are a number of places where the hardware apparently
-	 * prefetches past the end of the object, and we've seen multiple
-	 * hangs with the GPU head pointer stuck in a batchbuffer bound
-	 * at the last page of the aperture.  One page should be enough to
-	 * keep any prefetching inside of the aperture.
-	 */
-	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	if (HAS_ALIASING_PPGTT(dev)) {
+		/* PPGTT pdes are stolen from global gtt ptes, so shrink the
+		 * aperture accordingly when using aliasing ppgtt. */
+		gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE;
+		/* For paranoia keep the guard page in between. */
+		gtt_size -= PAGE_SIZE;
+
+		i915_gem_do_init(dev, 0, mappable_size, gtt_size);
+
+		ret = i915_gem_init_aliasing_ppgtt(dev);
+		if (ret)
+			return ret;
+	} else {
+		/* Let GEM Manage all of the aperture.
+		 *
+		 * However, leave one page at the end still bound to the scratch
+		 * page.  There are a number of places where the hardware
+		 * apparently prefetches past the end of the object, and we've
+		 * seen multiple hangs with the GPU head pointer stuck in a
+		 * batchbuffer bound at the last page of the aperture.  One page
+		 * should be enough to keep any prefetching inside of the
+		 * aperture.
+		 */
+		i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	}
 
 	mutex_lock(&dev->struct_mutex);
 	ret = i915_gem_init_hw(dev);
 	mutex_unlock(&dev->struct_mutex);
-	if (ret)
+	if (ret) {
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		return ret;
+	}
 
 	/* Try to set up FBC with a reasonable compressed buffer size */
 	if (I915_HAS_FBC(dev) && i915_powersave) {
@@ -1292,6 +1309,7 @@ cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
 	mutex_unlock(&dev->struct_mutex);
+	i915_gem_cleanup_aliasing_ppgtt(dev);
 cleanup_vga_switcheroo:
 	vga_switcheroo_unregister_client(dev->pdev);
 cleanup_vga_client:
@@ -2179,6 +2197,7 @@ int i915_driver_unload(struct drm_device *dev)
 		i915_gem_free_all_phys_object(dev);
 		i915_gem_cleanup_ringbuffer(dev);
 		mutex_unlock(&dev->struct_mutex);
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		if (I915_HAS_FBC(dev) && i915_powersave)
 			i915_cleanup_compression(dev);
 		drm_mm_takedown(&dev_priv->mm.stolen);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bdbd6d8..8b6f1b7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -254,6 +254,16 @@ struct intel_device_info {
 	u8 has_blt_ring:1;
 };
 
+#define I915_PPGTT_PD_ENTRIES 512
+#define I915_PPGTT_PT_ENTRIES 1024
+struct i915_hw_ppgtt {
+	unsigned num_pd_entries;
+	struct page **pt_pages;
+	uint32_t pd_offset;
+	dma_addr_t *pt_dma_addr;
+	dma_addr_t scratch_page_dma_addr;
+};
+
 enum no_fbc_reason {
 	FBC_NO_OUTPUT, /* no outputs enabled to compress */
 	FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */
@@ -584,6 +594,9 @@ typedef struct drm_i915_private {
 		struct io_mapping *gtt_mapping;
 		int gtt_mtrr;
 
+		/** PPGTT used for aliasing the PPGTT with the GTT */
+		struct i915_hw_ppgtt *aliasing_ppgtt;
+
 		struct shrinker inactive_shrinker;
 
 		/**
@@ -976,6 +989,8 @@ struct drm_i915_file_private {
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
+#define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >=6)
+
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
 
@@ -1247,6 +1262,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level);
 
 /* i915_gem_gtt.c */
+int __must_check i915_gem_init_aliasing_ppgtt(struct drm_device *dev);
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
+
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_rebind_object(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6042c5e..bd9b520 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -29,6 +29,139 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+/* PPGTT support for Sandybdrige/Gen6 and later */
+static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
+				   unsigned first_entry,
+				   unsigned num_entries)
+{
+	int i, j;
+	uint32_t *pt_vaddr;
+	uint32_t scratch_pte;
+
+	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
+	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
+
+		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
+			pt_vaddr[j] = scratch_pte;
+
+		kunmap_atomic(pt_vaddr);
+	}
+
+}
+
+int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt;
+	uint32_t pd_entry;
+	uint32_t __iomem *pd_addr;
+	int i;
+	int ret = -ENOMEM;
+
+	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
+	if (!ppgtt)
+		return ret;
+
+	ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES;
+	ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries,
+				  GFP_KERNEL);
+	if (!ppgtt->pt_pages)
+		goto err_ppgtt;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pt_pages[i])
+			goto err_pt_alloc;
+	}
+
+	if (dev_priv->mm.gtt->needs_dmar) {
+		ppgtt->pt_dma_addr = kzalloc(sizeof(dma_addr_t)
+						*ppgtt->num_pd_entries,
+					     GFP_KERNEL);
+		if (!ppgtt->pt_dma_addr)
+			goto err_pt_alloc;
+	}
+
+	pd_addr = dev_priv->mm.gtt->gtt + 512*1024 - 512;
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		dma_addr_t pt_addr;
+		if (dev_priv->mm.gtt->needs_dmar) {
+			pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i],
+					       0, 4096,
+					       PCI_DMA_BIDIRECTIONAL);
+
+			if (pci_dma_mapping_error(dev->pdev,
+						  pt_addr)) {
+				ret = -EIO;
+				goto err_pd_pin;
+
+			}
+			ppgtt->pt_dma_addr[i] = pt_addr;
+		} else
+			pt_addr = page_to_phys(ppgtt->pt_pages[i]);
+
+		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
+		pd_entry |= GEN6_PDE_VALID;
+
+		writel(pd_entry, pd_addr + i);
+	}
+	readl(pd_addr);
+
+	ppgtt->scratch_page_dma_addr = dev_priv->mm.gtt->scratch_page_dma;
+
+	i915_ppgtt_clear_range(ppgtt, 0,
+			       ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES);
+
+	ppgtt->pd_offset = (512*1024 - 512) * 4;
+
+	dev_priv->mm.aliasing_ppgtt = ppgtt;
+
+	return 0;
+
+err_pd_pin:
+	if (ppgtt->pt_dma_addr) {
+		for (i--; i >= 0; i--)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+err_pt_alloc:
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (ppgtt->pt_pages[i])
+			__free_page(ppgtt->pt_pages[i]);
+	}
+	kfree(ppgtt->pt_pages);
+err_ppgtt:
+	kfree(ppgtt);
+
+	return ret;
+}
+
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+	int i;
+
+	if (!ppgtt)
+		return;
+
+	if (ppgtt->pt_dma_addr) {
+		for (i = 0; i < ppgtt->num_pd_entries; i++)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		__free_page(ppgtt->pt_pages[i]);
+	kfree(ppgtt->pt_pages);
+	kfree(ppgtt);
+}
+
 /* XXX kill agp_type! */
 static unsigned int cache_level_to_agp_type(struct drm_device *dev,
 					    enum i915_cache_level cache_level)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e810723..e9f5dc8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,24 @@
 #define  GEN6_GRDOM_MEDIA		(1 << 2)
 #define  GEN6_GRDOM_BLT			(1 << 3)
 
+/* PPGTT stuff */
+#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
+
+#define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_LARGE_PAGE		(2 << 0) /* use 32kb pages */
+/* gen6+ has bit 11-4 for physical addr bit 39-32 */
+#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTE_VALID			(1 << 0)
+#define GEN6_PTE_UNCACHED		(1 << 1)
+#define GEN6_PTE_CACHE_LLC		(2 << 1)
+#define GEN6_PTE_CACHE_LLC_MLC		(3 << 1)
+#define GEN6_PTE_CACHE_BITS		(3 << 1)
+#define GEN6_PTE_GFDT			(1 << 3)
+#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTES_PER_PD		1024
+
 /* VGA stuff */
 
 #define VGA_ST01_MDA 0x3ba
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 19:54         ` Daniel Vetter
@ 2011-11-29 20:55           ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-29 20:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

This just adds the setup and teardown code for the ppgtt PDE and the
last-level pagetables, which are fixed for the entire lifetime, at
least for the moment.

v2: Kill the stray debug printk noted by and improve the pte
definitions as suggested by Chris Wilson.

v3: Clean up the aperture stealing code as noted by Ben Widawsky.

v4: Paint the init code in a more pleasing colour as suggest by Chris
Wilson.

v5: Explain the magic numbers noticed by Ben Widawsky.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c     |   41 ++++++++---
 drivers/gpu/drm/i915/i915_drv.h     |   18 +++++
 drivers/gpu/drm/i915/i915_gem_gtt.c |  139 +++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h     |   18 +++++
 4 files changed, 205 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index fd617fb..2cc0ba4 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1190,22 +1190,39 @@ static int i915_load_gem_init(struct drm_device *dev)
 	/* Basic memrange allocator for stolen space */
 	drm_mm_init(&dev_priv->mm.stolen, 0, prealloc_size);
 
-	/* Let GEM Manage all of the aperture.
-	 *
-	 * However, leave one page at the end still bound to the scratch page.
-	 * There are a number of places where the hardware apparently
-	 * prefetches past the end of the object, and we've seen multiple
-	 * hangs with the GPU head pointer stuck in a batchbuffer bound
-	 * at the last page of the aperture.  One page should be enough to
-	 * keep any prefetching inside of the aperture.
-	 */
-	i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	if (HAS_ALIASING_PPGTT(dev)) {
+		/* PPGTT pdes are stolen from global gtt ptes, so shrink the
+		 * aperture accordingly when using aliasing ppgtt. */
+		gtt_size -= I915_PPGTT_PD_ENTRIES*PAGE_SIZE;
+		/* For paranoia keep the guard page in between. */
+		gtt_size -= PAGE_SIZE;
+
+		i915_gem_do_init(dev, 0, mappable_size, gtt_size);
+
+		ret = i915_gem_init_aliasing_ppgtt(dev);
+		if (ret)
+			return ret;
+	} else {
+		/* Let GEM Manage all of the aperture.
+		 *
+		 * However, leave one page at the end still bound to the scratch
+		 * page.  There are a number of places where the hardware
+		 * apparently prefetches past the end of the object, and we've
+		 * seen multiple hangs with the GPU head pointer stuck in a
+		 * batchbuffer bound at the last page of the aperture.  One page
+		 * should be enough to keep any prefetching inside of the
+		 * aperture.
+		 */
+		i915_gem_do_init(dev, 0, mappable_size, gtt_size - PAGE_SIZE);
+	}
 
 	mutex_lock(&dev->struct_mutex);
 	ret = i915_gem_init_hw(dev);
 	mutex_unlock(&dev->struct_mutex);
-	if (ret)
+	if (ret) {
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		return ret;
+	}
 
 	/* Try to set up FBC with a reasonable compressed buffer size */
 	if (I915_HAS_FBC(dev) && i915_powersave) {
@@ -1292,6 +1309,7 @@ cleanup_gem:
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_ringbuffer(dev);
 	mutex_unlock(&dev->struct_mutex);
+	i915_gem_cleanup_aliasing_ppgtt(dev);
 cleanup_vga_switcheroo:
 	vga_switcheroo_unregister_client(dev->pdev);
 cleanup_vga_client:
@@ -2179,6 +2197,7 @@ int i915_driver_unload(struct drm_device *dev)
 		i915_gem_free_all_phys_object(dev);
 		i915_gem_cleanup_ringbuffer(dev);
 		mutex_unlock(&dev->struct_mutex);
+		i915_gem_cleanup_aliasing_ppgtt(dev);
 		if (I915_HAS_FBC(dev) && i915_powersave)
 			i915_cleanup_compression(dev);
 		drm_mm_takedown(&dev_priv->mm.stolen);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index bdbd6d8..8b6f1b7 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -254,6 +254,16 @@ struct intel_device_info {
 	u8 has_blt_ring:1;
 };
 
+#define I915_PPGTT_PD_ENTRIES 512
+#define I915_PPGTT_PT_ENTRIES 1024
+struct i915_hw_ppgtt {
+	unsigned num_pd_entries;
+	struct page **pt_pages;
+	uint32_t pd_offset;
+	dma_addr_t *pt_dma_addr;
+	dma_addr_t scratch_page_dma_addr;
+};
+
 enum no_fbc_reason {
 	FBC_NO_OUTPUT, /* no outputs enabled to compress */
 	FBC_STOLEN_TOO_SMALL, /* not enough space to hold compressed buffers */
@@ -584,6 +594,9 @@ typedef struct drm_i915_private {
 		struct io_mapping *gtt_mapping;
 		int gtt_mtrr;
 
+		/** PPGTT used for aliasing the PPGTT with the GTT */
+		struct i915_hw_ppgtt *aliasing_ppgtt;
+
 		struct shrinker inactive_shrinker;
 
 		/**
@@ -976,6 +989,8 @@ struct drm_i915_file_private {
 #define HAS_BLT(dev)            (INTEL_INFO(dev)->has_blt_ring)
 #define I915_NEED_GFX_HWS(dev)	(INTEL_INFO(dev)->need_gfx_hws)
 
+#define HAS_ALIASING_PPGTT(dev)	(INTEL_INFO(dev)->gen >=6)
+
 #define HAS_OVERLAY(dev)		(INTEL_INFO(dev)->has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev)	(INTEL_INFO(dev)->overlay_needs_physical)
 
@@ -1247,6 +1262,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 				    enum i915_cache_level cache_level);
 
 /* i915_gem_gtt.c */
+int __must_check i915_gem_init_aliasing_ppgtt(struct drm_device *dev);
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
+
 void i915_gem_restore_gtt_mappings(struct drm_device *dev);
 int __must_check i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj);
 void i915_gem_gtt_rebind_object(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 6042c5e..548b885 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -29,6 +29,145 @@
 #include "i915_trace.h"
 #include "intel_drv.h"
 
+/* PPGTT support for Sandybdrige/Gen6 and later */
+static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
+				   unsigned first_entry,
+				   unsigned num_entries)
+{
+	int i, j;
+	uint32_t *pt_vaddr;
+	uint32_t scratch_pte;
+
+	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
+	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
+
+		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
+			pt_vaddr[j] = scratch_pte;
+
+		kunmap_atomic(pt_vaddr);
+	}
+
+}
+
+int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt;
+	uint32_t pd_entry;
+	unsigned first_pd_entry_in_global_pt;
+	uint32_t __iomem *pd_addr;
+	int i;
+	int ret = -ENOMEM;
+
+	/* ppgtt PDEs reside in the global gtt pagetable, which has 512*1024
+	 * entries. For aliasing ppgtt support we just steal them at the end for
+	 * now. */
+	first_pd_entry_in_global_pt = 512*1024 - I915_PPGTT_PD_ENTRIES;
+
+	ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL);
+	if (!ppgtt)
+		return ret;
+
+	ppgtt->num_pd_entries = I915_PPGTT_PD_ENTRIES;
+	ppgtt->pt_pages = kzalloc(sizeof(struct page *)*ppgtt->num_pd_entries,
+				  GFP_KERNEL);
+	if (!ppgtt->pt_pages)
+		goto err_ppgtt;
+
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		ppgtt->pt_pages[i] = alloc_page(GFP_KERNEL);
+		if (!ppgtt->pt_pages[i])
+			goto err_pt_alloc;
+	}
+
+	if (dev_priv->mm.gtt->needs_dmar) {
+		ppgtt->pt_dma_addr = kzalloc(sizeof(dma_addr_t)
+						*ppgtt->num_pd_entries,
+					     GFP_KERNEL);
+		if (!ppgtt->pt_dma_addr)
+			goto err_pt_alloc;
+	}
+
+	pd_addr = dev_priv->mm.gtt->gtt + first_pd_entry_in_global_pt;
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		dma_addr_t pt_addr;
+		if (dev_priv->mm.gtt->needs_dmar) {
+			pt_addr = pci_map_page(dev->pdev, ppgtt->pt_pages[i],
+					       0, 4096,
+					       PCI_DMA_BIDIRECTIONAL);
+
+			if (pci_dma_mapping_error(dev->pdev,
+						  pt_addr)) {
+				ret = -EIO;
+				goto err_pd_pin;
+
+			}
+			ppgtt->pt_dma_addr[i] = pt_addr;
+		} else
+			pt_addr = page_to_phys(ppgtt->pt_pages[i]);
+
+		pd_entry = GEN6_PDE_ADDR_ENCODE(pt_addr);
+		pd_entry |= GEN6_PDE_VALID;
+
+		writel(pd_entry, pd_addr + i);
+	}
+	readl(pd_addr);
+
+	ppgtt->scratch_page_dma_addr = dev_priv->mm.gtt->scratch_page_dma;
+
+	i915_ppgtt_clear_range(ppgtt, 0,
+			       ppgtt->num_pd_entries*I915_PPGTT_PT_ENTRIES);
+
+	ppgtt->pd_offset = (first_pd_entry_in_global_pt)*sizeof(uint32_t);
+
+	dev_priv->mm.aliasing_ppgtt = ppgtt;
+
+	return 0;
+
+err_pd_pin:
+	if (ppgtt->pt_dma_addr) {
+		for (i--; i >= 0; i--)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+err_pt_alloc:
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++) {
+		if (ppgtt->pt_pages[i])
+			__free_page(ppgtt->pt_pages[i]);
+	}
+	kfree(ppgtt->pt_pages);
+err_ppgtt:
+	kfree(ppgtt);
+
+	return ret;
+}
+
+void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct i915_hw_ppgtt *ppgtt = dev_priv->mm.aliasing_ppgtt;
+	int i;
+
+	if (!ppgtt)
+		return;
+
+	if (ppgtt->pt_dma_addr) {
+		for (i = 0; i < ppgtt->num_pd_entries; i++)
+			pci_unmap_page(dev->pdev, ppgtt->pt_dma_addr[i],
+				       4096, PCI_DMA_BIDIRECTIONAL);
+	}
+
+	kfree(ppgtt->pt_dma_addr);
+	for (i = 0; i < ppgtt->num_pd_entries; i++)
+		__free_page(ppgtt->pt_pages[i]);
+	kfree(ppgtt->pt_pages);
+	kfree(ppgtt);
+}
+
 /* XXX kill agp_type! */
 static unsigned int cache_level_to_agp_type(struct drm_device *dev,
 					    enum i915_cache_level cache_level)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e810723..e9f5dc8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -92,6 +92,24 @@
 #define  GEN6_GRDOM_MEDIA		(1 << 2)
 #define  GEN6_GRDOM_BLT			(1 << 3)
 
+/* PPGTT stuff */
+#define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
+
+#define GEN6_PDE_VALID			(1 << 0)
+#define GEN6_PDE_LARGE_PAGE		(2 << 0) /* use 32kb pages */
+/* gen6+ has bit 11-4 for physical addr bit 39-32 */
+#define GEN6_PDE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTE_VALID			(1 << 0)
+#define GEN6_PTE_UNCACHED		(1 << 1)
+#define GEN6_PTE_CACHE_LLC		(2 << 1)
+#define GEN6_PTE_CACHE_LLC_MLC		(3 << 1)
+#define GEN6_PTE_CACHE_BITS		(3 << 1)
+#define GEN6_PTE_GFDT			(1 << 3)
+#define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
+
+#define GEN6_PTES_PER_PD		1024
+
 /* VGA stuff */
 
 #define VGA_ST01_MDA 0x3ba
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
  2011-11-29 18:50   ` Ben Widawsky
@ 2011-11-29 23:29   ` Ben Widawsky
  2011-11-30  8:07     ` Daniel Vetter
  2011-11-29 23:41   ` Ben Widawsky
  2011-11-29 23:49   ` Ben Widawsky
  3 siblings, 1 reply; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:29 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> This just adds the setup and teardown code for the ppgtt PDE and the
> last-level pagetables, which are fixed for the entire lifetime, at
> least for the moment.
> 
> v2: Kill the stray debug printk noted by and improve the pte
> definitions as suggested by Chris Wilson.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The last patch you sent out seems fine to me. Just one final unrelated
question.
> ---

> +/* PPGTT support for Sandybdrige/Gen6 and later */
> +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
> +				   unsigned first_entry,
> +				   unsigned num_entries)
> +{
> +	int i, j;
> +	uint32_t *pt_vaddr;
> +	uint32_t scratch_pte;
> +
> +	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
> +	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
> +
> +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> +
> +		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> +			pt_vaddr[j] = scratch_pte;
> +
> +		kunmap_atomic(pt_vaddr);
> +	}
> +
> +}

Why did you make the pages valid and point to scratch? It would seem
this is a really good opportunity to take advantage of page faults, and
allow us in the future to give useful info back to userspace.

Assuming you have a good answer, can you put it in the commit, or as a
comment.

Ben

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
  2011-11-29 18:50   ` Ben Widawsky
  2011-11-29 23:29   ` [PATCH 03/11] " Ben Widawsky
@ 2011-11-29 23:41   ` Ben Widawsky
  2011-11-30  8:09     ` Daniel Vetter
  2011-11-29 23:49   ` Ben Widawsky
  3 siblings, 1 reply; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:41 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> This just adds the setup and teardown code for the ppgtt PDE and the
> last-level pagetables, which are fixed for the entire lifetime, at
> least for the moment.
> 
> v2: Kill the stray debug printk noted by and improve the pte
> definitions as suggested by Chris Wilson.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Crap... one more point

> +/* PPGTT support for Sandybdrige/Gen6 and later */
> +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
> +				   unsigned first_entry,
> +				   unsigned num_entries)
> +{
> +	int i, j;
> +	uint32_t *pt_vaddr;
> +	uint32_t scratch_pte;
> +
> +	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
> +	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
> +
> +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> +
> +		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> +			pt_vaddr[j] = scratch_pte;
> +
> +		kunmap_atomic(pt_vaddr);
> +	}
> +
> +}

Don't you want to clflush here (unless I missed it somewhere else).
Ideally I think you'd also flush the TLB with a PIPE_CONTROL, but I
guess so long as we have that bit set that always flushes we're okay on
that one... Still feel we need a clflush though.

Ben

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 04/11] drm/i915: ppgtt binding/unbinding support
  2011-11-28 20:35 ` [PATCH 04/11] drm/i915: ppgtt binding/unbinding support Daniel Vetter
@ 2011-11-29 23:46   ` Ben Widawsky
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:31PM +0100, Daniel Vetter wrote:
> This adds support to bind/unbind objects and wires it up. Objects are
> only put into the ppgtt when necessary, i.e. at execbuf time.
> 
> Objects are still unconditionally put into the global gtt.
> 
> v2: Kill the quick hack and explicitly pass cache_level to ppgtt_bind
> like for the global gtt function. Noticed by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---

> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index c918124..9c81cda 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -513,6 +513,7 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
>  			    struct drm_file *file,
>  			    struct list_head *objects)
>  {
> +	drm_i915_private_t *dev_priv = ring->dev->dev_private;
>  	struct drm_i915_gem_object *obj;
>  	int ret, retry;
>  	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
> @@ -621,6 +622,14 @@ i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
>  			}
>  
>  			i915_gem_object_unpin(obj);
> +
> +			/* ... and ensure ppgtt mapping exist if needed. */
> +			if (dev_priv->mm.aliasing_ppgtt && !obj->has_aliasing_ppgtt_mapping) {
> +				i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
> +						       obj, obj->cache_level);
> +
> +				obj->has_aliasing_ppgtt_mapping = 1;
> +			}
>  		}
>  
>  		if (ret != -ENOSPC || retry > 1)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index bd9b520..061ae12 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -34,22 +34,31 @@ static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
>  				   unsigned first_entry,
>  				   unsigned num_entries)
>  {
> -	int i, j;
>  	uint32_t *pt_vaddr;
>  	uint32_t scratch_pte;
> +	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
> +	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
> +	unsigned last_pte, i;
>  
>  	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
>  	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
>  
> -	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> -		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> +	while (num_entries) {
> +		last_pte = first_pte + num_entries;
> +		if (last_pte > I915_PPGTT_PT_ENTRIES)
> +			last_pte = I915_PPGTT_PT_ENTRIES;
> +
> +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
>  
> -		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> -			pt_vaddr[j] = scratch_pte;
> +		for (i = first_pte; i < last_pte; i++)
> +			pt_vaddr[i] = scratch_pte;
>  
>  		kunmap_atomic(pt_vaddr);
> -	}
>  
> +		num_entries -= last_pte - first_pte;
> +		first_pte = 0;
> +		act_pd++;
> +	}
>  }
>  
>  int i915_gem_init_aliasing_ppgtt(struct drm_device *dev)
> @@ -162,6 +171,131 @@ void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
>  	kfree(ppgtt);
>  }
>  
> +static void i915_ppgtt_insert_sg_entries(struct i915_hw_ppgtt *ppgtt,
> +					 struct scatterlist *sg_list,
> +					 unsigned sg_len,
> +					 unsigned first_entry,
> +					 uint32_t pte_flags)
> +{
> +	uint32_t *pt_vaddr, pte;
> +	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
> +	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
> +	unsigned i, j, m, segment_len;
> +	dma_addr_t page_addr;
> +	struct scatterlist *sg;
> +
> +	/* init sg walking */
> +	sg = sg_list;
> +	i = 0;
> +	segment_len = sg_dma_len(sg) >> PAGE_SHIFT;
> +	m = 0;
> +
> +	while (i < sg_len) {
> +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
> +
> +		for (j = first_pte; j < I915_PPGTT_PT_ENTRIES; j++) {
> +			page_addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> +			pte = GEN6_PTE_ADDR_ENCODE(page_addr);
> +			pt_vaddr[j] = pte | pte_flags;
> +
> +			/* grab the next page */
> +			m++;
> +			if (m == segment_len) {
> +				sg = sg_next(sg);
> +				i++;
> +				if (i == sg_len)
> +					break;
> +
> +				segment_len = sg_dma_len(sg) >> PAGE_SHIFT;
> +				m = 0;
> +			}
> +		}
> +
> +		kunmap_atomic(pt_vaddr);
> +
> +		first_pte = 0;
> +		act_pd++;
> +	}
> +}
> +
> +static void i915_ppgtt_insert_pages(struct i915_hw_ppgtt *ppgtt,
> +				    unsigned first_entry, unsigned num_entries,
> +				    struct page **pages, uint32_t pte_flags)
> +{
> +	uint32_t *pt_vaddr, pte;
> +	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
> +	unsigned first_pte = first_entry % I915_PPGTT_PT_ENTRIES;
> +	unsigned last_pte, i;
> +	dma_addr_t page_addr;
> +
> +	while (num_entries) {
> +		last_pte = first_pte + num_entries;
> +		last_pte = min_t(unsigned, last_pte, I915_PPGTT_PT_ENTRIES);
> +
> +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[act_pd]);
> +
> +		for (i = first_pte; i < last_pte; i++) {
> +			page_addr = page_to_phys(*pages);
> +			pte = GEN6_PTE_ADDR_ENCODE(page_addr);
> +			pt_vaddr[i] = pte | pte_flags;
> +
> +			pages++;
> +		}
> +
> +		kunmap_atomic(pt_vaddr);
> +
> +		num_entries -= last_pte - first_pte;
> +		first_pte = 0;
> +		act_pd++;
> +	}
> +}

Same comment as 3/11 on these... don't we need a clflush?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
                     ` (2 preceding siblings ...)
  2011-11-29 23:41   ` Ben Widawsky
@ 2011-11-29 23:49   ` Ben Widawsky
  2011-11-30  9:55     ` Daniel Vetter
  3 siblings, 1 reply; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> This just adds the setup and teardown code for the ppgtt PDE and the
> last-level pagetables, which are fixed for the entire lifetime, at
> least for the moment.
> 
> v2: Kill the stray debug printk noted by and improve the pte
> definitions as suggested by Chris Wilson.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
{...}
>  
> +#define I915_PPGTT_PD_ENTRIES 512
> +#define I915_PPGTT_PT_ENTRIES 1024
Not to fond that we have this redefined as GEN6_PTES_PER_PD (which makes
more sense as far as naming goes IMO)

{...}

> +#define GEN6_PTES_PER_PD		1024

Ben

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 01/11] agp/intel-gtt: export the scratch page dma address
  2011-11-28 20:35 ` [PATCH 01/11] agp/intel-gtt: export the scratch page dma address Daniel Vetter
@ 2011-11-29 23:52   ` Ben Widawsky
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:52 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:28PM +0100, Daniel Vetter wrote:
> To implement a PPGTT for drm/i915 that fully aliases the GTT, we also
> need to properly alias the scratch page.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping
  2011-11-28 20:35 ` [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping Daniel Vetter
@ 2011-11-29 23:53   ` Ben Widawsky
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:53 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:29PM +0100, Daniel Vetter wrote:
> We need this because ppgtt page directory entries need to be in the
> global gtt pagetable.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I think it makes sense to export the size of the GTT as well, but I'm
good either way.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] drm/i915: ppgtt register definitions
  2011-11-28 20:35 ` [PATCH 05/11] drm/i915: ppgtt register definitions Daniel Vetter
@ 2011-11-29 23:57   ` Ben Widawsky
  2011-11-30  7:57     ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Widawsky @ 2011-11-29 23:57 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:32PM +0100, Daniel Vetter wrote:
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_reg.h |   18 ++++++++++++++++++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index e9f5dc8..7227446 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -86,6 +86,13 @@
>  #define   GEN6_MBC_SNPCR_LOW	(2<<21)
>  #define   GEN6_MBC_SNPCR_MIN	(3<<21) /* only 1/16th of the cache is shared */
>  
> +#define GEN6_MBCTL		0x0907c
> +#define   GEN6_MBCTL_ENABLE_BOOT_FETCH	(1 << 4)
> +#define   GEN6_MBCTL_CTX_FETCH_NEEDED	(1 << 3)
> +#define   GEN6_MBCTL_BME_UPDATE_ENABLE	(1 << 2)
> +#define   GEN6_MBCTL_MAE_UPDATE_ENABLE	(1 << 1)
> +#define   GEN6_MBCTL_BOOT_FETCH_MECH	(1 << 0)
> +

Maybe grep fail, but I don't see these registers used later in the
series.

>  #define GEN6_GDRST	0x941c
>  #define  GEN6_GRDOM_FULL		(1 << 0)
>  #define  GEN6_GRDOM_RENDER		(1 << 1)
> @@ -110,6 +117,16 @@
>  
>  #define GEN6_PTES_PER_PD		1024
>  
> +#define RING_PP_DIR_BASE(ring)		((ring)->mmio_base+0x228)
> +#define RING_PP_DIR_BASE_READ(ring)	((ring)->mmio_base+0x518)
> +#define RING_PP_DIR_DCLV(ring)		((ring)->mmio_base+0x220)
> +#define   PP_DIR_DCLV_2G		0xffffffff
> +
> +#define GAM_ECOCHK			0x4090
> +#define   ECOCHK_SNB_BIT		(1<<10)
> +#define   ECOCHK_PPGTT_CACHE64B		(0x3<<3)
> +#define   ECOCHK_PPGTT_CACHE4B		(0x0<<3)
> +
>  /* VGA stuff */
>  
>  #define VGA_ST01_MDA 0x3ba
> @@ -422,6 +439,7 @@
>  
>  #define GFX_MODE	0x02520
>  #define GFX_MODE_GEN7	0x0229c
> +#define RING_MODE_GEN7(ring)	((ring)->mmio_base+0x29c)
>  #define   GFX_RUN_LIST_ENABLE		(1<<15)
>  #define   GFX_TLB_INVALIDATE_ALWAYS	(1<<13)
>  #define   GFX_SURFACE_FAULT_ENABLE	(1<<12)

Ben

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 06/11] drm/i915: ppgtt debugfs info
  2011-11-28 20:35 ` [PATCH 06/11] drm/i915: ppgtt debugfs info Daniel Vetter
@ 2011-11-30  0:02   ` Ben Widawsky
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Widawsky @ 2011-11-30  0:02 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Mon, Nov 28, 2011 at 09:35:33PM +0100, Daniel Vetter wrote:
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I am fine with the patch, but I don't yet understand how useful this is.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 00/11] aliasing ppgtt support v2
  2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
                   ` (7 preceding siblings ...)
  2011-11-29  9:42 ` [PATCH 00/11] aliasing ppgtt support v2 Chris Wilson
@ 2011-11-30  0:11 ` Eugeni Dodonov
  8 siblings, 0 replies; 30+ messages in thread
From: Eugeni Dodonov @ 2011-11-30  0:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1171 bytes --]

On Mon, Nov 28, 2011 at 18:35, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> Hi all,
>
> Changes since the last submission:
> - fixed issues pointed out by Chris Wilson on irc.
> - fixed an oops on pre-snb, shame on me for that one.
> - added two new patches to only bind objects to the global gtt when
> required.
> - added a new patch so that userspace can find out whether ppgtt is on.
> This is
>  required to use MI_STORE/LOAD commands correctly from userspace
> batchbuffers.
>  Luckily no currently released userspace code depends on this, it's just
> prep
>  work.
>
> Comments, test reports, reviews and flames highly welcome.
>

For the series (with the latest patches on your fd.o repo)

Tested-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

I left it running on 2 SNBs and 1 IVB for couple of hours under different
set of workloads, and haven't seen any issues so far.

Also, after the latest round of fixes with Chris and Ben's comments, I
guess I can also give a
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>

for the series, with one small bike shed comment for Patch 7 which I'll
reply there.

-- 
Eugeni Dodonov
 <http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 1723 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/11] drm/i915: per-ring fault reg
  2011-11-28 20:35 ` [PATCH 07/11] drm/i915: per-ring fault reg Daniel Vetter
@ 2011-11-30  0:13   ` Eugeni Dodonov
  2011-11-30  9:41     ` Daniel Vetter
  0 siblings, 1 reply; 30+ messages in thread
From: Eugeni Dodonov @ 2011-11-30  0:13 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 280 bytes --]

On Mon, Nov 28, 2011 at 18:35, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> +#define ERROR                  0x40a0
>

You do not seem to use this, and it is the same value as ERROR_GEN6. Do we
need it? Or I misread something?

-- 
Eugeni Dodonov
 <http://eugeni.dodonov.net/>

[-- Attachment #1.2: Type: text/html, Size: 568 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 05/11] drm/i915: ppgtt register definitions
  2011-11-29 23:57   ` Ben Widawsky
@ 2011-11-30  7:57     ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-30  7:57 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 03:57:41PM -0800, Ben Widawsky wrote:
> On Mon, Nov 28, 2011 at 09:35:32PM +0100, Daniel Vetter wrote:
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_reg.h |   18 ++++++++++++++++++
> >  1 files changed, 18 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > index e9f5dc8..7227446 100644
> > --- a/drivers/gpu/drm/i915/i915_reg.h
> > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > @@ -86,6 +86,13 @@
> >  #define   GEN6_MBC_SNPCR_LOW	(2<<21)
> >  #define   GEN6_MBC_SNPCR_MIN	(3<<21) /* only 1/16th of the cache is shared */
> >  
> > +#define GEN6_MBCTL		0x0907c
> > +#define   GEN6_MBCTL_ENABLE_BOOT_FETCH	(1 << 4)
> > +#define   GEN6_MBCTL_CTX_FETCH_NEEDED	(1 << 3)
> > +#define   GEN6_MBCTL_BME_UPDATE_ENABLE	(1 << 2)
> > +#define   GEN6_MBCTL_MAE_UPDATE_ENABLE	(1 << 1)
> > +#define   GEN6_MBCTL_BOOT_FETCH_MECH	(1 << 0)
> > +
> 
> Maybe grep fail, but I don't see these registers used later in the
> series.

They're not used. But both public snb docs and Bspec contain notices that
you need to frob the boot mode enable bit in here, which afaics doesn't
exist under that exact name. Also, frobbing any of these seems to just
reduce the expected lifetime of my gpus.

But in case anyone wants to try things out, it's imo good to include the
definitions (especially since public Docs don't explain anything about
this reg than that warning about ppgtt).
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 23:29   ` [PATCH 03/11] " Ben Widawsky
@ 2011-11-30  8:07     ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-30  8:07 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 03:29:39PM -0800, Ben Widawsky wrote:
> On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> > This just adds the setup and teardown code for the ppgtt PDE and the
> > last-level pagetables, which are fixed for the entire lifetime, at
> > least for the moment.
> > 
> > v2: Kill the stray debug printk noted by and improve the pte
> > definitions as suggested by Chris Wilson.
> > 
> > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> The last patch you sent out seems fine to me. Just one final unrelated
> question.
> > ---
> 
> > +/* PPGTT support for Sandybdrige/Gen6 and later */
> > +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
> > +				   unsigned first_entry,
> > +				   unsigned num_entries)
> > +{
> > +	int i, j;
> > +	uint32_t *pt_vaddr;
> > +	uint32_t scratch_pte;
> > +
> > +	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
> > +	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
> > +
> > +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> > +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> > +
> > +		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> > +			pt_vaddr[j] = scratch_pte;
> > +
> > +		kunmap_atomic(pt_vaddr);
> > +	}
> > +
> > +}
> 
> Why did you make the pages valid and point to scratch? It would seem
> this is a really good opportunity to take advantage of page faults, and
> allow us in the future to give useful info back to userspace.

This patch set sneaks in ppgtt in transparent way an hence we may not
break anything. And it has already some decent potential for breakage, so
I don't want to tempt the dragons too much. I'm fine with trying this as a
later patch (maybe also for the global gtt, because that's now only used
by display and ringbuffers, and these don't prefetch afaik). Honestly
haven't bothered to test it at all.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 23:41   ` Ben Widawsky
@ 2011-11-30  8:09     ` Daniel Vetter
  2011-11-30 17:35       ` Ben Widawsky
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Vetter @ 2011-11-30  8:09 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 03:41:40PM -0800, Ben Widawsky wrote:
> On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> > This just adds the setup and teardown code for the ppgtt PDE and the
> > last-level pagetables, which are fixed for the entire lifetime, at
> > least for the moment.
> > 
> > v2: Kill the stray debug printk noted by and improve the pte
> > definitions as suggested by Chris Wilson.
> > 
> > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Crap... one more point
> 
> > +/* PPGTT support for Sandybdrige/Gen6 and later */
> > +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
> > +				   unsigned first_entry,
> > +				   unsigned num_entries)
> > +{
> > +	int i, j;
> > +	uint32_t *pt_vaddr;
> > +	uint32_t scratch_pte;
> > +
> > +	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
> > +	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
> > +
> > +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> > +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> > +
> > +		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> > +			pt_vaddr[j] = scratch_pte;
> > +
> > +		kunmap_atomic(pt_vaddr);
> > +	}
> > +
> > +}
> 
> Don't you want to clflush here (unless I missed it somewhere else).
> Ideally I think you'd also flush the TLB with a PIPE_CONTROL, but I
> guess so long as we have that bit set that always flushes we're okay on
> that one... Still feel we need a clflush though.

Afaics ppgtt pte fetches are coherent with the cpu irrespective of the pte
cacheability control setting. So like for llc cached buffers, no flushing
needed at all. We also have a fair share of tests that trash mappings like
crazy (and easily hit all the historical tlb issues we've had), so I'm
fairly sure it works.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 07/11] drm/i915: per-ring fault reg
  2011-11-30  0:13   ` Eugeni Dodonov
@ 2011-11-30  9:41     ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-30  9:41 UTC (permalink / raw)
  To: Eugeni Dodonov; +Cc: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 10:13:33PM -0200, Eugeni Dodonov wrote:
> On Mon, Nov 28, 2011 at 18:35, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > +#define ERROR                  0x40a0
> >
> 
> You do not seem to use this, and it is the same value as ERROR_GEN6. Do we
> need it? Or I misread something?

Yep, that's unused because I later on noticed ERROR_GEN6. Removed.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-29 23:49   ` Ben Widawsky
@ 2011-11-30  9:55     ` Daniel Vetter
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Vetter @ 2011-11-30  9:55 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx

On Tue, Nov 29, 2011 at 03:49:23PM -0800, Ben Widawsky wrote:
> On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> > This just adds the setup and teardown code for the ppgtt PDE and the
> > last-level pagetables, which are fixed for the entire lifetime, at
> > least for the moment.
> > 
> > v2: Kill the stray debug printk noted by and improve the pte
> > definitions as suggested by Chris Wilson.
> > 
> > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
> {...}
> >  
> > +#define I915_PPGTT_PD_ENTRIES 512
> > +#define I915_PPGTT_PT_ENTRIES 1024
> Not to fond that we have this redefined as GEN6_PTES_PER_PD (which makes
> more sense as far as naming goes IMO)

I've killed the redundant define.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt
  2011-11-30  8:09     ` Daniel Vetter
@ 2011-11-30 17:35       ` Ben Widawsky
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Widawsky @ 2011-11-30 17:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx

On Wed, Nov 30, 2011 at 09:09:20AM +0100, Daniel Vetter wrote:
> On Tue, Nov 29, 2011 at 03:41:40PM -0800, Ben Widawsky wrote:
> > On Mon, Nov 28, 2011 at 09:35:30PM +0100, Daniel Vetter wrote:
> > > This just adds the setup and teardown code for the ppgtt PDE and the
> > > last-level pagetables, which are fixed for the entire lifetime, at
> > > least for the moment.
> > > 
> > > v2: Kill the stray debug printk noted by and improve the pte
> > > definitions as suggested by Chris Wilson.
> > > 
> > > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Crap... one more point
> > 
> > > +/* PPGTT support for Sandybdrige/Gen6 and later */
> > > +static void i915_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
> > > +				   unsigned first_entry,
> > > +				   unsigned num_entries)
> > > +{
> > > +	int i, j;
> > > +	uint32_t *pt_vaddr;
> > > +	uint32_t scratch_pte;
> > > +
> > > +	scratch_pte = GEN6_PTE_ADDR_ENCODE(ppgtt->scratch_page_dma_addr);
> > > +	scratch_pte |= GEN6_PTE_VALID | GEN6_PTE_CACHE_LLC;
> > > +
> > > +	for (i = 0; i < ppgtt->num_pd_entries; i++) {
> > > +		pt_vaddr = kmap_atomic(ppgtt->pt_pages[i]);
> > > +
> > > +		for (j = 0; j < I915_PPGTT_PT_ENTRIES; j++)
> > > +			pt_vaddr[j] = scratch_pte;
> > > +
> > > +		kunmap_atomic(pt_vaddr);
> > > +	}
> > > +
> > > +}
> > 
> > Don't you want to clflush here (unless I missed it somewhere else).
> > Ideally I think you'd also flush the TLB with a PIPE_CONTROL, but I
> > guess so long as we have that bit set that always flushes we're okay on
> > that one... Still feel we need a clflush though.
> 
> Afaics ppgtt pte fetches are coherent with the cpu irrespective of the pte
> cacheability control setting. So like for llc cached buffers, no flushing
> needed at all. We also have a fair share of tests that trash mappings like
> crazy (and easily hit all the historical tlb issues we've had), so I'm
> fairly sure it works.

Well that's convenient. In that case, given how often we invalidate
TLBs, it'd be really nice if we could be smart about managing these
cache lines. Perhaps as part of execbuf we could warm up the the ptes?

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-11-30 17:35 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-28 20:35 [PATCH 00/11] aliasing ppgtt support v2 Daniel Vetter
2011-11-28 20:35 ` [PATCH 01/11] agp/intel-gtt: export the scratch page dma address Daniel Vetter
2011-11-29 23:52   ` Ben Widawsky
2011-11-28 20:35 ` [PATCH 02/11] agp/intel-gtt: export the gtt pagetable iomapping Daniel Vetter
2011-11-29 23:53   ` Ben Widawsky
2011-11-28 20:35 ` [PATCH 03/11] drm/i915: initialization/teardown for the aliasing ppgtt Daniel Vetter
2011-11-29 18:50   ` Ben Widawsky
2011-11-29 19:06     ` Daniel Vetter
2011-11-29 19:09       ` [PATCH] " Daniel Vetter
2011-11-29 19:54         ` Daniel Vetter
2011-11-29 20:55           ` Daniel Vetter
2011-11-29 23:29   ` [PATCH 03/11] " Ben Widawsky
2011-11-30  8:07     ` Daniel Vetter
2011-11-29 23:41   ` Ben Widawsky
2011-11-30  8:09     ` Daniel Vetter
2011-11-30 17:35       ` Ben Widawsky
2011-11-29 23:49   ` Ben Widawsky
2011-11-30  9:55     ` Daniel Vetter
2011-11-28 20:35 ` [PATCH 04/11] drm/i915: ppgtt binding/unbinding support Daniel Vetter
2011-11-29 23:46   ` Ben Widawsky
2011-11-28 20:35 ` [PATCH 05/11] drm/i915: ppgtt register definitions Daniel Vetter
2011-11-29 23:57   ` Ben Widawsky
2011-11-30  7:57     ` Daniel Vetter
2011-11-28 20:35 ` [PATCH 06/11] drm/i915: ppgtt debugfs info Daniel Vetter
2011-11-30  0:02   ` Ben Widawsky
2011-11-28 20:35 ` [PATCH 07/11] drm/i915: per-ring fault reg Daniel Vetter
2011-11-30  0:13   ` Eugeni Dodonov
2011-11-30  9:41     ` Daniel Vetter
2011-11-29  9:42 ` [PATCH 00/11] aliasing ppgtt support v2 Chris Wilson
2011-11-30  0:11 ` Eugeni Dodonov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.