* [RFC] Set cache level ioctl
@ 2012-07-09 11:34 Chris Wilson
  2012-07-09 11:34 ` [PATCH 1/3] drm: Add colouring to the range allocator Chris Wilson
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Chris Wilson @ 2012-07-09 11:34 UTC (permalink / raw)
  To: intel-gfx

One of the niceties of SandyBridge was that it introduced a shared LLC
cache between the CPU and the GPU. This allows us to assume cache
coherency of a bo and so mix CPU and GPU access to the same buffer
(with a few caveats of course). Older architectures used snooping to
achieve very nearly the same effect (the only observable difference for
the driver is the greatly increased bandwidth between the CPU, GPU and
memory that SandyBridge also introduced). However, whilst we were able
to enable LLC sharing by default on SNB, as it has no obvious penalty
and can be handled implicitly by the drivers, snoopable memory must be
managed explicitly (only certain types of buffers are allowed to be
snoopable, and overuse leads to degraded performance). So we expose an
ioctl to grant userspace the ability to do so.
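
As a rough sketch of the proposed interface (the ioctl and struct names
are taken from patch 3 of this series and so are subject to change), the
userspace side might look something like:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Ask the kernel to make a bo snoopable/cached; the handle is an
 * ordinary GEM handle from the usual create/open ioctls.
 */
static int bo_set_snoopable(int fd, uint32_t handle)
{
        struct drm_i915_gem_cache_level arg = {
                .handle = handle,
                .cache_level = I915_CACHE_LEVEL_LLC,
        };

        return ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHE_LEVEL, &arg);
}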

Note that this series includes a significant patch to the drm_mm to
enable segregation of different memory domains within the GTT, and that
Daniel wants a battery of coherency checks for i-g-t, hence it is not
quite ready just yet. As it stands, I have used it to good effect
within the DDX. Bring on vmap!
-Chris

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] drm: Add colouring to the range allocator
  2012-07-09 11:34 [RFC] Set cache level ioctl Chris Wilson
@ 2012-07-09 11:34 ` Chris Wilson
  2012-07-10  9:21   ` Daniel Vetter
  2012-07-09 11:34 ` [PATCH 2/3] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
  2012-07-09 11:34 ` [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace Chris Wilson
  2 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-09 11:34 UTC (permalink / raw)
  To: intel-gfx
  Cc: Benjamin Herrenschmidt, Jerome Glisse, Ben Skeggs, Daniel Vetter,
	Alex Deucher

In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
prevent the prefetcher on the GPU from crossing memory domains, and so
must avoid allocating a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.

This will be used by i915 to segregate memory domains within the GTT.
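
As a rough illustration of the new interface (example_color_adjust is a
made-up name here; the real i915 hook appears in the next patch), a
driver registers a color_adjust callback on its drm_mm and then threads
a colour through the search/get calls:

static void example_color_adjust(struct drm_mm_node *node,
                                 unsigned long color,
                                 unsigned long *start,
                                 unsigned long *end)
{
        struct drm_mm_node *next;

        /* Shrink the hole by a page whenever the neighbouring node has
         * a different colour, so that nodes of differing colours are
         * never allocated back to back.
         */
        if (node->color != color)
                *start += PAGE_SIZE;

        if (list_empty(&node->node_list))
                return;

        next = list_entry(node->node_list.next,
                          struct drm_mm_node, node_list);
        if (next->allocated && next->color != color)
                *end -= PAGE_SIZE;
}

	...
	mm->color_adjust = example_color_adjust;
	...
	hole = drm_mm_search_free(mm, size, alignment, color, false);
	if (hole)
		node = drm_mm_get_block_generic(hole, size, alignment,
						color, 0);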

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Conflicts:

	drivers/gpu/drm/i915/i915_gem_stolen.c
---
 drivers/gpu/drm/drm_gem.c                  |    2 +-
 drivers/gpu/drm/drm_mm.c                   |  151 +++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_gem.c            |    6 +-
 drivers/gpu/drm/i915/i915_gem_evict.c      |    9 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c     |    5 +-
 drivers/gpu/drm/nouveau/nouveau_notifier.c |    4 +-
 drivers/gpu/drm/nouveau/nouveau_object.c   |    2 +-
 drivers/gpu/drm/nouveau/nv04_instmem.c     |    2 +-
 drivers/gpu/drm/nouveau/nv20_fb.c          |    2 +-
 drivers/gpu/drm/nouveau/nv50_vram.c        |    2 +-
 drivers/gpu/drm/ttm/ttm_bo.c               |    2 +-
 drivers/gpu/drm/ttm/ttm_bo_manager.c       |    4 +-
 include/drm/drm_mm.h                       |   38 +++++--
 13 files changed, 143 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d58e69d..961ccd8 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -354,7 +354,7 @@ drm_gem_create_mmap_offset(struct drm_gem_object *obj)
 
 	/* Get a DRM GEM mmap offset allocated... */
 	list->file_offset_node = drm_mm_search_free(&mm->offset_manager,
-			obj->size / PAGE_SIZE, 0, 0);
+			obj->size / PAGE_SIZE, 0, 0, false);
 
 	if (!list->file_offset_node) {
 		DRM_ERROR("failed to allocate offset for bo %d\n", obj->name);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 961fb54..0311dba 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -118,45 +118,53 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node)
 
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 				 struct drm_mm_node *node,
-				 unsigned long size, unsigned alignment)
+				 unsigned long size, unsigned alignment,
+				 unsigned long color)
 {
 	struct drm_mm *mm = hole_node->mm;
-	unsigned long tmp = 0, wasted = 0;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
 	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
+	unsigned long adj_start = hole_start;
+	unsigned long adj_end = hole_end;
 
 	BUG_ON(!hole_node->hole_follows || node->allocated);
 
-	if (alignment)
-		tmp = hole_start % alignment;
+	if (mm->color_adjust)
+		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
 
-	if (!tmp) {
+	if (alignment) {
+		unsigned tmp = adj_start % alignment;
+		if (tmp)
+			adj_start += alignment - tmp;
+	}
+
+	if (adj_start == hole_start) {
 		hole_node->hole_follows = 0;
-		list_del_init(&hole_node->hole_stack);
-	} else
-		wasted = alignment - tmp;
+		list_del(&hole_node->hole_stack);
+	}
 
-	node->start = hole_start + wasted;
+	node->start = adj_start;
 	node->size = size;
 	node->mm = mm;
+	node->color = color;
 	node->allocated = 1;
 
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
-	BUG_ON(node->start + node->size > hole_end);
+	BUG_ON(node->start + node->size > adj_end);
 
+	node->hole_follows = 0;
 	if (node->start + node->size < hole_end) {
 		list_add(&node->hole_stack, &mm->hole_stack);
 		node->hole_follows = 1;
-	} else {
-		node->hole_follows = 0;
 	}
 }
 
 struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 					     unsigned long size,
 					     unsigned alignment,
+					     unsigned long color,
 					     int atomic)
 {
 	struct drm_mm_node *node;
@@ -165,7 +173,7 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node, node, size, alignment, color);
 
 	return node;
 }
@@ -181,11 +189,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
 {
 	struct drm_mm_node *hole_node;
 
-	hole_node = drm_mm_search_free(mm, size, alignment, 0);
+	hole_node = drm_mm_search_free(mm, size, alignment, 0, false);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node, node, size, alignment, 0);
 
 	return 0;
 }
@@ -194,50 +202,57 @@ EXPORT_SYMBOL(drm_mm_insert_node);
 static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 				       struct drm_mm_node *node,
 				       unsigned long size, unsigned alignment,
+				       unsigned long color,
 				       unsigned long start, unsigned long end)
 {
 	struct drm_mm *mm = hole_node->mm;
-	unsigned long tmp = 0, wasted = 0;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
 	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
+	unsigned long adj_start = hole_start;
+	unsigned long adj_end = hole_end;
 
 	BUG_ON(!hole_node->hole_follows || node->allocated);
 
-	if (hole_start < start)
-		wasted += start - hole_start;
-	if (alignment)
-		tmp = (hole_start + wasted) % alignment;
+	if (mm->color_adjust)
+		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
 
-	if (tmp)
-		wasted += alignment - tmp;
+	if (adj_start < start)
+		adj_start = start;
 
-	if (!wasted) {
+	if (alignment) {
+		unsigned tmp = adj_start % alignment;
+		if (tmp)
+			adj_start += alignment - tmp;
+	}
+
+	if (adj_start == hole_start) {
 		hole_node->hole_follows = 0;
-		list_del_init(&hole_node->hole_stack);
+		list_del(&hole_node->hole_stack);
 	}
 
-	node->start = hole_start + wasted;
+	node->start = adj_start;
 	node->size = size;
 	node->mm = mm;
+	node->color = color;
 	node->allocated = 1;
 
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
-	BUG_ON(node->start + node->size > hole_end);
+	BUG_ON(node->start + node->size > adj_end);
 	BUG_ON(node->start + node->size > end);
 
+	node->hole_follows = 0;
 	if (node->start + node->size < hole_end) {
 		list_add(&node->hole_stack, &mm->hole_stack);
 		node->hole_follows = 1;
-	} else {
-		node->hole_follows = 0;
 	}
 }
 
 struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
 						int atomic)
@@ -248,7 +263,7 @@ struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper_range(hole_node, node, size, alignment,
+	drm_mm_insert_helper_range(hole_node, node, size, alignment, color,
 				   start, end);
 
 	return node;
@@ -266,12 +281,12 @@ int drm_mm_insert_node_in_range(struct drm_mm *mm, struct drm_mm_node *node,
 {
 	struct drm_mm_node *hole_node;
 
-	hole_node = drm_mm_search_free_in_range(mm, size, alignment,
-						start, end, 0);
+	hole_node = drm_mm_search_free_in_range(mm, size, alignment, 0,
+						start, end, false);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper_range(hole_node, node, size, alignment,
+	drm_mm_insert_helper_range(hole_node, node, size, alignment, 0,
 				   start, end);
 
 	return 0;
@@ -336,27 +351,23 @@ EXPORT_SYMBOL(drm_mm_put_block);
 static int check_free_hole(unsigned long start, unsigned long end,
 			   unsigned long size, unsigned alignment)
 {
-	unsigned wasted = 0;
-
 	if (end - start < size)
 		return 0;
 
 	if (alignment) {
 		unsigned tmp = start % alignment;
 		if (tmp)
-			wasted = alignment - tmp;
-	}
-
-	if (end >= start + size + wasted) {
-		return 1;
+			start += alignment - tmp;
 	}
 
-	return 0;
+	return end >= start + size;
 }
 
 struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 				       unsigned long size,
-				       unsigned alignment, int best_match)
+				       unsigned alignment,
+				       unsigned long color,
+				       bool best_match)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -368,10 +379,17 @@ struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 	best_size = ~0UL;
 
 	list_for_each_entry(entry, &mm->hole_stack, hole_stack) {
+		unsigned long adj_start = drm_mm_hole_node_start(entry);
+		unsigned long adj_end = drm_mm_hole_node_end(entry);
+
+		if (mm->color_adjust) {
+			mm->color_adjust(entry, color, &adj_start, &adj_end);
+			if (adj_end <= adj_start)
+				continue;
+		}
+
 		BUG_ON(!entry->hole_follows);
-		if (!check_free_hole(drm_mm_hole_node_start(entry),
-				     drm_mm_hole_node_end(entry),
-				     size, alignment))
+		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
 		if (!best_match)
@@ -390,9 +408,10 @@ EXPORT_SYMBOL(drm_mm_search_free);
 struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						int best_match)
+						bool best_match)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -410,6 +429,13 @@ struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
 			end : drm_mm_hole_node_end(entry);
 
 		BUG_ON(!entry->hole_follows);
+
+		if (mm->color_adjust) {
+			mm->color_adjust(entry, color, &adj_start, &adj_end);
+			if (adj_end <= adj_start)
+				continue;
+		}
+
 		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
@@ -437,6 +463,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
 	new->mm = old->mm;
 	new->start = old->start;
 	new->size = old->size;
+	new->color = old->color;
 
 	old->allocated = 0;
 	new->allocated = 1;
@@ -452,9 +479,12 @@ EXPORT_SYMBOL(drm_mm_replace_node);
  * Warning: As long as the scan list is non-empty, no other operations than
  * adding/removing nodes to/from the scan list are allowed.
  */
-void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
-		      unsigned alignment)
+void drm_mm_init_scan(struct drm_mm *mm,
+		      unsigned long size,
+		      unsigned alignment,
+		      unsigned long color)
 {
+	mm->scan_color = color;
 	mm->scan_alignment = alignment;
 	mm->scan_size = size;
 	mm->scanned_blocks = 0;
@@ -474,11 +504,14 @@ EXPORT_SYMBOL(drm_mm_init_scan);
  * Warning: As long as the scan list is non-empty, no other operations than
  * adding/removing nodes to/from the scan list are allowed.
  */
-void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
+void drm_mm_init_scan_with_range(struct drm_mm *mm,
+				 unsigned long size,
 				 unsigned alignment,
+				 unsigned long color,
 				 unsigned long start,
 				 unsigned long end)
 {
+	mm->scan_color = color;
 	mm->scan_alignment = alignment;
 	mm->scan_size = size;
 	mm->scanned_blocks = 0;
@@ -522,17 +555,21 @@ int drm_mm_scan_add_block(struct drm_mm_node *node)
 
 	hole_start = drm_mm_hole_node_start(prev_node);
 	hole_end = drm_mm_hole_node_end(prev_node);
+
+	adj_start = hole_start;
+	adj_end = hole_end;
+
+	if (mm->color_adjust)
+		mm->color_adjust(prev_node, mm->scan_color, &adj_start, &adj_end);
+
 	if (mm->scan_check_range) {
-		adj_start = hole_start < mm->scan_start ?
-			mm->scan_start : hole_start;
-		adj_end = hole_end > mm->scan_end ?
-			mm->scan_end : hole_end;
-	} else {
-		adj_start = hole_start;
-		adj_end = hole_end;
+		if (adj_start < mm->scan_start)
+			adj_start = mm->scan_start;
+		if (adj_end > mm->scan_end)
+			adj_end = mm->scan_end;
 	}
 
-	if (check_free_hole(adj_start , adj_end,
+	if (check_free_hole(adj_start, adj_end,
 			    mm->scan_size, mm->scan_alignment)) {
 		mm->scan_hit_start = hole_start;
 		mm->scan_hit_size = hole_end;
@@ -616,6 +653,8 @@ int drm_mm_init(struct drm_mm * mm, unsigned long start, unsigned long size)
 	mm->head_node.size = start - mm->head_node.start;
 	list_add_tail(&mm->head_node.hole_stack, &mm->hole_stack);
 
+	mm->color_adjust = NULL;
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_mm_init);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index db438f0..cad56dd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2756,18 +2756,18 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 		free_space =
 			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
 						    size, alignment, 0,
-						    dev_priv->mm.gtt_mappable_end,
+						    0, dev_priv->mm.gtt_mappable_end,
 						    0);
 	else
 		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
-						size, alignment, 0);
+						size, alignment, 0, 0);
 
 	if (free_space != NULL) {
 		if (map_and_fenceable)
 			obj->gtt_space =
 				drm_mm_get_block_range_generic(free_space,
 							       size, alignment, 0,
-							       dev_priv->mm.gtt_mappable_end,
+							       0, dev_priv->mm.gtt_mappable_end,
 							       0);
 		else
 			obj->gtt_space =
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index ae7c24e..eba0308 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -78,11 +78,12 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
 
 	INIT_LIST_HEAD(&unwind_list);
 	if (mappable)
-		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space, min_size,
-					    alignment, 0,
-					    dev_priv->mm.gtt_mappable_end);
+		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
+					    min_size, alignment, 0,
+					    0, dev_priv->mm.gtt_mappable_end);
 	else
-		drm_mm_init_scan(&dev_priv->mm.gtt_space, min_size, alignment);
+		drm_mm_init_scan(&dev_priv->mm.gtt_space,
+				 min_size, alignment, 0);
 
 	/* First see if there is a large enough contiguous idle region... */
 	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index ada2e90..dba13cf 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -111,7 +111,8 @@ static void i915_setup_compression(struct drm_device *dev, int size)
 	/* Just in case the BIOS is doing something questionable. */
 	intel_disable_fbc(dev);
 
-	compressed_fb = drm_mm_search_free(&dev_priv->mm.stolen, size, 4096, 0);
+	compressed_fb = drm_mm_search_free(&dev_priv->mm.stolen,
+					   size, 4096, 0, 0);
 	if (compressed_fb)
 		compressed_fb = drm_mm_get_block(compressed_fb, size, 4096);
 	if (!compressed_fb)
@@ -123,7 +124,7 @@ static void i915_setup_compression(struct drm_device *dev, int size)
 
 	if (!(IS_GM45(dev) || HAS_PCH_SPLIT(dev))) {
 		compressed_llb = drm_mm_search_free(&dev_priv->mm.stolen,
-						    4096, 4096, 0);
+						    4096, 4096, 0, 0);
 		if (compressed_llb)
 			compressed_llb = drm_mm_get_block(compressed_llb,
 							  4096, 4096);
diff --git a/drivers/gpu/drm/nouveau/nouveau_notifier.c b/drivers/gpu/drm/nouveau/nouveau_notifier.c
index 2ef883c..65c64b1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_notifier.c
+++ b/drivers/gpu/drm/nouveau/nouveau_notifier.c
@@ -118,10 +118,10 @@ nouveau_notifier_alloc(struct nouveau_channel *chan, uint32_t handle,
 	uint64_t offset;
 	int target, ret;
 
-	mem = drm_mm_search_free_in_range(&chan->notifier_heap, size, 0,
+	mem = drm_mm_search_free_in_range(&chan->notifier_heap, size, 0, 0,
 					  start, end, 0);
 	if (mem)
-		mem = drm_mm_get_block_range(mem, size, 0, start, end);
+		mem = drm_mm_get_block_range(mem, size, 0, 0, start, end);
 	if (!mem) {
 		NV_ERROR(dev, "Channel %d notifier block full\n", chan->id);
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/nouveau/nouveau_object.c b/drivers/gpu/drm/nouveau/nouveau_object.c
index b190cc0..15d5d97 100644
--- a/drivers/gpu/drm/nouveau/nouveau_object.c
+++ b/drivers/gpu/drm/nouveau/nouveau_object.c
@@ -163,7 +163,7 @@ nouveau_gpuobj_new(struct drm_device *dev, struct nouveau_channel *chan,
 	spin_unlock(&dev_priv->ramin_lock);
 
 	if (!(flags & NVOBJ_FLAG_VM) && chan) {
-		ramin = drm_mm_search_free(&chan->ramin_heap, size, align, 0);
+		ramin = drm_mm_search_free(&chan->ramin_heap, size, align, 0, 0);
 		if (ramin)
 			ramin = drm_mm_get_block(ramin, size, align);
 		if (!ramin) {
diff --git a/drivers/gpu/drm/nouveau/nv04_instmem.c b/drivers/gpu/drm/nouveau/nv04_instmem.c
index ef7a934..ce57bcd 100644
--- a/drivers/gpu/drm/nouveau/nv04_instmem.c
+++ b/drivers/gpu/drm/nouveau/nv04_instmem.c
@@ -149,7 +149,7 @@ nv04_instmem_get(struct nouveau_gpuobj *gpuobj, struct nouveau_channel *chan,
 			return -ENOMEM;
 
 		spin_lock(&dev_priv->ramin_lock);
-		ramin = drm_mm_search_free(&dev_priv->ramin_heap, size, align, 0);
+		ramin = drm_mm_search_free(&dev_priv->ramin_heap, size, align, 0, 0);
 		if (ramin == NULL) {
 			spin_unlock(&dev_priv->ramin_lock);
 			return -ENOMEM;
diff --git a/drivers/gpu/drm/nouveau/nv20_fb.c b/drivers/gpu/drm/nouveau/nv20_fb.c
index 19bd640..754f47f 100644
--- a/drivers/gpu/drm/nouveau/nv20_fb.c
+++ b/drivers/gpu/drm/nouveau/nv20_fb.c
@@ -16,7 +16,7 @@ nv20_fb_alloc_tag(struct drm_device *dev, uint32_t size)
 		return NULL;
 
 	spin_lock(&dev_priv->tile.lock);
-	mem = drm_mm_search_free(&pfb->tag_heap, size, 0, 0);
+	mem = drm_mm_search_free(&pfb->tag_heap, size, 0, 0, 0);
 	if (mem)
 		mem = drm_mm_get_block_atomic(mem, size, 0);
 	spin_unlock(&dev_priv->tile.lock);
diff --git a/drivers/gpu/drm/nouveau/nv50_vram.c b/drivers/gpu/drm/nouveau/nv50_vram.c
index 9ed9ae39..6c8ea3f 100644
--- a/drivers/gpu/drm/nouveau/nv50_vram.c
+++ b/drivers/gpu/drm/nouveau/nv50_vram.c
@@ -105,7 +105,7 @@ nv50_vram_new(struct drm_device *dev, u64 size, u32 align, u32 size_nc,
 			struct nouveau_fb_engine *pfb = &dev_priv->engine.fb;
 			int n = (size >> 4) * comp;
 
-			mem->tag = drm_mm_search_free(&pfb->tag_heap, n, 0, 0);
+			mem->tag = drm_mm_search_free(&pfb->tag_heap, n, 0, 0, 0);
 			if (mem->tag)
 				mem->tag = drm_mm_get_block(mem->tag, n, 0);
 		}
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 36f4b28..76ee39f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1686,7 +1686,7 @@ retry_pre_get:
 
 	write_lock(&bdev->vm_lock);
 	bo->vm_node = drm_mm_search_free(&bdev->addr_space_mm,
-					 bo->mem.num_pages, 0, 0);
+					 bo->mem.num_pages, 0, 0, 0);
 
 	if (unlikely(bo->vm_node == NULL)) {
 		ret = -ENOMEM;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
index 038e947..b426b29 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
@@ -68,14 +68,14 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
 
 		spin_lock(&rman->lock);
 		node = drm_mm_search_free_in_range(mm,
-					mem->num_pages, mem->page_alignment,
+					mem->num_pages, mem->page_alignment, 0,
 					placement->fpfn, lpfn, 1);
 		if (unlikely(node == NULL)) {
 			spin_unlock(&rman->lock);
 			return 0;
 		}
 		node = drm_mm_get_block_atomic_range(node, mem->num_pages,
-						     mem->page_alignment,
+						     mem->page_alignment, 0,
 						     placement->fpfn,
 						     lpfn);
 		spin_unlock(&rman->lock);
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index 564b14a..04a9554 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -50,6 +50,7 @@ struct drm_mm_node {
 	unsigned scanned_next_free : 1;
 	unsigned scanned_preceeds_hole : 1;
 	unsigned allocated : 1;
+	unsigned long color;
 	unsigned long start;
 	unsigned long size;
 	struct drm_mm *mm;
@@ -66,6 +67,7 @@ struct drm_mm {
 	spinlock_t unused_lock;
 	unsigned int scan_check_range : 1;
 	unsigned scan_alignment;
+	unsigned long scan_color;
 	unsigned long scan_size;
 	unsigned long scan_hit_start;
 	unsigned scan_hit_size;
@@ -73,6 +75,9 @@ struct drm_mm {
 	unsigned long scan_start;
 	unsigned long scan_end;
 	struct drm_mm_node *prev_scanned_node;
+
+	void (*color_adjust)(struct drm_mm_node *node, unsigned long color,
+			     unsigned long *start, unsigned long *end);
 };
 
 static inline bool drm_mm_node_allocated(struct drm_mm_node *node)
@@ -100,11 +105,13 @@ static inline bool drm_mm_initialized(struct drm_mm *mm)
 extern struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *node,
 						    unsigned long size,
 						    unsigned alignment,
+						    unsigned long color,
 						    int atomic);
 extern struct drm_mm_node *drm_mm_get_block_range_generic(
 						struct drm_mm_node *node,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
 						int atomic);
@@ -112,32 +119,34 @@ static inline struct drm_mm_node *drm_mm_get_block(struct drm_mm_node *parent,
 						   unsigned long size,
 						   unsigned alignment)
 {
-	return drm_mm_get_block_generic(parent, size, alignment, 0);
+	return drm_mm_get_block_generic(parent, size, alignment, 0, 0);
 }
 static inline struct drm_mm_node *drm_mm_get_block_atomic(struct drm_mm_node *parent,
 							  unsigned long size,
 							  unsigned alignment)
 {
-	return drm_mm_get_block_generic(parent, size, alignment, 1);
+	return drm_mm_get_block_generic(parent, size, alignment, 0, 1);
 }
 static inline struct drm_mm_node *drm_mm_get_block_range(
 						struct drm_mm_node *parent,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end)
 {
-	return drm_mm_get_block_range_generic(parent, size, alignment,
-						start, end, 0);
+	return drm_mm_get_block_range_generic(parent, size, alignment, color,
+					      start, end, 0);
 }
 static inline struct drm_mm_node *drm_mm_get_block_atomic_range(
 						struct drm_mm_node *parent,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end)
 {
-	return drm_mm_get_block_range_generic(parent, size, alignment,
+	return drm_mm_get_block_range_generic(parent, size, alignment, color,
 						start, end, 1);
 }
 extern int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
@@ -152,15 +161,18 @@ extern void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new
 extern struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 					      unsigned long size,
 					      unsigned alignment,
-					      int best_match);
+					      unsigned long color,
+					      bool best_match);
 extern struct drm_mm_node *drm_mm_search_free_in_range(
 						const struct drm_mm *mm,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
-						int best_match);
-extern int drm_mm_init(struct drm_mm *mm, unsigned long start,
+						bool best_match);
+extern int drm_mm_init(struct drm_mm *mm,
+		       unsigned long start,
 		       unsigned long size);
 extern void drm_mm_takedown(struct drm_mm *mm);
 extern int drm_mm_clean(struct drm_mm *mm);
@@ -171,10 +183,14 @@ static inline struct drm_mm *drm_get_mm(struct drm_mm_node *block)
 	return block->mm;
 }
 
-void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
-		      unsigned alignment);
-void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
+void drm_mm_init_scan(struct drm_mm *mm,
+		      unsigned long size,
+		      unsigned alignment,
+		      unsigned long color);
+void drm_mm_init_scan_with_range(struct drm_mm *mm,
+				 unsigned long size,
 				 unsigned alignment,
+				 unsigned long color,
 				 unsigned long start,
 				 unsigned long end);
 int drm_mm_scan_add_block(struct drm_mm_node *node);
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] drm/i915: Segregate memory domains in the GTT using coloring
  2012-07-09 11:34 [RFC] Set cache level ioctl Chris Wilson
  2012-07-09 11:34 ` [PATCH 1/3] drm: Add colouring to the range allocator Chris Wilson
@ 2012-07-09 11:34 ` Chris Wilson
  2012-07-09 11:34 ` [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace Chris Wilson
  2 siblings, 0 replies; 18+ messages in thread
From: Chris Wilson @ 2012-07-09 11:34 UTC (permalink / raw)
  To: intel-gfx

Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the drm_mm coloring facility to mark ranges with a
color that corresponds to the cache attributes of those pages, in order
to prevent allocation of adjacent blocks of differing memory types.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Conflicts:

	drivers/gpu/drm/i915/i915_drv.h
	drivers/gpu/drm/i915/i915_gem.c
	drivers/gpu/drm/i915/i915_gem_evict.c
---
 drivers/gpu/drm/i915/i915_drv.h       |    4 ++-
 drivers/gpu/drm/i915/i915_gem.c       |   61 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/i915_gem_evict.c |    7 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   19 ++++++++++
 4 files changed, 80 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b53bd8f..4fb358e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1422,7 +1422,9 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev, int min_size,
-					  unsigned alignment, bool mappable);
+					  unsigned alignment,
+					  unsigned cache_level,
+					  bool mappable);
 int i915_gem_evict_everything(struct drm_device *dev, bool purgeable_only);
 
 /* i915_gem_stolen.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cad56dd..da89f13 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2700,6 +2700,36 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+static bool i915_gem_valid_gtt_space(struct drm_device *dev,
+				     struct drm_mm_node *gtt_space,
+				     unsigned long cache_level)
+{
+	struct drm_mm_node *other;
+
+	/* On non-LLC machines we have to be careful when putting differing
+	 * types of snoopable memory together to avoid the prefetcher
+	 * crossing memory domains and dieing.
+	 */
+	if (HAS_LLC(dev))
+		return true;
+
+	if (gtt_space == NULL)
+		return true;
+
+	if (list_empty(&gtt_space->node_list))
+		return true;
+
+	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
+	if (other->allocated && !other->hole_follows && other->color != cache_level)
+		return false;
+
+	other = list_entry(gtt_space->node_list.next, struct drm_mm_node, node_list);
+	if (other->allocated && !gtt_space->hole_follows && other->color != cache_level)
+		return false;
+
+	return true;
+}
+
 /**
  * Finds free space in the GTT aperture and binds the object there.
  */
@@ -2755,35 +2785,46 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	if (map_and_fenceable)
 		free_space =
 			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
-						    size, alignment, 0,
+						    size, alignment, obj->cache_level,
 						    0, dev_priv->mm.gtt_mappable_end,
-						    0);
+						    false);
 	else
 		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
-						size, alignment, 0, 0);
+						size, alignment, obj->cache_level,
+						false);
 
 	if (free_space != NULL) {
 		if (map_and_fenceable)
 			obj->gtt_space =
 				drm_mm_get_block_range_generic(free_space,
-							       size, alignment, 0,
+							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       0);
+							       false);
 		else
 			obj->gtt_space =
-				drm_mm_get_block(free_space, size, alignment);
+				drm_mm_get_block_generic(free_space,
+							 size, alignment, obj->cache_level,
+							 false);
 	}
 	if (obj->gtt_space == NULL) {
 		/* If the gtt is empty and we're still having trouble
 		 * fitting our object in, we're out of memory.
 		 */
 		ret = i915_gem_evict_something(dev, size, alignment,
+					       obj->cache_level,
 					       map_and_fenceable);
 		if (ret)
 			return ret;
 
 		goto search_free;
 	}
+	if (WARN_ON(!i915_gem_valid_gtt_space(dev,
+					      obj->gtt_space,
+					      obj->cache_level))) {
+		drm_mm_put_block(obj->gtt_space);
+		obj->gtt_space = NULL;
+		return -EINVAL;
+	}
 
 	ret = i915_gem_object_get_pages_gtt(obj, gfpmask);
 	if (ret) {
@@ -3004,6 +3045,12 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return -EBUSY;
 	}
 
+	if (!i915_gem_valid_gtt_space(dev, obj->gtt_space, cache_level)) {
+		ret = i915_gem_object_unbind(obj);
+		if (ret)
+			return ret;
+	}
+
 	if (obj->gtt_space) {
 		ret = i915_gem_object_finish_gpu(obj);
 		if (ret)
@@ -3015,7 +3062,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		 * registers with snooped memory, so relinquish any fences
 		 * currently pointing to our region in the aperture.
 		 */
-		if (INTEL_INFO(obj->base.dev)->gen < 6) {
+		if (INTEL_INFO(dev)->gen < 6) {
 			ret = i915_gem_object_put_fence(obj);
 			if (ret)
 				return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index eba0308..9c5fb08 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -44,7 +44,8 @@ mark_free(struct drm_i915_gem_object *obj, struct list_head *unwind)
 
 int
 i915_gem_evict_something(struct drm_device *dev, int min_size,
-			 unsigned alignment, bool mappable)
+			 unsigned alignment, unsigned cache_level,
+			 bool mappable)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
@@ -79,11 +80,11 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
 	INIT_LIST_HEAD(&unwind_list);
 	if (mappable)
 		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
-					    min_size, alignment, 0,
+					    min_size, alignment, cache_level,
 					    0, dev_priv->mm.gtt_mappable_end);
 	else
 		drm_mm_init_scan(&dev_priv->mm.gtt_space,
-				 min_size, alignment, 0);
+				 min_size, alignment, cache_level);
 
 	/* First see if there is a large enough contiguous idle region... */
 	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9fd25a4..4584f7f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -422,6 +422,23 @@ void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj)
 	undo_idling(dev_priv, interruptible);
 }
 
+static void i915_gtt_color_adjust(struct drm_mm_node *node,
+				  unsigned long color,
+				  unsigned long *start,
+				  unsigned long *end)
+{
+	if (node->color != color)
+		*start += 4096;
+
+	if (!list_empty(&node->node_list)) {
+		node = list_entry(node->node_list.next,
+				  struct drm_mm_node,
+				  node_list);
+		if (node->allocated && node->color != color)
+			*end -= 4096;
+	}
+}
+
 void i915_gem_init_global_gtt(struct drm_device *dev,
 			      unsigned long start,
 			      unsigned long mappable_end,
@@ -431,6 +448,8 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 	/* Substract the guard page ... */
 	drm_mm_init(&dev_priv->mm.gtt_space, start, end - start - PAGE_SIZE);
+	if (!HAS_LLC(dev))
+		dev_priv->mm.gtt_space.color_adjust = i915_gtt_color_adjust;
 
 	dev_priv->mm.gtt_start = start;
 	dev_priv->mm.gtt_mappable_end = mappable_end;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace
  2012-07-09 11:34 [RFC] Set cache level ioctl Chris Wilson
  2012-07-09 11:34 ` [PATCH 1/3] drm: Add colouring to the range allocator Chris Wilson
  2012-07-09 11:34 ` [PATCH 2/3] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
@ 2012-07-09 11:34 ` Chris Wilson
  2012-07-10  8:54   ` Daniel Vetter
  2 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-09 11:34 UTC (permalink / raw)
  To: intel-gfx

By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level cache) a userspace driver is then able to
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and, more importantly,
downloads from the GPU, and to mix CPU and GPU rendering/activity
efficiently.
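
As a rough sketch of the intended usage from userspace (plain ioctl()
rather than a libdrm wrapper, and assuming the uapi additions below are
visible via i915_drm.h): query the bo's current level and, if it is
uncached, ask for the snooped/LLC level so that results can be read
back by the CPU without clflushing.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int bo_make_cpu_coherent(int fd, uint32_t handle)
{
        struct drm_i915_gem_cache_level arg = { .handle = handle };

        /* Read back the current cache level of the bo. */
        if (ioctl(fd, DRM_IOCTL_I915_GEM_GET_CACHE_LEVEL, &arg))
                return -errno;

        if (arg.cache_level != I915_CACHE_LEVEL_NONE)
                return 0; /* already snooped or in the LLC */

        /* Request snooped/LLC caching for CPU readback. */
        arg.cache_level = I915_CACHE_LEVEL_LLC;
        if (ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHE_LEVEL, &arg))
                return -errno;

        return 0;
}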

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c |    2 ++
 drivers/gpu/drm/i915/i915_drv.h |   11 ++++++---
 drivers/gpu/drm/i915/i915_gem.c |   50 +++++++++++++++++++++++++++++++++++++++
 include/drm/i915_drm.h          |   16 +++++++++++++
 4 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index f64ef4b..2302e008 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1829,6 +1829,8 @@ struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_unpin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_SET_CACHE_LEVEL, i915_gem_set_cache_level_ioctl, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_GET_CACHE_LEVEL, i915_gem_get_cache_level_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_THROTTLE, i915_gem_throttle_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, i915_gem_entervt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, i915_gem_leavevt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4fb358e..00a4cb1 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -40,6 +40,7 @@
 #include <linux/backlight.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
+#include <drm/i915_drm.h>
 
 /* General customization:
  */
@@ -854,9 +855,9 @@ enum hdmi_force_audio {
 };
 
 enum i915_cache_level {
-	I915_CACHE_NONE,
-	I915_CACHE_LLC,
-	I915_CACHE_LLC_MLC, /* gen6+ */
+	I915_CACHE_NONE = I915_CACHE_LEVEL_NONE,
+	I915_CACHE_LLC = I915_CACHE_LEVEL_LLC,
+	I915_CACHE_LLC_MLC = I915_CACHE_LEVEL_LLC_MLC, /* gen6+ */
 };
 
 struct drm_i915_gem_object {
@@ -1249,6 +1250,10 @@ int i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
+int i915_gem_get_cache_level_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file);
+int i915_gem_set_cache_level_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file);
 int i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_priv);
 int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index da89f13..75d67b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3102,6 +3102,56 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
+int i915_gem_get_cache_level_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_cache_level *args = data;
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
+	if (&obj->base == NULL) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	args->cache_level = obj->cache_level;
+
+	drm_gem_object_unreference(&obj->base);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
+int i915_gem_set_cache_level_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_cache_level *args = data;
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
+	if (&obj->base == NULL) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	ret = i915_gem_object_set_cache_level(obj, args->cache_level);
+
+	drm_gem_object_unreference(&obj->base);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
 /*
  * Prepare buffer for display plane (scanout, cursors, etc).
  * Can be called from an uninterruptible phase (modesetting) and allows
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 564005e..058feba 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -203,6 +203,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_WAIT	0x2c
 #define DRM_I915_GEM_CONTEXT_CREATE	0x2d
 #define DRM_I915_GEM_CONTEXT_DESTROY	0x2e
+#define DRM_I915_GEM_SET_CACHE_LEVEL	0x2f
+#define DRM_I915_GEM_GET_CACHE_LEVEL	0x30
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -227,6 +229,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
 #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
 #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
+#define DRM_IOCTL_I915_GEM_SET_CACHE_LEVEL		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_SET_CACHE_LEVEL, struct drm_i915_gem_cache_level)
+#define DRM_IOCTL_I915_GEM_GET_CACHE_LEVEL		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_GET_CACHE_LEVEL, struct drm_i915_gem_cache_level)
 #define DRM_IOCTL_I915_GEM_THROTTLE	DRM_IO ( DRM_COMMAND_BASE + DRM_I915_GEM_THROTTLE)
 #define DRM_IOCTL_I915_GEM_ENTERVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT)
 #define DRM_IOCTL_I915_GEM_LEAVEVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT)
@@ -707,6 +711,18 @@ struct drm_i915_gem_busy {
 	__u32 busy;
 };
 
+#define I915_CACHE_LEVEL_NONE		0
+#define I915_CACHE_LEVEL_LLC		1
+#define I915_CACHE_LEVEL_LLC_MLC	2 /* gen6+ */
+
+struct drm_i915_gem_cache_level {
+	/** Handle of the buffer to check for busy */
+	__u32 handle;
+
+	/** Cache level to apply or return value */
+	__u32 cache_level;
+};
+
 #define I915_TILING_NONE	0
 #define I915_TILING_X		1
 #define I915_TILING_Y		2
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace
  2012-07-09 11:34 ` [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace Chris Wilson
@ 2012-07-10  8:54   ` Daniel Vetter
  2012-07-10  9:00     ` Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2012-07-10  8:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Mon, Jul 09, 2012 at 12:34:39PM +0100, Chris Wilson wrote:
> By selecting the cache level (essentially whether or not the CPU snoops
> any updates to the bo, and on more recent machines whether it resides
> inside the CPU's last-level cache) a userspace driver is then able to
> manage all of its memory within buffer objects, if it so desires. This
> enables the userspace driver to accelerate uploads and, more importantly,
> downloads from the GPU, and to mix CPU and GPU rendering/activity
> efficiently.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_dma.c |    2 ++
>  drivers/gpu/drm/i915/i915_drv.h |   11 ++++++---
>  drivers/gpu/drm/i915/i915_gem.c |   50 +++++++++++++++++++++++++++++++++++++++
>  include/drm/i915_drm.h          |   16 +++++++++++++
>  4 files changed, 76 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index f64ef4b..2302e008 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1829,6 +1829,8 @@ struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_unpin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_SET_CACHE_LEVEL, i915_gem_set_cache_level_ioctl, DRM_UNLOCKED),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_GET_CACHE_LEVEL, i915_gem_get_cache_level_ioctl, DRM_UNLOCKED),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_THROTTLE, i915_gem_throttle_ioctl, DRM_AUTH|DRM_UNLOCKED),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, i915_gem_entervt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, i915_gem_leavevt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4fb358e..00a4cb1 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -40,6 +40,7 @@
>  #include <linux/backlight.h>
>  #include <linux/intel-iommu.h>
>  #include <linux/kref.h>
> +#include <drm/i915_drm.h>
>  
>  /* General customization:
>   */
> @@ -854,9 +855,9 @@ enum hdmi_force_audio {
>  };
>  
>  enum i915_cache_level {
> -	I915_CACHE_NONE,
> -	I915_CACHE_LLC,
> -	I915_CACHE_LLC_MLC, /* gen6+ */
> +	I915_CACHE_NONE = I915_CACHE_LEVEL_NONE,
> +	I915_CACHE_LLC = I915_CACHE_LEVEL_LLC,
> +	I915_CACHE_LLC_MLC = I915_CACHE_LEVEL_LLC_MLC, /* gen6+ */

LLC_MLC is a lie; it doesn't exist on gen6. And gen7 has something else
called l3$ cache, but that seems to be more special-purpose in nature
(hence I have a feeling it's better if userspace just sets the desired
caching in the surface state).

The other thing that's irking me is whether we want different names for
different kinds of caching or not, i.e. whether we should split out
pre-gen6 coherent mem from gen6+ coherent stuff. Also, on vlv we don't
have an LLC, but we can still support coherent memory like on gen6+
with llc (it's just a bit slower for gpu-only use, hence not the default).

I guess I'd bikeshed less if we color this I915_CACHE_LEVEL_CPU_COHERENT
(and maybe add more specific variants in the future if we need them).
-Daniel

>  };
>  
>  struct drm_i915_gem_object {
> @@ -1249,6 +1250,10 @@ int i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
>  			 struct drm_file *file_priv);
>  int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
> +int i915_gem_get_cache_level_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file);
> +int i915_gem_set_cache_level_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file);
>  int i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
>  			    struct drm_file *file_priv);
>  int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index da89f13..75d67b8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3102,6 +3102,56 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  	return 0;
>  }
>  
> +int i915_gem_get_cache_level_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file)
> +{
> +	struct drm_i915_gem_cache_level *args = data;
> +	struct drm_i915_gem_object *obj;
> +	int ret;
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		return ret;
> +
> +	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
> +	if (&obj->base == NULL) {
> +		ret = -ENOENT;
> +		goto unlock;
> +	}
> +
> +	args->cache_level = obj->cache_level;
> +
> +	drm_gem_object_unreference(&obj->base);
> +unlock:
> +	mutex_unlock(&dev->struct_mutex);
> +	return ret;
> +}
> +
> +int i915_gem_set_cache_level_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file)
> +{
> +	struct drm_i915_gem_cache_level *args = data;
> +	struct drm_i915_gem_object *obj;
> +	int ret;
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		return ret;
> +
> +	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
> +	if (&obj->base == NULL) {
> +		ret = -ENOENT;
> +		goto unlock;
> +	}
> +
> +	ret = i915_gem_object_set_cache_level(obj, args->cache_level);
> +
> +	drm_gem_object_unreference(&obj->base);
> +unlock:
> +	mutex_unlock(&dev->struct_mutex);
> +	return ret;
> +}
> +
>  /*
>   * Prepare buffer for display plane (scanout, cursors, etc).
>   * Can be called from an uninterruptible phase (modesetting) and allows
> diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
> index 564005e..058feba 100644
> --- a/include/drm/i915_drm.h
> +++ b/include/drm/i915_drm.h
> @@ -203,6 +203,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_WAIT	0x2c
>  #define DRM_I915_GEM_CONTEXT_CREATE	0x2d
>  #define DRM_I915_GEM_CONTEXT_DESTROY	0x2e
> +#define DRM_I915_GEM_SET_CACHE_LEVEL	0x2f
> +#define DRM_I915_GEM_GET_CACHE_LEVEL	0x30
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -227,6 +229,8 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
>  #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
>  #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
> +#define DRM_IOCTL_I915_GEM_SET_CACHE_LEVEL		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_SET_CACHE_LEVEL, struct drm_i915_gem_cache_level)
> +#define DRM_IOCTL_I915_GEM_GET_CACHE_LEVEL		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_GET_CACHE_LEVEL, struct drm_i915_gem_cache_level)
>  #define DRM_IOCTL_I915_GEM_THROTTLE	DRM_IO ( DRM_COMMAND_BASE + DRM_I915_GEM_THROTTLE)
>  #define DRM_IOCTL_I915_GEM_ENTERVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT)
>  #define DRM_IOCTL_I915_GEM_LEAVEVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT)
> @@ -707,6 +711,18 @@ struct drm_i915_gem_busy {
>  	__u32 busy;
>  };
>  
> +#define I915_CACHE_LEVEL_NONE		0
> +#define I915_CACHE_LEVEL_LLC		1
> +#define I915_CACHE_LEVEL_LLC_MLC	2 /* gen6+ */
> +
> +struct drm_i915_gem_cache_level {
> +	/** Handle of the buffer to check for busy */
> +	__u32 handle;
> +
> +	/** Cache level to apply or return value */
> +	__u32 cache_level;
> +};
> +
>  #define I915_TILING_NONE	0
>  #define I915_TILING_X		1
>  #define I915_TILING_Y		2
> -- 
> 1.7.10
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace
  2012-07-10  8:54   ` Daniel Vetter
@ 2012-07-10  9:00     ` Chris Wilson
  2012-07-10  9:27       ` [PATCH] " Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-10  9:00 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, 10 Jul 2012 10:54:02 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jul 09, 2012 at 12:34:39PM +0100, Chris Wilson wrote:
> >  enum i915_cache_level {
> > -	I915_CACHE_NONE,
> > -	I915_CACHE_LLC,
> > -	I915_CACHE_LLC_MLC, /* gen6+ */
> > +	I915_CACHE_NONE = I915_CACHE_LEVEL_NONE,
> > +	I915_CACHE_LLC = I915_CACHE_LEVEL_LLC,
> > +	I915_CACHE_LLC_MLC = I915_CACHE_LEVEL_LLC_MLC, /* gen6+ */
> 
> LLC_MLC is a lie, it doesn't exist on gen6. And gen7 has something else
> called l3$ cache, but that seems to be more special-purpose in nature
> (hence I have a feeling it's better if userspace just sets the desired
> caching in the surface state).
> 
> The other thing that's irking me is whether we want different names for
> different kinds of caching or not, i.e. whether we should split out
> pre-gen6 coherent mem from gen6+ coherent stuff. Also, on vlv we don't
> have a llc cache, but we can still support coherent memory like on gen6+
> with llc (it's just a bit slower for gpu-only use, hence not the default).

"Just a bit" will be an understatement judging by the snoopable
architectures upon which it is based. :-p

> I guess I'd bikeshed less if we color this I915_CACHE_LEVEL_CPU_COHERENT
> (and maybe add more specific variants in the future if we need them).

I was half thinking towards extensibility, but we are more likely to
need new ioctls rather than just add to this set of "cache levels".

I am happy with just having two levels in userspace for this:
uncached, snoopable. They are generic enough to cover the last 7
generations, good enough for the next 3?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] drm: Add colouring to the range allocator
  2012-07-09 11:34 ` [PATCH 1/3] drm: Add colouring to the range allocator Chris Wilson
@ 2012-07-10  9:21   ` Daniel Vetter
  2012-07-10  9:29     ` Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2012-07-10  9:21 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Benjamin Herrenschmidt, intel-gfx, Jerome Glisse, Ben Skeggs,
	Daniel Vetter, Alex Deucher

On Mon, Jul 09, 2012 at 12:34:37PM +0100, Chris Wilson wrote:
> In order to support snoopable memory on non-LLC architectures (so that
> we can bind vgem objects into the i915 GATT for example), we have to
> prevent the prefetcher on the GPU from crossing memory domains, and so
> must avoid allocating a snoopable PTE immediately following an uncached
> PTE. To do that, we need to extend the range allocator with support for
> tracking and segregating different node colours.
> 
> This will be used by i915 to segregate memory domains within the GTT.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Two little bikesheds:
- Do we really need 64bits of colour? Especially since we have quite a few
  bits of space left ...
- I think we could add a new insert_color helper that always takes a range
  (we can select the right range in the driver). That way this patch
  wouldn't need to touch the drivers, and we could take the opportunity to
  embed the gtt_space mm_node into our gem object ...

Besides that this looks good and I like it, but I think I've mentioned
that way back when this patch first popped up ;-)
-Daniel

> 
> Conflicts:
> 
> 	drivers/gpu/drm/i915/i915_gem_stolen.c
> ---
>  drivers/gpu/drm/drm_gem.c                  |    2 +-
>  drivers/gpu/drm/drm_mm.c                   |  151 +++++++++++++++++-----------
>  drivers/gpu/drm/i915/i915_gem.c            |    6 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c      |    9 +-
>  drivers/gpu/drm/i915/i915_gem_stolen.c     |    5 +-
>  drivers/gpu/drm/nouveau/nouveau_notifier.c |    4 +-
>  drivers/gpu/drm/nouveau/nouveau_object.c   |    2 +-
>  drivers/gpu/drm/nouveau/nv04_instmem.c     |    2 +-
>  drivers/gpu/drm/nouveau/nv20_fb.c          |    2 +-
>  drivers/gpu/drm/nouveau/nv50_vram.c        |    2 +-
>  drivers/gpu/drm/ttm/ttm_bo.c               |    2 +-
>  drivers/gpu/drm/ttm/ttm_bo_manager.c       |    4 +-
>  include/drm/drm_mm.h                       |   38 +++++--
>  13 files changed, 143 insertions(+), 86 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index d58e69d..961ccd8 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -354,7 +354,7 @@ drm_gem_create_mmap_offset(struct drm_gem_object *obj)
>  
>  	/* Get a DRM GEM mmap offset allocated... */
>  	list->file_offset_node = drm_mm_search_free(&mm->offset_manager,
> -			obj->size / PAGE_SIZE, 0, 0);
> +			obj->size / PAGE_SIZE, 0, 0, false);
>  
>  	if (!list->file_offset_node) {
>  		DRM_ERROR("failed to allocate offset for bo %d\n", obj->name);
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index 961fb54..0311dba 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -118,45 +118,53 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node)
>  
>  static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
>  				 struct drm_mm_node *node,
> -				 unsigned long size, unsigned alignment)
> +				 unsigned long size, unsigned alignment,
> +				 unsigned long color)
>  {
>  	struct drm_mm *mm = hole_node->mm;
> -	unsigned long tmp = 0, wasted = 0;
>  	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
>  	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
> +	unsigned long adj_start = hole_start;
> +	unsigned long adj_end = hole_end;
>  
>  	BUG_ON(!hole_node->hole_follows || node->allocated);
>  
> -	if (alignment)
> -		tmp = hole_start % alignment;
> +	if (mm->color_adjust)
> +		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
>  
> -	if (!tmp) {
> +	if (alignment) {
> +		unsigned tmp = adj_start % alignment;
> +		if (tmp)
> +			adj_start += alignment - tmp;
> +	}
> +
> +	if (adj_start == hole_start) {
>  		hole_node->hole_follows = 0;
> -		list_del_init(&hole_node->hole_stack);
> -	} else
> -		wasted = alignment - tmp;
> +		list_del(&hole_node->hole_stack);
> +	}
>  
> -	node->start = hole_start + wasted;
> +	node->start = adj_start;
>  	node->size = size;
>  	node->mm = mm;
> +	node->color = color;
>  	node->allocated = 1;
>  
>  	INIT_LIST_HEAD(&node->hole_stack);
>  	list_add(&node->node_list, &hole_node->node_list);
>  
> -	BUG_ON(node->start + node->size > hole_end);
> +	BUG_ON(node->start + node->size > adj_end);
>  
> +	node->hole_follows = 0;
>  	if (node->start + node->size < hole_end) {
>  		list_add(&node->hole_stack, &mm->hole_stack);
>  		node->hole_follows = 1;
> -	} else {
> -		node->hole_follows = 0;
>  	}
>  }
>  
>  struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
>  					     unsigned long size,
>  					     unsigned alignment,
> +					     unsigned long color,
>  					     int atomic)
>  {
>  	struct drm_mm_node *node;
> @@ -165,7 +173,7 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
>  	if (unlikely(node == NULL))
>  		return NULL;
>  
> -	drm_mm_insert_helper(hole_node, node, size, alignment);
> +	drm_mm_insert_helper(hole_node, node, size, alignment, color);
>  
>  	return node;
>  }
> @@ -181,11 +189,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
>  {
>  	struct drm_mm_node *hole_node;
>  
> -	hole_node = drm_mm_search_free(mm, size, alignment, 0);
> +	hole_node = drm_mm_search_free(mm, size, alignment, 0, false);
>  	if (!hole_node)
>  		return -ENOSPC;
>  
> -	drm_mm_insert_helper(hole_node, node, size, alignment);
> +	drm_mm_insert_helper(hole_node, node, size, alignment, 0);
>  
>  	return 0;
>  }
> @@ -194,50 +202,57 @@ EXPORT_SYMBOL(drm_mm_insert_node);
>  static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
>  				       struct drm_mm_node *node,
>  				       unsigned long size, unsigned alignment,
> +				       unsigned long color,
>  				       unsigned long start, unsigned long end)
>  {
>  	struct drm_mm *mm = hole_node->mm;
> -	unsigned long tmp = 0, wasted = 0;
>  	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
>  	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
> +	unsigned long adj_start = hole_start;
> +	unsigned long adj_end = hole_end;
>  
>  	BUG_ON(!hole_node->hole_follows || node->allocated);
>  
> -	if (hole_start < start)
> -		wasted += start - hole_start;
> -	if (alignment)
> -		tmp = (hole_start + wasted) % alignment;
> +	if (mm->color_adjust)
> +		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
>  
> -	if (tmp)
> -		wasted += alignment - tmp;
> +	if (adj_start < start)
> +		adj_start = start;
>  
> -	if (!wasted) {
> +	if (alignment) {
> +		unsigned tmp = adj_start % alignment;
> +		if (tmp)
> +			adj_start += alignment - tmp;
> +	}
> +
> +	if (adj_start == hole_start) {
>  		hole_node->hole_follows = 0;
> -		list_del_init(&hole_node->hole_stack);
> +		list_del(&hole_node->hole_stack);
>  	}
>  
> -	node->start = hole_start + wasted;
> +	node->start = adj_start;
>  	node->size = size;
>  	node->mm = mm;
> +	node->color = color;
>  	node->allocated = 1;
>  
>  	INIT_LIST_HEAD(&node->hole_stack);
>  	list_add(&node->node_list, &hole_node->node_list);
>  
> -	BUG_ON(node->start + node->size > hole_end);
> +	BUG_ON(node->start + node->size > adj_end);
>  	BUG_ON(node->start + node->size > end);
>  
> +	node->hole_follows = 0;
>  	if (node->start + node->size < hole_end) {
>  		list_add(&node->hole_stack, &mm->hole_stack);
>  		node->hole_follows = 1;
> -	} else {
> -		node->hole_follows = 0;
>  	}
>  }
>  
>  struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end,
>  						int atomic)
> @@ -248,7 +263,7 @@ struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node
>  	if (unlikely(node == NULL))
>  		return NULL;
>  
> -	drm_mm_insert_helper_range(hole_node, node, size, alignment,
> +	drm_mm_insert_helper_range(hole_node, node, size, alignment, color,
>  				   start, end);
>  
>  	return node;
> @@ -266,12 +281,12 @@ int drm_mm_insert_node_in_range(struct drm_mm *mm, struct drm_mm_node *node,
>  {
>  	struct drm_mm_node *hole_node;
>  
> -	hole_node = drm_mm_search_free_in_range(mm, size, alignment,
> -						start, end, 0);
> +	hole_node = drm_mm_search_free_in_range(mm, size, alignment, 0,
> +						start, end, false);
>  	if (!hole_node)
>  		return -ENOSPC;
>  
> -	drm_mm_insert_helper_range(hole_node, node, size, alignment,
> +	drm_mm_insert_helper_range(hole_node, node, size, alignment, 0,
>  				   start, end);
>  
>  	return 0;
> @@ -336,27 +351,23 @@ EXPORT_SYMBOL(drm_mm_put_block);
>  static int check_free_hole(unsigned long start, unsigned long end,
>  			   unsigned long size, unsigned alignment)
>  {
> -	unsigned wasted = 0;
> -
>  	if (end - start < size)
>  		return 0;
>  
>  	if (alignment) {
>  		unsigned tmp = start % alignment;
>  		if (tmp)
> -			wasted = alignment - tmp;
> -	}
> -
> -	if (end >= start + size + wasted) {
> -		return 1;
> +			start += alignment - tmp;
>  	}
>  
> -	return 0;
> +	return end >= start + size;
>  }
>  
>  struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
>  				       unsigned long size,
> -				       unsigned alignment, int best_match)
> +				       unsigned alignment,
> +				       unsigned long color,
> +				       bool best_match)
>  {
>  	struct drm_mm_node *entry;
>  	struct drm_mm_node *best;
> @@ -368,10 +379,17 @@ struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
>  	best_size = ~0UL;
>  
>  	list_for_each_entry(entry, &mm->hole_stack, hole_stack) {
> +		unsigned long adj_start = drm_mm_hole_node_start(entry);
> +		unsigned long adj_end = drm_mm_hole_node_end(entry);
> +
> +		if (mm->color_adjust) {
> +			mm->color_adjust(entry, color, &adj_start, &adj_end);
> +			if (adj_end <= adj_start)
> +				continue;
> +		}
> +
>  		BUG_ON(!entry->hole_follows);
> -		if (!check_free_hole(drm_mm_hole_node_start(entry),
> -				     drm_mm_hole_node_end(entry),
> -				     size, alignment))
> +		if (!check_free_hole(adj_start, adj_end, size, alignment))
>  			continue;
>  
>  		if (!best_match)
> @@ -390,9 +408,10 @@ EXPORT_SYMBOL(drm_mm_search_free);
>  struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end,
> -						int best_match)
> +						bool best_match)
>  {
>  	struct drm_mm_node *entry;
>  	struct drm_mm_node *best;
> @@ -410,6 +429,13 @@ struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
>  			end : drm_mm_hole_node_end(entry);
>  
>  		BUG_ON(!entry->hole_follows);
> +
> +		if (mm->color_adjust) {
> +			mm->color_adjust(entry, color, &adj_start, &adj_end);
> +			if (adj_end <= adj_start)
> +				continue;
> +		}
> +
>  		if (!check_free_hole(adj_start, adj_end, size, alignment))
>  			continue;
>  
> @@ -437,6 +463,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
>  	new->mm = old->mm;
>  	new->start = old->start;
>  	new->size = old->size;
> +	new->color = old->color;
>  
>  	old->allocated = 0;
>  	new->allocated = 1;
> @@ -452,9 +479,12 @@ EXPORT_SYMBOL(drm_mm_replace_node);
>   * Warning: As long as the scan list is non-empty, no other operations than
>   * adding/removing nodes to/from the scan list are allowed.
>   */
> -void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
> -		      unsigned alignment)
> +void drm_mm_init_scan(struct drm_mm *mm,
> +		      unsigned long size,
> +		      unsigned alignment,
> +		      unsigned long color)
>  {
> +	mm->scan_color = color;
>  	mm->scan_alignment = alignment;
>  	mm->scan_size = size;
>  	mm->scanned_blocks = 0;
> @@ -474,11 +504,14 @@ EXPORT_SYMBOL(drm_mm_init_scan);
>   * Warning: As long as the scan list is non-empty, no other operations than
>   * adding/removing nodes to/from the scan list are allowed.
>   */
> -void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
> +void drm_mm_init_scan_with_range(struct drm_mm *mm,
> +				 unsigned long size,
>  				 unsigned alignment,
> +				 unsigned long color,
>  				 unsigned long start,
>  				 unsigned long end)
>  {
> +	mm->scan_color = color;
>  	mm->scan_alignment = alignment;
>  	mm->scan_size = size;
>  	mm->scanned_blocks = 0;
> @@ -522,17 +555,21 @@ int drm_mm_scan_add_block(struct drm_mm_node *node)
>  
>  	hole_start = drm_mm_hole_node_start(prev_node);
>  	hole_end = drm_mm_hole_node_end(prev_node);
> +
> +	adj_start = hole_start;
> +	adj_end = hole_end;
> +
> +	if (mm->color_adjust)
> +		mm->color_adjust(prev_node, mm->scan_color, &adj_start, &adj_end);
> +
>  	if (mm->scan_check_range) {
> -		adj_start = hole_start < mm->scan_start ?
> -			mm->scan_start : hole_start;
> -		adj_end = hole_end > mm->scan_end ?
> -			mm->scan_end : hole_end;
> -	} else {
> -		adj_start = hole_start;
> -		adj_end = hole_end;
> +		if (adj_start < mm->scan_start)
> +			adj_start = mm->scan_start;
> +		if (adj_end > mm->scan_end)
> +			adj_end = mm->scan_end;
>  	}
>  
> -	if (check_free_hole(adj_start , adj_end,
> +	if (check_free_hole(adj_start, adj_end,
>  			    mm->scan_size, mm->scan_alignment)) {
>  		mm->scan_hit_start = hole_start;
>  		mm->scan_hit_size = hole_end;
> @@ -616,6 +653,8 @@ int drm_mm_init(struct drm_mm * mm, unsigned long start, unsigned long size)
>  	mm->head_node.size = start - mm->head_node.start;
>  	list_add_tail(&mm->head_node.hole_stack, &mm->hole_stack);
>  
> +	mm->color_adjust = NULL;
> +
>  	return 0;
>  }
>  EXPORT_SYMBOL(drm_mm_init);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index db438f0..cad56dd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2756,18 +2756,18 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
>  		free_space =
>  			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
>  						    size, alignment, 0,
> -						    dev_priv->mm.gtt_mappable_end,
> +						    0, dev_priv->mm.gtt_mappable_end,
>  						    0);
>  	else
>  		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
> -						size, alignment, 0);
> +						size, alignment, 0, 0);
>  
>  	if (free_space != NULL) {
>  		if (map_and_fenceable)
>  			obj->gtt_space =
>  				drm_mm_get_block_range_generic(free_space,
>  							       size, alignment, 0,
> -							       dev_priv->mm.gtt_mappable_end,
> +							       0, dev_priv->mm.gtt_mappable_end,
>  							       0);
>  		else
>  			obj->gtt_space =
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index ae7c24e..eba0308 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -78,11 +78,12 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
>  
>  	INIT_LIST_HEAD(&unwind_list);
>  	if (mappable)
> -		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space, min_size,
> -					    alignment, 0,
> -					    dev_priv->mm.gtt_mappable_end);
> +		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
> +					    min_size, alignment, 0,
> +					    0, dev_priv->mm.gtt_mappable_end);
>  	else
> -		drm_mm_init_scan(&dev_priv->mm.gtt_space, min_size, alignment);
> +		drm_mm_init_scan(&dev_priv->mm.gtt_space,
> +				 min_size, alignment, 0);
>  
>  	/* First see if there is a large enough contiguous idle region... */
>  	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index ada2e90..dba13cf 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -111,7 +111,8 @@ static void i915_setup_compression(struct drm_device *dev, int size)
>  	/* Just in case the BIOS is doing something questionable. */
>  	intel_disable_fbc(dev);
>  
> -	compressed_fb = drm_mm_search_free(&dev_priv->mm.stolen, size, 4096, 0);
> +	compressed_fb = drm_mm_search_free(&dev_priv->mm.stolen,
> +					   size, 4096, 0, 0);
>  	if (compressed_fb)
>  		compressed_fb = drm_mm_get_block(compressed_fb, size, 4096);
>  	if (!compressed_fb)
> @@ -123,7 +124,7 @@ static void i915_setup_compression(struct drm_device *dev, int size)
>  
>  	if (!(IS_GM45(dev) || HAS_PCH_SPLIT(dev))) {
>  		compressed_llb = drm_mm_search_free(&dev_priv->mm.stolen,
> -						    4096, 4096, 0);
> +						    4096, 4096, 0, 0);
>  		if (compressed_llb)
>  			compressed_llb = drm_mm_get_block(compressed_llb,
>  							  4096, 4096);
> diff --git a/drivers/gpu/drm/nouveau/nouveau_notifier.c b/drivers/gpu/drm/nouveau/nouveau_notifier.c
> index 2ef883c..65c64b1 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_notifier.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_notifier.c
> @@ -118,10 +118,10 @@ nouveau_notifier_alloc(struct nouveau_channel *chan, uint32_t handle,
>  	uint64_t offset;
>  	int target, ret;
>  
> -	mem = drm_mm_search_free_in_range(&chan->notifier_heap, size, 0,
> +	mem = drm_mm_search_free_in_range(&chan->notifier_heap, size, 0, 0,
>  					  start, end, 0);
>  	if (mem)
> -		mem = drm_mm_get_block_range(mem, size, 0, start, end);
> +		mem = drm_mm_get_block_range(mem, size, 0, 0, start, end);
>  	if (!mem) {
>  		NV_ERROR(dev, "Channel %d notifier block full\n", chan->id);
>  		return -ENOMEM;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_object.c b/drivers/gpu/drm/nouveau/nouveau_object.c
> index b190cc0..15d5d97 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_object.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_object.c
> @@ -163,7 +163,7 @@ nouveau_gpuobj_new(struct drm_device *dev, struct nouveau_channel *chan,
>  	spin_unlock(&dev_priv->ramin_lock);
>  
>  	if (!(flags & NVOBJ_FLAG_VM) && chan) {
> -		ramin = drm_mm_search_free(&chan->ramin_heap, size, align, 0);
> +		ramin = drm_mm_search_free(&chan->ramin_heap, size, align, 0, 0);
>  		if (ramin)
>  			ramin = drm_mm_get_block(ramin, size, align);
>  		if (!ramin) {
> diff --git a/drivers/gpu/drm/nouveau/nv04_instmem.c b/drivers/gpu/drm/nouveau/nv04_instmem.c
> index ef7a934..ce57bcd 100644
> --- a/drivers/gpu/drm/nouveau/nv04_instmem.c
> +++ b/drivers/gpu/drm/nouveau/nv04_instmem.c
> @@ -149,7 +149,7 @@ nv04_instmem_get(struct nouveau_gpuobj *gpuobj, struct nouveau_channel *chan,
>  			return -ENOMEM;
>  
>  		spin_lock(&dev_priv->ramin_lock);
> -		ramin = drm_mm_search_free(&dev_priv->ramin_heap, size, align, 0);
> +		ramin = drm_mm_search_free(&dev_priv->ramin_heap, size, align, 0, 0);
>  		if (ramin == NULL) {
>  			spin_unlock(&dev_priv->ramin_lock);
>  			return -ENOMEM;
> diff --git a/drivers/gpu/drm/nouveau/nv20_fb.c b/drivers/gpu/drm/nouveau/nv20_fb.c
> index 19bd640..754f47f 100644
> --- a/drivers/gpu/drm/nouveau/nv20_fb.c
> +++ b/drivers/gpu/drm/nouveau/nv20_fb.c
> @@ -16,7 +16,7 @@ nv20_fb_alloc_tag(struct drm_device *dev, uint32_t size)
>  		return NULL;
>  
>  	spin_lock(&dev_priv->tile.lock);
> -	mem = drm_mm_search_free(&pfb->tag_heap, size, 0, 0);
> +	mem = drm_mm_search_free(&pfb->tag_heap, size, 0, 0, 0);
>  	if (mem)
>  		mem = drm_mm_get_block_atomic(mem, size, 0);
>  	spin_unlock(&dev_priv->tile.lock);
> diff --git a/drivers/gpu/drm/nouveau/nv50_vram.c b/drivers/gpu/drm/nouveau/nv50_vram.c
> index 9ed9ae39..6c8ea3f 100644
> --- a/drivers/gpu/drm/nouveau/nv50_vram.c
> +++ b/drivers/gpu/drm/nouveau/nv50_vram.c
> @@ -105,7 +105,7 @@ nv50_vram_new(struct drm_device *dev, u64 size, u32 align, u32 size_nc,
>  			struct nouveau_fb_engine *pfb = &dev_priv->engine.fb;
>  			int n = (size >> 4) * comp;
>  
> -			mem->tag = drm_mm_search_free(&pfb->tag_heap, n, 0, 0);
> +			mem->tag = drm_mm_search_free(&pfb->tag_heap, n, 0, 0, 0);
>  			if (mem->tag)
>  				mem->tag = drm_mm_get_block(mem->tag, n, 0);
>  		}
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 36f4b28..76ee39f 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1686,7 +1686,7 @@ retry_pre_get:
>  
>  	write_lock(&bdev->vm_lock);
>  	bo->vm_node = drm_mm_search_free(&bdev->addr_space_mm,
> -					 bo->mem.num_pages, 0, 0);
> +					 bo->mem.num_pages, 0, 0, 0);
>  
>  	if (unlikely(bo->vm_node == NULL)) {
>  		ret = -ENOMEM;
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_manager.c b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> index 038e947..b426b29 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_manager.c
> @@ -68,14 +68,14 @@ static int ttm_bo_man_get_node(struct ttm_mem_type_manager *man,
>  
>  		spin_lock(&rman->lock);
>  		node = drm_mm_search_free_in_range(mm,
> -					mem->num_pages, mem->page_alignment,
> +					mem->num_pages, mem->page_alignment, 0,
>  					placement->fpfn, lpfn, 1);
>  		if (unlikely(node == NULL)) {
>  			spin_unlock(&rman->lock);
>  			return 0;
>  		}
>  		node = drm_mm_get_block_atomic_range(node, mem->num_pages,
> -						     mem->page_alignment,
> +						     mem->page_alignment, 0,
>  						     placement->fpfn,
>  						     lpfn);
>  		spin_unlock(&rman->lock);
> diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
> index 564b14a..04a9554 100644
> --- a/include/drm/drm_mm.h
> +++ b/include/drm/drm_mm.h
> @@ -50,6 +50,7 @@ struct drm_mm_node {
>  	unsigned scanned_next_free : 1;
>  	unsigned scanned_preceeds_hole : 1;
>  	unsigned allocated : 1;
> +	unsigned long color;
>  	unsigned long start;
>  	unsigned long size;
>  	struct drm_mm *mm;
> @@ -66,6 +67,7 @@ struct drm_mm {
>  	spinlock_t unused_lock;
>  	unsigned int scan_check_range : 1;
>  	unsigned scan_alignment;
> +	unsigned long scan_color;
>  	unsigned long scan_size;
>  	unsigned long scan_hit_start;
>  	unsigned scan_hit_size;
> @@ -73,6 +75,9 @@ struct drm_mm {
>  	unsigned long scan_start;
>  	unsigned long scan_end;
>  	struct drm_mm_node *prev_scanned_node;
> +
> +	void (*color_adjust)(struct drm_mm_node *node, unsigned long color,
> +			     unsigned long *start, unsigned long *end);
>  };
>  
>  static inline bool drm_mm_node_allocated(struct drm_mm_node *node)
> @@ -100,11 +105,13 @@ static inline bool drm_mm_initialized(struct drm_mm *mm)
>  extern struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *node,
>  						    unsigned long size,
>  						    unsigned alignment,
> +						    unsigned long color,
>  						    int atomic);
>  extern struct drm_mm_node *drm_mm_get_block_range_generic(
>  						struct drm_mm_node *node,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end,
>  						int atomic);
> @@ -112,32 +119,34 @@ static inline struct drm_mm_node *drm_mm_get_block(struct drm_mm_node *parent,
>  						   unsigned long size,
>  						   unsigned alignment)
>  {
> -	return drm_mm_get_block_generic(parent, size, alignment, 0);
> +	return drm_mm_get_block_generic(parent, size, alignment, 0, 0);
>  }
>  static inline struct drm_mm_node *drm_mm_get_block_atomic(struct drm_mm_node *parent,
>  							  unsigned long size,
>  							  unsigned alignment)
>  {
> -	return drm_mm_get_block_generic(parent, size, alignment, 1);
> +	return drm_mm_get_block_generic(parent, size, alignment, 0, 1);
>  }
>  static inline struct drm_mm_node *drm_mm_get_block_range(
>  						struct drm_mm_node *parent,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end)
>  {
> -	return drm_mm_get_block_range_generic(parent, size, alignment,
> -						start, end, 0);
> +	return drm_mm_get_block_range_generic(parent, size, alignment, color,
> +					      start, end, 0);
>  }
>  static inline struct drm_mm_node *drm_mm_get_block_atomic_range(
>  						struct drm_mm_node *parent,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end)
>  {
> -	return drm_mm_get_block_range_generic(parent, size, alignment,
> +	return drm_mm_get_block_range_generic(parent, size, alignment, color,
>  						start, end, 1);
>  }
>  extern int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
> @@ -152,15 +161,18 @@ extern void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new
>  extern struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
>  					      unsigned long size,
>  					      unsigned alignment,
> -					      int best_match);
> +					      unsigned long color,
> +					      bool best_match);
>  extern struct drm_mm_node *drm_mm_search_free_in_range(
>  						const struct drm_mm *mm,
>  						unsigned long size,
>  						unsigned alignment,
> +						unsigned long color,
>  						unsigned long start,
>  						unsigned long end,
> -						int best_match);
> -extern int drm_mm_init(struct drm_mm *mm, unsigned long start,
> +						bool best_match);
> +extern int drm_mm_init(struct drm_mm *mm,
> +		       unsigned long start,
>  		       unsigned long size);
>  extern void drm_mm_takedown(struct drm_mm *mm);
>  extern int drm_mm_clean(struct drm_mm *mm);
> @@ -171,10 +183,14 @@ static inline struct drm_mm *drm_get_mm(struct drm_mm_node *block)
>  	return block->mm;
>  }
>  
> -void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
> -		      unsigned alignment);
> -void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
> +void drm_mm_init_scan(struct drm_mm *mm,
> +		      unsigned long size,
> +		      unsigned alignment,
> +		      unsigned long color);
> +void drm_mm_init_scan_with_range(struct drm_mm *mm,
> +				 unsigned long size,
>  				 unsigned alignment,
> +				 unsigned long color,
>  				 unsigned long start,
>  				 unsigned long end);
>  int drm_mm_scan_add_block(struct drm_mm_node *node);
> -- 
> 1.7.10
> 

-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] drm/i915: Export ability of changing cache levels to userspace
  2012-07-10  9:00     ` Chris Wilson
@ 2012-07-10  9:27       ` Chris Wilson
  2012-07-18 18:06         ` Daniel Vetter
  2012-07-26 10:34         ` Daniel Vetter
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2012-07-10  9:27 UTC (permalink / raw)
  To: intel-gfx

By selecting the cache level (essentially whether or not the CPU snoops
any updates to the bo, and on more recent machines whether it resides
inside the CPU's last-level-cache) a userspace driver is able to then
manage all of its memory within buffer objects, if it so desires. This
enables the userspace driver to accelerate uploads and, more importantly,
downloads from the GPU, and to be able to mix CPU and GPU rendering/activity
efficiently.
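
For illustration, the expected userspace usage is roughly as follows (a
minimal sketch only: plain ioctl() without EINTR handling, with fd and
handle assumed to come from the usual GEM object creation paths):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>	/* provides the cacheing defines added below */

/* Mark a bo as cacheable (snooped, or LLC on gen6+) and read it back. */
static int bo_set_cacheable(int fd, uint32_t handle)
{
	struct drm_i915_gem_cacheing arg = {
		.handle = handle,
		.cacheing = I915_CACHEING_CACHED,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_SET_CACHEING, &arg))
		return -errno;

	arg.cacheing = 0;
	if (ioctl(fd, DRM_IOCTL_I915_GEM_GET_CACHEING, &arg))
		return -errno;

	return arg.cacheing == I915_CACHEING_CACHED ? 0 : -EINVAL;
}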

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_dma.c |    2 ++
 drivers/gpu/drm/i915/i915_drv.h |    8 +++--
 drivers/gpu/drm/i915/i915_gem.c |   62 +++++++++++++++++++++++++++++++++++++++
 include/drm/i915_drm.h          |   15 ++++++++++
 4 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index f64ef4b..ed462fe 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1829,6 +1829,8 @@ struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_PIN, i915_gem_pin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_UNPIN, i915_gem_unpin_ioctl, DRM_AUTH|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_BUSY, i915_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_SET_CACHEING, i915_gem_set_cacheing_ioctl, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_GEM_GET_CACHEING, i915_gem_get_cacheing_ioctl, DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_THROTTLE, i915_gem_throttle_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_ENTERVT, i915_gem_entervt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(I915_GEM_LEAVEVT, i915_gem_leavevt_ioctl, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4fb358e..038c29c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -854,9 +854,9 @@ enum hdmi_force_audio {
 };
 
 enum i915_cache_level {
-	I915_CACHE_NONE,
+	I915_CACHE_NONE = 0,
 	I915_CACHE_LLC,
-	I915_CACHE_LLC_MLC, /* gen6+ */
+	I915_CACHE_LLC_MLC, /* gen6+, in docs at least! */
 };
 
 struct drm_i915_gem_object {
@@ -1249,6 +1249,10 @@ int i915_gem_unpin_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
+int i915_gem_get_cacheing_ioctl(struct drm_device *dev, void *data,
+				struct drm_file *file);
+int i915_gem_set_cacheing_ioctl(struct drm_device *dev, void *data,
+				struct drm_file *file);
 int i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *file_priv);
 int i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index da89f13..b8f5c4f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3102,6 +3102,68 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	return 0;
 }
 
+int i915_gem_get_cacheing_ioctl(struct drm_device *dev, void *data,
+				struct drm_file *file)
+{
+	struct drm_i915_gem_cacheing *args = data;
+	struct drm_i915_gem_object *obj;
+	int ret;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
+	if (&obj->base == NULL) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	args->cacheing = obj->cache_level != I915_CACHE_NONE;
+
+	drm_gem_object_unreference(&obj->base);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
+int i915_gem_set_cacheing_ioctl(struct drm_device *dev, void *data,
+				struct drm_file *file)
+{
+	struct drm_i915_gem_cacheing *args = data;
+	struct drm_i915_gem_object *obj;
+	enum i915_cache_level level;
+	int ret;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	switch (args->cacheing) {
+	case I915_CACHEING_NONE:
+		level = I915_CACHE_NONE;
+		break;
+	case I915_CACHEING_CACHED:
+		level = I915_CACHE_LLC;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	obj = to_intel_bo(drm_gem_object_lookup(dev, file, args->handle));
+	if (&obj->base == NULL) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	ret = i915_gem_object_set_cache_level(obj, level);
+
+	drm_gem_object_unreference(&obj->base);
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
 /*
  * Prepare buffer for display plane (scanout, cursors, etc).
  * Can be called from an uninterruptible phase (modesetting) and allows
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 564005e..87e46b4 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -203,6 +203,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_WAIT	0x2c
 #define DRM_I915_GEM_CONTEXT_CREATE	0x2d
 #define DRM_I915_GEM_CONTEXT_DESTROY	0x2e
+#define DRM_I915_GEM_SET_CACHEING	0x2f
+#define DRM_I915_GEM_GET_CACHEING	0x30
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -227,6 +229,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
 #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
 #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
+#define DRM_IOCTL_I915_GEM_SET_CACHEING		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_SET_CACHEING, struct drm_i915_gem_cacheing)
+#define DRM_IOCTL_I915_GEM_GET_CACHEING		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_GET_CACHEING, struct drm_i915_gem_cacheing)
 #define DRM_IOCTL_I915_GEM_THROTTLE	DRM_IO ( DRM_COMMAND_BASE + DRM_I915_GEM_THROTTLE)
 #define DRM_IOCTL_I915_GEM_ENTERVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_ENTERVT)
 #define DRM_IOCTL_I915_GEM_LEAVEVT	DRM_IO(DRM_COMMAND_BASE + DRM_I915_GEM_LEAVEVT)
@@ -707,6 +711,17 @@ struct drm_i915_gem_busy {
 	__u32 busy;
 };
 
+#define I915_CACHEING_NONE		0
+#define I915_CACHEING_CACHED		1
+
+struct drm_i915_gem_cacheing {
+	/** Handle of the buffer to check for busy */
+	__u32 handle;
+
+	/** Cacheing level to apply or return value */
+	__u32 cacheing;
+};
+
 #define I915_TILING_NONE	0
 #define I915_TILING_X		1
 #define I915_TILING_Y		2
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] drm: Add colouring to the range allocator
  2012-07-10  9:21   ` Daniel Vetter
@ 2012-07-10  9:29     ` Chris Wilson
  2012-07-10  9:40       ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-10  9:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Benjamin Herrenschmidt, intel-gfx, Jerome Glisse, Ben Skeggs,
	Daniel Vetter, Alex Deucher

On Tue, 10 Jul 2012 11:21:57 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jul 09, 2012 at 12:34:37PM +0100, Chris Wilson wrote:
> > In order to support snoopable memory on non-LLC architectures (so that
> > we can bind vgem objects into the i915 GATT for example), we have to
> > avoid the prefetcher on the GPU from crossing memory domains and so
> > prevent allocation of a snoopable PTE immediately following an uncached
> > PTE. To do that, we need to extend the range allocator with support for
> > tracking and segregating different node colours.
> > 
> > This will be used by i915 to segregate memory domains within the GTT.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Dave Airlie <airlied@redhat.com
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Ben Skeggs <bskeggs@redhat.com>
> > Cc: Jerome Glisse <jglisse@redhat.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Two little bikesheds:
> - Do we really need 64bits of colour? Especially since we have quite a few
>   bits of space left ...

It was following the convention of passing around an argument large
enough to stuff a pointer into, in case we ever need to make a far more
complex decision.
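
E.g. something like this, purely hypothetical, with obj standing in for
whatever we might want to base the decision on:

	/* stuff a pointer into the colour argument */
	node = drm_mm_get_block_generic(hole, size, alignment,
					(unsigned long)obj, 0);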

> - I think we could add a new insert_color helper that always takes a range
>   (we can select the right rang in the driver). That way this patch
>   wouldn't need to touch the drivers, and we could take the opportunity to
>   embed the gtt_space mm_node into our gem object ...

I was just a bit more wary of adding yet another helper since they
quickly get just as confusing as the extra arguments they replace. :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] drm: Add colouring to the range allocator
  2012-07-10  9:29     ` Chris Wilson
@ 2012-07-10  9:40       ` Daniel Vetter
  2012-07-10 10:15         ` [PATCH 1/2] " Chris Wilson
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2012-07-10  9:40 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Benjamin Herrenschmidt, intel-gfx, Jerome Glisse, Ben Skeggs,
	Daniel Vetter, Alex Deucher

On Tue, Jul 10, 2012 at 10:29:09AM +0100, Chris Wilson wrote:
> On Tue, 10 Jul 2012 11:21:57 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Jul 09, 2012 at 12:34:37PM +0100, Chris Wilson wrote:
> > > In order to support snoopable memory on non-LLC architectures (so that
> > > we can bind vgem objects into the i915 GATT for example), we have to
> > > avoid the prefetcher on the GPU from crossing memory domains and so
> > > prevent allocation of a snoopable PTE immediately following an uncached
> > > PTE. To do that, we need to extend the range allocator with support for
> > > tracking and segregating different node colours.
> > > 
> > > This will be used by i915 to segregate memory domains within the GTT.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Dave Airlie <airlied@redhat.com
> > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > Cc: Ben Skeggs <bskeggs@redhat.com>
> > > Cc: Jerome Glisse <jglisse@redhat.com>
> > > Cc: Alex Deucher <alexander.deucher@amd.com>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Two little bikesheds:
> > - Do we really need 64bits of colour? Especially since we have quite a few
> >   bits of space left ...
> 
> It was following the convention that we passed around an argument large
> enough to stuff a pointer into if we ever needed to make a far more
> complex decision.

I think the right thing to do in that case would be to embed the gtt_space
and do an upcast ;-)
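
Roughly (hand-waving sketch, not actual patch material yet):

	struct drm_i915_gem_object {
		struct drm_gem_object base;
		struct drm_mm_node gtt_space;	/* embedded, not a pointer */
		/* ... */
	};

	/* upcast from the node handed to the colour callback back to the bo */
	static inline struct drm_i915_gem_object *
	gtt_space_to_obj(struct drm_mm_node *node)
	{
		return container_of(node, struct drm_i915_gem_object, gtt_space);
	}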

> > - I think we could add a new insert_color helper that always takes a range
> >   (we can select the right rang in the driver). That way this patch
> >   wouldn't need to touch the drivers, and we could take the opportunity to
> >   embed the gtt_space mm_node into our gem object ...
> 
> I was just a bit more wary of adding yet another helper since they
> quickly get just as confusing as the extra arguments they replace. :)

Oh, I guess you mean a different helper than I do. I think we should add a
new drm_mm_insert_node_colour function that takes a pre-allocated mm_node,
color and range and goes hole-hunting. That way we'd avoid changing any of
the existing drivers (which rather likely will never care about colouring).
And I wouldn't have to convert over the drm_mm functions that deal with
pre-allocated drm_mm_node structs when I get around to resurrecting the
embed-gtt_space patch.
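
Something like this untested sketch (name up for bikeshedding, and it would
live in drm_mm.c next to the existing drm_mm_insert_node_in_range):

	int drm_mm_insert_node_colour(struct drm_mm *mm,
				      struct drm_mm_node *node,
				      unsigned long size,
				      unsigned alignment,
				      unsigned long color,
				      unsigned long start,
				      unsigned long end)
	{
		struct drm_mm_node *hole;

		/* hole-hunt with the colour taken into account ... */
		hole = drm_mm_search_free_in_range_generic(mm, size, alignment,
							   color, start, end,
							   false);
		if (!hole)
			return -ENOSPC;

		/* ... and fill the caller-provided (embeddable) node */
		drm_mm_insert_helper_range(hole, node, size, alignment,
					   color, start, end);
		return 0;
	}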

I agree that shoveling all the alignment constraints into a new helper
would be a bit of overkill, though
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] drm: Add colouring to the range allocator
  2012-07-10  9:40       ` Daniel Vetter
@ 2012-07-10 10:15         ` Chris Wilson
  2012-07-10 10:15           ` [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
  2012-07-12 12:15           ` [PATCH 1/2] drm: Add colouring to the range allocator Daniel Vetter
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Wilson @ 2012-07-10 10:15 UTC (permalink / raw)
  To: intel-gfx
  Cc: dri-devel, Jerome Glisse, Ben Skeggs, Daniel Vetter, Alex Deucher

In order to support snoopable memory on non-LLC architectures (so that
we can bind vgem objects into the i915 GATT for example), we have to
stop the prefetcher on the GPU from crossing memory domains and so
prevent allocation of a snoopable PTE immediately following an uncached
PTE. To do that, we need to extend the range allocator with support for
tracking and segregating different node colours.

This will be used by i915 to segregate memory domains within the GTT.

v2: Now with more drm_mm helpers and less driver interference.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/drm_gem.c             |    2 +-
 drivers/gpu/drm/drm_mm.c              |  169 ++++++++++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem.c       |    6 +-
 drivers/gpu/drm/i915/i915_gem_evict.c |    9 +-
 include/drm/drm_mm.h                  |   93 +++++++++++++++---
 5 files changed, 191 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d58e69d..fbe0842 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -354,7 +354,7 @@ drm_gem_create_mmap_offset(struct drm_gem_object *obj)
 
 	/* Get a DRM GEM mmap offset allocated... */
 	list->file_offset_node = drm_mm_search_free(&mm->offset_manager,
-			obj->size / PAGE_SIZE, 0, 0);
+			obj->size / PAGE_SIZE, 0, false);
 
 	if (!list->file_offset_node) {
 		DRM_ERROR("failed to allocate offset for bo %d\n", obj->name);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 961fb54..9bb82f7 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -118,45 +118,53 @@ static inline unsigned long drm_mm_hole_node_end(struct drm_mm_node *hole_node)
 
 static void drm_mm_insert_helper(struct drm_mm_node *hole_node,
 				 struct drm_mm_node *node,
-				 unsigned long size, unsigned alignment)
+				 unsigned long size, unsigned alignment,
+				 unsigned long color)
 {
 	struct drm_mm *mm = hole_node->mm;
-	unsigned long tmp = 0, wasted = 0;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
 	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
+	unsigned long adj_start = hole_start;
+	unsigned long adj_end = hole_end;
 
 	BUG_ON(!hole_node->hole_follows || node->allocated);
 
-	if (alignment)
-		tmp = hole_start % alignment;
+	if (mm->color_adjust)
+		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
 
-	if (!tmp) {
+	if (alignment) {
+		unsigned tmp = adj_start % alignment;
+		if (tmp)
+			adj_start += alignment - tmp;
+	}
+
+	if (adj_start == hole_start) {
 		hole_node->hole_follows = 0;
-		list_del_init(&hole_node->hole_stack);
-	} else
-		wasted = alignment - tmp;
+		list_del(&hole_node->hole_stack);
+	}
 
-	node->start = hole_start + wasted;
+	node->start = adj_start;
 	node->size = size;
 	node->mm = mm;
+	node->color = color;
 	node->allocated = 1;
 
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
-	BUG_ON(node->start + node->size > hole_end);
+	BUG_ON(node->start + node->size > adj_end);
 
+	node->hole_follows = 0;
 	if (node->start + node->size < hole_end) {
 		list_add(&node->hole_stack, &mm->hole_stack);
 		node->hole_follows = 1;
-	} else {
-		node->hole_follows = 0;
 	}
 }
 
 struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 					     unsigned long size,
 					     unsigned alignment,
+					     unsigned long color,
 					     int atomic)
 {
 	struct drm_mm_node *node;
@@ -165,7 +173,7 @@ struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *hole_node,
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node, node, size, alignment, color);
 
 	return node;
 }
@@ -181,11 +189,11 @@ int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
 {
 	struct drm_mm_node *hole_node;
 
-	hole_node = drm_mm_search_free(mm, size, alignment, 0);
+	hole_node = drm_mm_search_free(mm, size, alignment, false);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper(hole_node, node, size, alignment);
+	drm_mm_insert_helper(hole_node, node, size, alignment, 0);
 
 	return 0;
 }
@@ -194,50 +202,57 @@ EXPORT_SYMBOL(drm_mm_insert_node);
 static void drm_mm_insert_helper_range(struct drm_mm_node *hole_node,
 				       struct drm_mm_node *node,
 				       unsigned long size, unsigned alignment,
+				       unsigned long color,
 				       unsigned long start, unsigned long end)
 {
 	struct drm_mm *mm = hole_node->mm;
-	unsigned long tmp = 0, wasted = 0;
 	unsigned long hole_start = drm_mm_hole_node_start(hole_node);
 	unsigned long hole_end = drm_mm_hole_node_end(hole_node);
+	unsigned long adj_start = hole_start;
+	unsigned long adj_end = hole_end;
 
 	BUG_ON(!hole_node->hole_follows || node->allocated);
 
-	if (hole_start < start)
-		wasted += start - hole_start;
-	if (alignment)
-		tmp = (hole_start + wasted) % alignment;
+	if (mm->color_adjust)
+		mm->color_adjust(hole_node, color, &adj_start, &adj_end);
 
-	if (tmp)
-		wasted += alignment - tmp;
+	if (adj_start < start)
+		adj_start = start;
+
+	if (alignment) {
+		unsigned tmp = adj_start % alignment;
+		if (tmp)
+			adj_start += alignment - tmp;
+	}
 
-	if (!wasted) {
+	if (adj_start == hole_start) {
 		hole_node->hole_follows = 0;
-		list_del_init(&hole_node->hole_stack);
+		list_del(&hole_node->hole_stack);
 	}
 
-	node->start = hole_start + wasted;
+	node->start = adj_start;
 	node->size = size;
 	node->mm = mm;
+	node->color = color;
 	node->allocated = 1;
 
 	INIT_LIST_HEAD(&node->hole_stack);
 	list_add(&node->node_list, &hole_node->node_list);
 
-	BUG_ON(node->start + node->size > hole_end);
+	BUG_ON(node->start + node->size > adj_end);
 	BUG_ON(node->start + node->size > end);
 
+	node->hole_follows = 0;
 	if (node->start + node->size < hole_end) {
 		list_add(&node->hole_stack, &mm->hole_stack);
 		node->hole_follows = 1;
-	} else {
-		node->hole_follows = 0;
 	}
 }
 
 struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
 						int atomic)
@@ -248,7 +263,7 @@ struct drm_mm_node *drm_mm_get_block_range_generic(struct drm_mm_node *hole_node
 	if (unlikely(node == NULL))
 		return NULL;
 
-	drm_mm_insert_helper_range(hole_node, node, size, alignment,
+	drm_mm_insert_helper_range(hole_node, node, size, alignment, color,
 				   start, end);
 
 	return node;
@@ -267,11 +282,11 @@ int drm_mm_insert_node_in_range(struct drm_mm *mm, struct drm_mm_node *node,
 	struct drm_mm_node *hole_node;
 
 	hole_node = drm_mm_search_free_in_range(mm, size, alignment,
-						start, end, 0);
+						start, end, false);
 	if (!hole_node)
 		return -ENOSPC;
 
-	drm_mm_insert_helper_range(hole_node, node, size, alignment,
+	drm_mm_insert_helper_range(hole_node, node, size, alignment, 0,
 				   start, end);
 
 	return 0;
@@ -336,27 +351,23 @@ EXPORT_SYMBOL(drm_mm_put_block);
 static int check_free_hole(unsigned long start, unsigned long end,
 			   unsigned long size, unsigned alignment)
 {
-	unsigned wasted = 0;
-
 	if (end - start < size)
 		return 0;
 
 	if (alignment) {
 		unsigned tmp = start % alignment;
 		if (tmp)
-			wasted = alignment - tmp;
-	}
-
-	if (end >= start + size + wasted) {
-		return 1;
+			start += alignment - tmp;
 	}
 
-	return 0;
+	return end >= start + size;
 }
 
-struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
-				       unsigned long size,
-				       unsigned alignment, int best_match)
+struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
+					       unsigned long size,
+					       unsigned alignment,
+					       unsigned long color,
+					       bool best_match)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -368,10 +379,17 @@ struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 	best_size = ~0UL;
 
 	list_for_each_entry(entry, &mm->hole_stack, hole_stack) {
+		unsigned long adj_start = drm_mm_hole_node_start(entry);
+		unsigned long adj_end = drm_mm_hole_node_end(entry);
+
+		if (mm->color_adjust) {
+			mm->color_adjust(entry, color, &adj_start, &adj_end);
+			if (adj_end <= adj_start)
+				continue;
+		}
+
 		BUG_ON(!entry->hole_follows);
-		if (!check_free_hole(drm_mm_hole_node_start(entry),
-				     drm_mm_hole_node_end(entry),
-				     size, alignment))
+		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
 		if (!best_match)
@@ -385,14 +403,15 @@ struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
 
 	return best;
 }
-EXPORT_SYMBOL(drm_mm_search_free);
-
-struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
-						unsigned long size,
-						unsigned alignment,
-						unsigned long start,
-						unsigned long end,
-						int best_match)
+EXPORT_SYMBOL(drm_mm_search_free_generic);
+
+struct drm_mm_node *drm_mm_search_free_in_range_generic(const struct drm_mm *mm,
+							unsigned long size,
+							unsigned alignment,
+							unsigned long color,
+							unsigned long start,
+							unsigned long end,
+							bool best_match)
 {
 	struct drm_mm_node *entry;
 	struct drm_mm_node *best;
@@ -410,6 +429,13 @@ struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
 			end : drm_mm_hole_node_end(entry);
 
 		BUG_ON(!entry->hole_follows);
+
+		if (mm->color_adjust) {
+			mm->color_adjust(entry, color, &adj_start, &adj_end);
+			if (adj_end <= adj_start)
+				continue;
+		}
+
 		if (!check_free_hole(adj_start, adj_end, size, alignment))
 			continue;
 
@@ -424,7 +450,7 @@ struct drm_mm_node *drm_mm_search_free_in_range(const struct drm_mm *mm,
 
 	return best;
 }
-EXPORT_SYMBOL(drm_mm_search_free_in_range);
+EXPORT_SYMBOL(drm_mm_search_free_in_range_generic);
 
 /**
  * Moves an allocation. To be used with embedded struct drm_mm_node.
@@ -437,6 +463,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
 	new->mm = old->mm;
 	new->start = old->start;
 	new->size = old->size;
+	new->color = old->color;
 
 	old->allocated = 0;
 	new->allocated = 1;
@@ -452,9 +479,12 @@ EXPORT_SYMBOL(drm_mm_replace_node);
  * Warning: As long as the scan list is non-empty, no other operations than
  * adding/removing nodes to/from the scan list are allowed.
  */
-void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
-		      unsigned alignment)
+void drm_mm_init_scan(struct drm_mm *mm,
+		      unsigned long size,
+		      unsigned alignment,
+		      unsigned long color)
 {
+	mm->scan_color = color;
 	mm->scan_alignment = alignment;
 	mm->scan_size = size;
 	mm->scanned_blocks = 0;
@@ -474,11 +504,14 @@ EXPORT_SYMBOL(drm_mm_init_scan);
  * Warning: As long as the scan list is non-empty, no other operations than
  * adding/removing nodes to/from the scan list are allowed.
  */
-void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
+void drm_mm_init_scan_with_range(struct drm_mm *mm,
+				 unsigned long size,
 				 unsigned alignment,
+				 unsigned long color,
 				 unsigned long start,
 				 unsigned long end)
 {
+	mm->scan_color = color;
 	mm->scan_alignment = alignment;
 	mm->scan_size = size;
 	mm->scanned_blocks = 0;
@@ -522,17 +555,21 @@ int drm_mm_scan_add_block(struct drm_mm_node *node)
 
 	hole_start = drm_mm_hole_node_start(prev_node);
 	hole_end = drm_mm_hole_node_end(prev_node);
+
+	adj_start = hole_start;
+	adj_end = hole_end;
+
+	if (mm->color_adjust)
+		mm->color_adjust(prev_node, mm->scan_color, &adj_start, &adj_end);
+
 	if (mm->scan_check_range) {
-		adj_start = hole_start < mm->scan_start ?
-			mm->scan_start : hole_start;
-		adj_end = hole_end > mm->scan_end ?
-			mm->scan_end : hole_end;
-	} else {
-		adj_start = hole_start;
-		adj_end = hole_end;
+		if (adj_start < mm->scan_start)
+			adj_start = mm->scan_start;
+		if (adj_end > mm->scan_end)
+			adj_end = mm->scan_end;
 	}
 
-	if (check_free_hole(adj_start , adj_end,
+	if (check_free_hole(adj_start, adj_end,
 			    mm->scan_size, mm->scan_alignment)) {
 		mm->scan_hit_start = hole_start;
 		mm->scan_hit_size = hole_end;
@@ -616,6 +653,8 @@ int drm_mm_init(struct drm_mm * mm, unsigned long start, unsigned long size)
 	mm->head_node.size = start - mm->head_node.start;
 	list_add_tail(&mm->head_node.hole_stack, &mm->hole_stack);
 
+	mm->color_adjust = NULL;
+
 	return 0;
 }
 EXPORT_SYMBOL(drm_mm_init);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index db438f0..8a34061 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2755,8 +2755,8 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	if (map_and_fenceable)
 		free_space =
 			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
-						    size, alignment, 0,
-						    dev_priv->mm.gtt_mappable_end,
+						    size, alignment,
+						    0, dev_priv->mm.gtt_mappable_end,
 						    0);
 	else
 		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
@@ -2767,7 +2767,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 			obj->gtt_space =
 				drm_mm_get_block_range_generic(free_space,
 							       size, alignment, 0,
-							       dev_priv->mm.gtt_mappable_end,
+							       0, dev_priv->mm.gtt_mappable_end,
 							       0);
 		else
 			obj->gtt_space =
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index ae7c24e..eba0308 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -78,11 +78,12 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
 
 	INIT_LIST_HEAD(&unwind_list);
 	if (mappable)
-		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space, min_size,
-					    alignment, 0,
-					    dev_priv->mm.gtt_mappable_end);
+		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
+					    min_size, alignment, 0,
+					    0, dev_priv->mm.gtt_mappable_end);
 	else
-		drm_mm_init_scan(&dev_priv->mm.gtt_space, min_size, alignment);
+		drm_mm_init_scan(&dev_priv->mm.gtt_space,
+				 min_size, alignment, 0);
 
 	/* First see if there is a large enough contiguous idle region... */
 	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index 564b14a..06d7f79 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -50,6 +50,7 @@ struct drm_mm_node {
 	unsigned scanned_next_free : 1;
 	unsigned scanned_preceeds_hole : 1;
 	unsigned allocated : 1;
+	unsigned long color;
 	unsigned long start;
 	unsigned long size;
 	struct drm_mm *mm;
@@ -66,6 +67,7 @@ struct drm_mm {
 	spinlock_t unused_lock;
 	unsigned int scan_check_range : 1;
 	unsigned scan_alignment;
+	unsigned long scan_color;
 	unsigned long scan_size;
 	unsigned long scan_hit_start;
 	unsigned scan_hit_size;
@@ -73,6 +75,9 @@ struct drm_mm {
 	unsigned long scan_start;
 	unsigned long scan_end;
 	struct drm_mm_node *prev_scanned_node;
+
+	void (*color_adjust)(struct drm_mm_node *node, unsigned long color,
+			     unsigned long *start, unsigned long *end);
 };
 
 static inline bool drm_mm_node_allocated(struct drm_mm_node *node)
@@ -100,11 +105,13 @@ static inline bool drm_mm_initialized(struct drm_mm *mm)
 extern struct drm_mm_node *drm_mm_get_block_generic(struct drm_mm_node *node,
 						    unsigned long size,
 						    unsigned alignment,
+						    unsigned long color,
 						    int atomic);
 extern struct drm_mm_node *drm_mm_get_block_range_generic(
 						struct drm_mm_node *node,
 						unsigned long size,
 						unsigned alignment,
+						unsigned long color,
 						unsigned long start,
 						unsigned long end,
 						int atomic);
@@ -112,13 +119,13 @@ static inline struct drm_mm_node *drm_mm_get_block(struct drm_mm_node *parent,
 						   unsigned long size,
 						   unsigned alignment)
 {
-	return drm_mm_get_block_generic(parent, size, alignment, 0);
+	return drm_mm_get_block_generic(parent, size, alignment, 0, 0);
 }
 static inline struct drm_mm_node *drm_mm_get_block_atomic(struct drm_mm_node *parent,
 							  unsigned long size,
 							  unsigned alignment)
 {
-	return drm_mm_get_block_generic(parent, size, alignment, 1);
+	return drm_mm_get_block_generic(parent, size, alignment, 0, 1);
 }
 static inline struct drm_mm_node *drm_mm_get_block_range(
 						struct drm_mm_node *parent,
@@ -127,8 +134,19 @@ static inline struct drm_mm_node *drm_mm_get_block_range(
 						unsigned long start,
 						unsigned long end)
 {
-	return drm_mm_get_block_range_generic(parent, size, alignment,
-						start, end, 0);
+	return drm_mm_get_block_range_generic(parent, size, alignment, 0,
+					      start, end, 0);
+}
+static inline struct drm_mm_node *drm_mm_get_color_block_range(
+						struct drm_mm_node *parent,
+						unsigned long size,
+						unsigned alignment,
+						unsigned long color,
+						unsigned long start,
+						unsigned long end)
+{
+	return drm_mm_get_block_range_generic(parent, size, alignment, color,
+					      start, end, 0);
 }
 static inline struct drm_mm_node *drm_mm_get_block_atomic_range(
 						struct drm_mm_node *parent,
@@ -137,7 +155,7 @@ static inline struct drm_mm_node *drm_mm_get_block_atomic_range(
 						unsigned long start,
 						unsigned long end)
 {
-	return drm_mm_get_block_range_generic(parent, size, alignment,
+	return drm_mm_get_block_range_generic(parent, size, alignment, 0,
 						start, end, 1);
 }
 extern int drm_mm_insert_node(struct drm_mm *mm, struct drm_mm_node *node,
@@ -149,18 +167,59 @@ extern int drm_mm_insert_node_in_range(struct drm_mm *mm,
 extern void drm_mm_put_block(struct drm_mm_node *cur);
 extern void drm_mm_remove_node(struct drm_mm_node *node);
 extern void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new);
-extern struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
-					      unsigned long size,
-					      unsigned alignment,
-					      int best_match);
-extern struct drm_mm_node *drm_mm_search_free_in_range(
+extern struct drm_mm_node *drm_mm_search_free_generic(const struct drm_mm *mm,
+						      unsigned long size,
+						      unsigned alignment,
+						      unsigned long color,
+						      bool best_match);
+extern struct drm_mm_node *drm_mm_search_free_in_range_generic(
+						const struct drm_mm *mm,
+						unsigned long size,
+						unsigned alignment,
+						unsigned long color,
+						unsigned long start,
+						unsigned long end,
+						bool best_match);
+static inline struct drm_mm_node *drm_mm_search_free(const struct drm_mm *mm,
+						     unsigned long size,
+						     unsigned alignment,
+						     bool best_match)
+{
+	return drm_mm_search_free_generic(mm,size, alignment, 0, best_match);
+}
+static inline  struct drm_mm_node *drm_mm_search_free_in_range(
 						const struct drm_mm *mm,
 						unsigned long size,
 						unsigned alignment,
 						unsigned long start,
 						unsigned long end,
-						int best_match);
-extern int drm_mm_init(struct drm_mm *mm, unsigned long start,
+						bool best_match)
+{
+	return drm_mm_search_free_in_range_generic(mm, size, alignment, 0,
+						   start, end, best_match);
+}
+static inline struct drm_mm_node *drm_mm_search_free_color(const struct drm_mm *mm,
+							   unsigned long size,
+							   unsigned alignment,
+							   unsigned long color,
+							   bool best_match)
+{
+	return drm_mm_search_free_generic(mm,size, alignment, color, best_match);
+}
+static inline  struct drm_mm_node *drm_mm_search_free_in_range_color(
+						const struct drm_mm *mm,
+						unsigned long size,
+						unsigned alignment,
+						unsigned long color,
+						unsigned long start,
+						unsigned long end,
+						bool best_match)
+{
+	return drm_mm_search_free_in_range_generic(mm, size, alignment, color,
+						   start, end, best_match);
+}
+extern int drm_mm_init(struct drm_mm *mm,
+		       unsigned long start,
 		       unsigned long size);
 extern void drm_mm_takedown(struct drm_mm *mm);
 extern int drm_mm_clean(struct drm_mm *mm);
@@ -171,10 +230,14 @@ static inline struct drm_mm *drm_get_mm(struct drm_mm_node *block)
 	return block->mm;
 }
 
-void drm_mm_init_scan(struct drm_mm *mm, unsigned long size,
-		      unsigned alignment);
-void drm_mm_init_scan_with_range(struct drm_mm *mm, unsigned long size,
+void drm_mm_init_scan(struct drm_mm *mm,
+		      unsigned long size,
+		      unsigned alignment,
+		      unsigned long color);
+void drm_mm_init_scan_with_range(struct drm_mm *mm,
+				 unsigned long size,
 				 unsigned alignment,
+				 unsigned long color,
 				 unsigned long start,
 				 unsigned long end);
 int drm_mm_scan_add_block(struct drm_mm_node *node);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring
  2012-07-10 10:15         ` [PATCH 1/2] " Chris Wilson
@ 2012-07-10 10:15           ` Chris Wilson
  2012-07-12 12:19             ` Daniel Vetter
  2012-07-12 12:15           ` [PATCH 1/2] drm: Add colouring to the range allocator Daniel Vetter
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-10 10:15 UTC (permalink / raw)
  To: intel-gfx

Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.

v2: Rebase on top of drm_mm coloring v2.
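
The colouring hook itself only needs to keep a guard page between nodes of
differing cache levels. Roughly (illustrative sketch; the names and guard
size here may differ from the actual i915_gem_gtt.c hunk):

	static void i915_gtt_color_adjust(struct drm_mm_node *node,
					  unsigned long color,
					  unsigned long *start,
					  unsigned long *end)
	{
		/* shrink the usable hole so that a snooped PTE never
		 * directly abuts an uncached one */
		if (node->color != color)
			*start += 4096;

		if (!list_empty(&node->node_list)) {
			node = list_entry(node->node_list.next,
					  struct drm_mm_node, node_list);
			if (node->allocated && node->color != color)
				*end -= 4096;
		}
	}

	/* wired up only on machines without a shared LLC, e.g.
	 *	if (!HAS_LLC(dev))
	 *		dev_priv->mm.gtt_space.color_adjust = i915_gtt_color_adjust;
	 */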

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h       |    4 +-
 drivers/gpu/drm/i915/i915_gem.c       |   67 ++++++++++++++++++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_evict.c |    7 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   19 ++++++++++
 4 files changed, 83 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b53bd8f..4fb358e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1422,7 +1422,9 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev, int min_size,
-					  unsigned alignment, bool mappable);
+					  unsigned alignment,
+					  unsigned cache_level,
+					  bool mappable);
 int i915_gem_evict_everything(struct drm_device *dev, bool purgeable_only);
 
 /* i915_gem_stolen.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8a34061..e1ab236 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2700,6 +2700,36 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+static bool i915_gem_valid_gtt_space(struct drm_device *dev,
+				     struct drm_mm_node *gtt_space,
+				     unsigned long cache_level)
+{
+	struct drm_mm_node *other;
+
+	/* On non-LLC machines we have to be careful when putting differing
+	 * types of snoopable memory together to avoid the prefetcher
+	 * crossing memory domains and dying.
+	 */
+	if (HAS_LLC(dev))
+		return true;
+
+	if (gtt_space == NULL)
+		return true;
+
+	if (list_empty(&gtt_space->node_list))
+		return true;
+
+	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
+	if (other->allocated && !other->hole_follows && other->color != cache_level)
+		return false;
+
+	other = list_entry(gtt_space->node_list.next, struct drm_mm_node, node_list);
+	if (other->allocated && !gtt_space->hole_follows && other->color != cache_level)
+		return false;
+
+	return true;
+}
+
 /**
  * Finds free space in the GTT aperture and binds the object there.
  */
@@ -2754,36 +2784,47 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
  search_free:
 	if (map_and_fenceable)
 		free_space =
-			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
-						    size, alignment,
-						    0, dev_priv->mm.gtt_mappable_end,
-						    0);
+			drm_mm_search_free_in_range_color(&dev_priv->mm.gtt_space,
+							  size, alignment, obj->cache_level,
+							  0, dev_priv->mm.gtt_mappable_end,
+							  false);
 	else
-		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
-						size, alignment, 0);
+		free_space = drm_mm_search_free_color(&dev_priv->mm.gtt_space,
+						      size, alignment, obj->cache_level,
+						      false);
 
 	if (free_space != NULL) {
 		if (map_and_fenceable)
 			obj->gtt_space =
 				drm_mm_get_block_range_generic(free_space,
-							       size, alignment, 0,
+							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       0);
+							       false);
 		else
 			obj->gtt_space =
-				drm_mm_get_block(free_space, size, alignment);
+				drm_mm_get_block_generic(free_space,
+							 size, alignment, obj->cache_level,
+							 false);
 	}
 	if (obj->gtt_space == NULL) {
 		/* If the gtt is empty and we're still having trouble
 		 * fitting our object in, we're out of memory.
 		 */
 		ret = i915_gem_evict_something(dev, size, alignment,
+					       obj->cache_level,
 					       map_and_fenceable);
 		if (ret)
 			return ret;
 
 		goto search_free;
 	}
+	if (WARN_ON(!i915_gem_valid_gtt_space(dev,
+					      obj->gtt_space,
+					      obj->cache_level))) {
+		drm_mm_put_block(obj->gtt_space);
+		obj->gtt_space = NULL;
+		return -EINVAL;
+	}
 
 	ret = i915_gem_object_get_pages_gtt(obj, gfpmask);
 	if (ret) {
@@ -3004,6 +3045,12 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return -EBUSY;
 	}
 
+	if (!i915_gem_valid_gtt_space(dev, obj->gtt_space, cache_level)) {
+		ret = i915_gem_object_unbind(obj);
+		if (ret)
+			return ret;
+	}
+
 	if (obj->gtt_space) {
 		ret = i915_gem_object_finish_gpu(obj);
 		if (ret)
@@ -3015,7 +3062,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		 * registers with snooped memory, so relinquish any fences
 		 * currently pointing to our region in the aperture.
 		 */
-		if (INTEL_INFO(obj->base.dev)->gen < 6) {
+		if (INTEL_INFO(dev)->gen < 6) {
 			ret = i915_gem_object_put_fence(obj);
 			if (ret)
 				return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index eba0308..9c5fb08 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -44,7 +44,8 @@ mark_free(struct drm_i915_gem_object *obj, struct list_head *unwind)
 
 int
 i915_gem_evict_something(struct drm_device *dev, int min_size,
-			 unsigned alignment, bool mappable)
+			 unsigned alignment, unsigned cache_level,
+			 bool mappable)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
@@ -79,11 +80,11 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
 	INIT_LIST_HEAD(&unwind_list);
 	if (mappable)
 		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
-					    min_size, alignment, 0,
+					    min_size, alignment, cache_level,
 					    0, dev_priv->mm.gtt_mappable_end);
 	else
 		drm_mm_init_scan(&dev_priv->mm.gtt_space,
-				 min_size, alignment, 0);
+				 min_size, alignment, cache_level);
 
 	/* First see if there is a large enough contiguous idle region... */
 	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9fd25a4..4584f7f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -422,6 +422,23 @@ void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj)
 	undo_idling(dev_priv, interruptible);
 }
 
+static void i915_gtt_color_adjust(struct drm_mm_node *node,
+				  unsigned long color,
+				  unsigned long *start,
+				  unsigned long *end)
+{
+	if (node->color != color)
+		*start += 4096;
+
+	if (!list_empty(&node->node_list)) {
+		node = list_entry(node->node_list.next,
+				  struct drm_mm_node,
+				  node_list);
+		if (node->allocated && node->color != color)
+			*end -= 4096;
+	}
+}
+
 void i915_gem_init_global_gtt(struct drm_device *dev,
 			      unsigned long start,
 			      unsigned long mappable_end,
@@ -431,6 +448,8 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 	/* Substract the guard page ... */
 	drm_mm_init(&dev_priv->mm.gtt_space, start, end - start - PAGE_SIZE);
+	if (!HAS_LLC(dev))
+		dev_priv->mm.gtt_space.color_adjust = i915_gtt_color_adjust;
 
 	dev_priv->mm.gtt_start = start;
 	dev_priv->mm.gtt_mappable_end = mappable_end;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
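
The colour_adjust hook in this patch never rejects a hole outright; it only trims the range the allocator may use. A hand-worked trace of i915_gtt_color_adjust, with made-up offsets and colour values (only the 4096-byte adjustments come from the code above):

	/* Hole spans [0x10000, 0x20000).  The node before it ends at 0x10000
	 * with colour 0 (uncached); the node after it starts at 0x20000 with
	 * colour 1 (snooped).
	 *
	 * Binding a colour-1 (snooped) object into the hole:
	 *   start: 0x10000 -> 0x11000   (+4096 guard after the uncached node)
	 *   end:   0x20000 unchanged    (next node has the same colour)
	 *
	 * Binding a colour-0 (uncached) object:
	 *   start: 0x10000 unchanged
	 *   end:   0x20000 -> 0x1f000   (-4096 guard before the snooped node)
	 *
	 * Either way a one-page gap separates differing memory types, which is
	 * what i915_gem_valid_gtt_space() then checks via hole_follows on the
	 * neighbouring nodes. */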

* Re: [PATCH 1/2] drm: Add colouring to the range allocator
  2012-07-10 10:15         ` [PATCH 1/2] " Chris Wilson
  2012-07-10 10:15           ` [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
@ 2012-07-12 12:15           ` Daniel Vetter
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2012-07-12 12:15 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Benjamin Herrenschmidt, intel-gfx, dri-devel, Jerome Glisse,
	Ben Skeggs, Daniel Vetter, Alex Deucher

On Tue, Jul 10, 2012 at 11:15:23AM +0100, Chris Wilson wrote:
> In order to support snoopable memory on non-LLC architectures (so that
> we can bind vgem objects into the i915 GATT for example), we have to
> avoid the prefetcher on the GPU from crossing memory domains and so
> prevent allocation of a snoopable PTE immediately following an uncached
> PTE. To do that, we need to extend the range allocator with support for
> tracking and segregating different node colours.
> 
> This will be used by i915 to segregate memory domains within the GTT.
> 
> v2: Now with more drm_mm helpers and less driver interference.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Dave Airlie <airlied@redhat.com>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Ben Skeggs <bskeggs@redhat.com>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: dri-devel@lists.freedesktop.org

Imo we should ditch the rather useless best_match and maybe also fold the
_range variants into the generic ones for most cases, but that's stuff for
other patches. So

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring
  2012-07-10 10:15           ` [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
@ 2012-07-12 12:19             ` Daniel Vetter
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2012-07-12 12:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Jul 10, 2012 at 11:15:24AM +0100, Chris Wilson wrote:
> Several functions of the GPU have the restriction that differing memory
> domains cannot be placed next to each other (as the GPU may prefetch
> beyond the end of one domain and hang as it crosses into the other
> domain). We use the facility of the drm_mm to mark ranges with a
> particular color that corresponds to the cache attributes of those pages
> in order to prevent allocating adjacent blocks of differing memory
> types.
> 
> v2: Rebase on top of drm_mm coloring v2.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/i915: Export ability of changing cache levels to userspace
  2012-07-10  9:27       ` [PATCH] " Chris Wilson
@ 2012-07-18 18:06         ` Daniel Vetter
  2012-07-26 10:34         ` Daniel Vetter
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2012-07-18 18:06 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Jul 10, 2012 at 10:27:08AM +0100, Chris Wilson wrote:
> By selecting the cache level (essentially whether or not the CPU snoops
> any updates to the bo, and on more recent machines whether it resides
> inside the CPU's last-level-cache) a userspace driver is able to then
> manage all of its memory within buffer objects, if it so desires. This
> enables the userspace driver to accelerate uploads and more importantly
> downloads from the GPU and to be able to mix CPU and GPU rendering/activity
> efficiently.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I've merged the interface header bits of this patch to reserve the
ioctl number. I'd like to play around some more with the implementation
and merge it in about a week for 3.7.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread
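
Since only the interface header bits were merged at this point, a userspace caller of the new ioctl would look roughly like the sketch below. It assumes the uapi names as they were eventually merged (struct drm_i915_gem_caching and DRM_IOCTL_I915_GEM_SET_CACHING); the RFC's own implementation is not shown in this thread, so treat the exact names as assumptions:

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	/* Sketch: ask the kernel to make a bo snoopable/LLC-cached so the CPU
	 * can read GPU results without explicit clflushing. */
	static int bo_set_caching(int drm_fd, uint32_t handle, uint32_t caching)
	{
		struct drm_i915_gem_caching arg;

		memset(&arg, 0, sizeof(arg));
		arg.handle = handle;
		arg.caching = caching;	/* e.g. I915_CACHING_CACHED */

		return ioctl(drm_fd, DRM_IOCTL_I915_GEM_SET_CACHING, &arg);
	}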

* Re: [PATCH] drm/i915: Export ability of changing cache levels to userspace
  2012-07-10  9:27       ` [PATCH] " Chris Wilson
  2012-07-18 18:06         ` Daniel Vetter
@ 2012-07-26 10:34         ` Daniel Vetter
  2012-07-26 10:49           ` [PATCH] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2012-07-26 10:34 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Tue, Jul 10, 2012 at 10:27:08AM +0100, Chris Wilson wrote:
> By selecting the cache level (essentially whether or not the CPU snoops
> any updates to the bo, and on more recent machines whether it resides
> inside the CPU's last-level-cache) a userspace driver is able to then
> manage all of its memory within buffer objects, if it so desires. This
> enables the userspace driver to accelerate uploads and more importantly
> downloads from the GPU and to be able to mix CPU and GPU rendering/activity
> efficiently.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Ok, I've merged this one and the prep patch to dinq, with the little
comment added that bits 16-31 are reserved for platform madness. Thanks
for the patches.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] drm/i915: Segregate memory domains in the GTT using coloring
  2012-07-26 10:34         ` Daniel Vetter
@ 2012-07-26 10:49           ` Chris Wilson
  2012-07-26 11:00             ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2012-07-26 10:49 UTC (permalink / raw)
  To: intel-gfx

Several functions of the GPU have the restriction that differing memory
domains cannot be placed next to each other (as the GPU may prefetch
beyond the end of one domain and hang as it crosses into the other
domain). We use the facility of the drm_mm to mark ranges with a
particular color that corresponds to the cache attributes of those pages
in order to prevent allocating adjacent blocks of differing memory
types.

v2: Rebase on top of drm_mm coloring v2.
v3: Fix rebinding existing gtt_space and add a verification routine.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---

So I found another bug in testing that I forgot to forward on in a timely
manner.
-Chris

---
 drivers/gpu/drm/i915/i915_drv.h       |    5 +-
 drivers/gpu/drm/i915/i915_gem.c       |  111 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_evict.c |    7 ++-
 drivers/gpu/drm/i915/i915_gem_gtt.c   |   19 ++++++
 4 files changed, 128 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f176589..d6c0d0e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -109,6 +109,7 @@ struct intel_pch_pll {
 
 #define WATCH_COHERENCY	0
 #define WATCH_LISTS	0
+#define WATCH_GTT	0
 
 #define I915_GEM_PHYS_CURSOR_0 1
 #define I915_GEM_PHYS_CURSOR_1 2
@@ -1404,7 +1405,9 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev, int min_size,
-					  unsigned alignment, bool mappable);
+					  unsigned alignment,
+					  unsigned cache_level,
+					  bool mappable);
 int i915_gem_evict_everything(struct drm_device *dev, bool purgeable_only);
 
 /* i915_gem_stolen.c */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b274810..19bdc24 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2586,6 +2586,76 @@ i915_gem_object_get_fence(struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+static bool i915_gem_valid_gtt_space(struct drm_device *dev,
+				     struct drm_mm_node *gtt_space,
+				     unsigned long cache_level)
+{
+	struct drm_mm_node *other;
+
+	/* On non-LLC machines we have to be careful when putting differing
+	 * types of snoopable memory together to avoid the prefetcher
+	 * crossing memory domains and dying.
+	 */
+	if (HAS_LLC(dev))
+		return true;
+
+	if (gtt_space == NULL)
+		return true;
+
+	if (list_empty(&gtt_space->node_list))
+		return true;
+
+	other = list_entry(gtt_space->node_list.prev, struct drm_mm_node, node_list);
+	if (other->allocated && !other->hole_follows && other->color != cache_level)
+		return false;
+
+	other = list_entry(gtt_space->node_list.next, struct drm_mm_node, node_list);
+	if (other->allocated && !gtt_space->hole_follows && other->color != cache_level)
+		return false;
+
+	return true;
+}
+
+static void i915_gem_verify_gtt(struct drm_device *dev)
+{
+#if WATCH_GTT
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *obj;
+	int err = 0;
+
+	list_for_each_entry(obj, &dev_priv->mm.gtt_list, gtt_list) {
+		if (obj->gtt_space == NULL) {
+			printk(KERN_ERR "object found on GTT list with no space reserved\n");
+			err++;
+			continue;
+		}
+
+		if (obj->cache_level != obj->gtt_space->color) {
+			printk(KERN_ERR "object reserved space [%08lx, %08lx] with wrong color, cache_level=%x, color=%lx\n",
+			       obj->gtt_space->start,
+			       obj->gtt_space->start + obj->gtt_space->size,
+			       obj->cache_level,
+			       obj->gtt_space->color);
+			err++;
+			continue;
+		}
+
+		if (!i915_gem_valid_gtt_space(dev,
+					      obj->gtt_space,
+					      obj->cache_level)) {
+			printk(KERN_ERR "invalid GTT space found at [%08lx, %08lx] - color=%x\n",
+			       obj->gtt_space->start,
+			       obj->gtt_space->start + obj->gtt_space->size,
+			       obj->cache_level);
+			err++;
+			continue;
+		}
+	}
+
+	WARN_ON(err);
+#endif
+}
+
 /**
  * Finds free space in the GTT aperture and binds the object there.
  */
@@ -2640,36 +2710,47 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
  search_free:
 	if (map_and_fenceable)
 		free_space =
-			drm_mm_search_free_in_range(&dev_priv->mm.gtt_space,
-						    size, alignment,
-						    0, dev_priv->mm.gtt_mappable_end,
-						    0);
+			drm_mm_search_free_in_range_color(&dev_priv->mm.gtt_space,
+							  size, alignment, obj->cache_level,
+							  0, dev_priv->mm.gtt_mappable_end,
+							  false);
 	else
-		free_space = drm_mm_search_free(&dev_priv->mm.gtt_space,
-						size, alignment, 0);
+		free_space = drm_mm_search_free_color(&dev_priv->mm.gtt_space,
+						      size, alignment, obj->cache_level,
+						      false);
 
 	if (free_space != NULL) {
 		if (map_and_fenceable)
 			obj->gtt_space =
 				drm_mm_get_block_range_generic(free_space,
-							       size, alignment, 0,
+							       size, alignment, obj->cache_level,
 							       0, dev_priv->mm.gtt_mappable_end,
-							       0);
+							       false);
 		else
 			obj->gtt_space =
-				drm_mm_get_block(free_space, size, alignment);
+				drm_mm_get_block_generic(free_space,
+							 size, alignment, obj->cache_level,
+							 false);
 	}
 	if (obj->gtt_space == NULL) {
 		/* If the gtt is empty and we're still having trouble
 		 * fitting our object in, we're out of memory.
 		 */
 		ret = i915_gem_evict_something(dev, size, alignment,
+					       obj->cache_level,
 					       map_and_fenceable);
 		if (ret)
 			return ret;
 
 		goto search_free;
 	}
+	if (WARN_ON(!i915_gem_valid_gtt_space(dev,
+					      obj->gtt_space,
+					      obj->cache_level))) {
+		drm_mm_put_block(obj->gtt_space);
+		obj->gtt_space = NULL;
+		return -EINVAL;
+	}
 
 	ret = i915_gem_object_get_pages_gtt(obj, gfpmask);
 	if (ret) {
@@ -2732,6 +2813,7 @@ i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 	obj->map_and_fenceable = mappable && fenceable;
 
 	trace_i915_gem_object_bind(obj, map_and_fenceable);
+	i915_gem_verify_gtt(dev);
 	return 0;
 }
 
@@ -2873,6 +2955,12 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		return -EBUSY;
 	}
 
+	if (!i915_gem_valid_gtt_space(dev, obj->gtt_space, cache_level)) {
+		ret = i915_gem_object_unbind(obj);
+		if (ret)
+			return ret;
+	}
+
 	if (obj->gtt_space) {
 		ret = i915_gem_object_finish_gpu(obj);
 		if (ret)
@@ -2884,7 +2972,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		 * registers with snooped memory, so relinquish any fences
 		 * currently pointing to our region in the aperture.
 		 */
-		if (INTEL_INFO(obj->base.dev)->gen < 6) {
+		if (INTEL_INFO(dev)->gen < 6) {
 			ret = i915_gem_object_put_fence(obj);
 			if (ret)
 				return ret;
@@ -2895,6 +2983,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 		if (obj->has_aliasing_ppgtt_mapping)
 			i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
 					       obj, cache_level);
+
+		obj->gtt_space->color = cache_level;
 	}
 
 	if (cache_level == I915_CACHE_NONE) {
@@ -2921,6 +3011,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 	}
 
 	obj->cache_level = cache_level;
+	i915_gem_verify_gtt(dev);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 51e547c..7279c31 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -44,7 +44,8 @@ mark_free(struct drm_i915_gem_object *obj, struct list_head *unwind)
 
 int
 i915_gem_evict_something(struct drm_device *dev, int min_size,
-			 unsigned alignment, bool mappable)
+			 unsigned alignment, unsigned cache_level,
+			 bool mappable)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
@@ -79,11 +80,11 @@ i915_gem_evict_something(struct drm_device *dev, int min_size,
 	INIT_LIST_HEAD(&unwind_list);
 	if (mappable)
 		drm_mm_init_scan_with_range(&dev_priv->mm.gtt_space,
-					    min_size, alignment, 0,
+					    min_size, alignment, cache_level,
 					    0, dev_priv->mm.gtt_mappable_end);
 	else
 		drm_mm_init_scan(&dev_priv->mm.gtt_space,
-				 min_size, alignment, 0);
+				 min_size, alignment, cache_level);
 
 	/* First see if there is a large enough contiguous idle region... */
 	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 9fd25a4..4584f7f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -422,6 +422,23 @@ void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj)
 	undo_idling(dev_priv, interruptible);
 }
 
+static void i915_gtt_color_adjust(struct drm_mm_node *node,
+				  unsigned long color,
+				  unsigned long *start,
+				  unsigned long *end)
+{
+	if (node->color != color)
+		*start += 4096;
+
+	if (!list_empty(&node->node_list)) {
+		node = list_entry(node->node_list.next,
+				  struct drm_mm_node,
+				  node_list);
+		if (node->allocated && node->color != color)
+			*end -= 4096;
+	}
+}
+
 void i915_gem_init_global_gtt(struct drm_device *dev,
 			      unsigned long start,
 			      unsigned long mappable_end,
@@ -431,6 +448,8 @@ void i915_gem_init_global_gtt(struct drm_device *dev,
 
 	/* Substract the guard page ... */
 	drm_mm_init(&dev_priv->mm.gtt_space, start, end - start - PAGE_SIZE);
+	if (!HAS_LLC(dev))
+		dev_priv->mm.gtt_space.color_adjust = i915_gtt_color_adjust;
 
 	dev_priv->mm.gtt_start = start;
 	dev_priv->mm.gtt_mappable_end = mappable_end;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
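
The verification pass added in v3 is compiled out by default. To exercise it while debugging, one would flip the new knob in i915_drv.h locally (a debugging change, not part of the patch):

	-#define WATCH_GTT	0
	+#define WATCH_GTT	1

With that set, i915_gem_verify_gtt() walks the bound objects after every bind and cache-level change and WARNs if any object's reserved range carries the wrong colour or sits flush against a neighbour of a differing colour.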

* Re: [PATCH] drm/i915: Segregate memory domains in the GTT using coloring
  2012-07-26 10:49           ` [PATCH] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
@ 2012-07-26 11:00             ` Daniel Vetter
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2012-07-26 11:00 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Thu, Jul 26, 2012 at 11:49:32AM +0100, Chris Wilson wrote:
> Several functions of the GPU have the restriction that differing memory
> domains cannot be placed next to each other (as the GPU may prefetch
> beyond the end of one domain and hang as it crosses into the other
> domain). We use the facility of the drm_mm to mark ranges with a
> particular color that corresponds to the cache attributes of those pages
> in order to prevent allocating adjacent blocks of differing memory
> types.
> 
> v2: Rebase ontop of drm_mm coloring v2.
> v3: Fix rebinding existing gtt_space and add a verification routine.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Ok, history fixed. Can you please double-check that I haven't botched it?

/me hangs head in shame over not noticing that nothing assigns anything to
obj->color ...
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-07-26 11:00 UTC | newest]

Thread overview: 18+ messages
2012-07-09 11:34 [RFC] Set cache level ioctl Chris Wilson
2012-07-09 11:34 ` [PATCH 1/3] drm: Add colouring to the range allocator Chris Wilson
2012-07-10  9:21   ` Daniel Vetter
2012-07-10  9:29     ` Chris Wilson
2012-07-10  9:40       ` Daniel Vetter
2012-07-10 10:15         ` [PATCH 1/2] " Chris Wilson
2012-07-10 10:15           ` [PATCH 2/2] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
2012-07-12 12:19             ` Daniel Vetter
2012-07-12 12:15           ` [PATCH 1/2] drm: Add colouring to the range allocator Daniel Vetter
2012-07-09 11:34 ` [PATCH 2/3] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
2012-07-09 11:34 ` [PATCH 3/3] drm/i915: Export ability of changing cache levels to userspace Chris Wilson
2012-07-10  8:54   ` Daniel Vetter
2012-07-10  9:00     ` Chris Wilson
2012-07-10  9:27       ` [PATCH] " Chris Wilson
2012-07-18 18:06         ` Daniel Vetter
2012-07-26 10:34         ` Daniel Vetter
2012-07-26 10:49           ` [PATCH] drm/i915: Segregate memory domains in the GTT using coloring Chris Wilson
2012-07-26 11:00             ` Daniel Vetter
