* [PATCH] drm/i915: Broaden application of set-domain(GTT)
@ 2014-10-13 13:20 Chris Wilson
  2014-10-14 12:47 ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2014-10-13 13:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Previously, this was restricted to operate only on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second use case is drm_intel_bo_wait_rendering(),
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third use case is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of use cases, we need to operate on the
object regardless of its bind state.
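
For reference, a minimal userspace sketch of using set-domain as a
wait-for-rendering primitive once this lands (fd and handle setup are
assumed, error handling elided):

	struct drm_i915_gem_set_domain arg;

	memset(&arg, 0, sizeof(arg));
	arg.handle = handle;
	arg.read_domains = I915_GEM_DOMAIN_GTT;
	arg.write_domain = 0; /* read-only access: just wait for rendering */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &arg);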

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a2251d9dc134..cee6fdfe8e05 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1506,18 +1506,10 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto unref;
 
-	if (read_domains & I915_GEM_DOMAIN_GTT) {
+	if (read_domains & I915_GEM_DOMAIN_GTT)
 		ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
-
-		/* Silently promote "you're not bound, there was nothing to do"
-		 * to success, since the client was just asking us to
-		 * make sure everything was done.
-		 */
-		if (ret == -EINVAL)
-			ret = 0;
-	} else {
+	else
 		ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
-	}
 
 unref:
 	drm_gem_object_unreference(&obj->base);
@@ -3474,14 +3466,10 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
 int
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 {
-	struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
 	uint32_t old_write_domain, old_read_domains;
+	struct i915_vma *vma;
 	int ret;
 
-	/* Not valid to be called on unbound objects. */
-	if (vma == NULL)
-		return -EINVAL;
-
 	if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
 		return 0;
 
@@ -3518,7 +3506,10 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	if (i915_gem_object_is_inactive(obj))
+	vma = i915_gem_obj_to_ggtt(obj);
+	if (vma &&
+	    drm_mm_node_allocated(&vma->node) &&
+	    i915_gem_object_is_inactive(obj))
 		list_move_tail(&vma->mm_list,
 			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
 
-- 
2.1.1

* [PATCH] drm/i915: Broaden application of set-domain(GTT)
  2014-10-13 13:20 [PATCH] drm/i915: Broaden application of set-domain(GTT) Chris Wilson
@ 2014-10-14 12:47 ` Chris Wilson
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
  2014-11-05 12:49   ` [PATCH] drm/i915: Broaden application of set-domain(GTT) Chris Wilson
  0 siblings, 2 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-14 12:47 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Previously, this was restricted to operate only on bound objects - to
make pointer access through the GTT to the object coherent with writes
to and from the GPU. A second use case is drm_intel_bo_wait_rendering(),
which at present does not function unless the object also happens to
be bound into the GGTT (on current systems that is becoming increasingly
rare, especially for the typical requests from mesa). A third use case is
a future patch wishing to extend the coverage of the GTT domain to
include objects not bound into the GGTT but still in its coherent cache
domain. For the latter pair of use cases, we need to operate on the
object regardless of its bind state.

v2: After discussion with Akash, we came to the conclusion that the
get-pages call was required for accurate domain tracking in the corner
cases (like the shrinker), and is also useful for ensuring memory
coherency with earlier cached CPU mmaps in case userspace uses exotic
cache-bypassing (non-temporal) instructions.
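
For illustration, such a cache-bypassing write could look like this (a
sketch using SSE2 intrinsics; dst is assumed to point into an existing
cached CPU mmap of the object):

	#include <emmintrin.h>

	/* A non-temporal store sends the data straight to memory,
	 * bypassing the CPU cache, so the cachelines cannot be relied
	 * upon to reflect the object's latest contents. */
	static void stream_store(void *dst, __m128i v)
	{
		_mm_stream_si128((__m128i *)dst, v);
		_mm_sfence(); /* order the store against later access */
	}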

Cc: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cdd0f105f670..cae4f28b642b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1503,18 +1503,10 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto unref;
 
-	if (read_domains & I915_GEM_DOMAIN_GTT) {
+	if (read_domains & I915_GEM_DOMAIN_GTT)
 		ret = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
-
-		/* Silently promote "you're not bound, there was nothing to do"
-		 * to success, since the client was just asking us to
-		 * make sure everything was done.
-		 */
-		if (ret == -EINVAL)
-			ret = 0;
-	} else {
+	else
 		ret = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
-	}
 
 unref:
 	drm_gem_object_unreference(&obj->base);
@@ -3442,14 +3434,10 @@ i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj,
 int
 i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 {
-	struct i915_vma *vma = i915_gem_obj_to_ggtt(obj);
 	uint32_t old_write_domain, old_read_domains;
+	struct i915_vma *vma;
 	int ret;
 
-	/* Not valid to be called on unbound objects. */
-	if (vma == NULL)
-		return -EINVAL;
-
 	if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
 		return 0;
 
@@ -3457,6 +3445,18 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 	if (ret)
 		return ret;
 
+	/* Flush and acquire obj->pages so that we are coherent through
+	 * direct access in memory with previous cached writes through
+	 * shmemfs and that our cache domain tracking remains valid.
+	 * For example, if the obj->filp was moved to swap without us
+	 * being notified and releasing the pages, we would mistakenly
+	 * continue to assume that the obj remained out of the CPU cached
+	 * domain.
+	 */
+	ret = i915_gem_object_get_pages(obj);
+	if (ret)
+		return ret;
+
 	i915_gem_object_flush_cpu_write_domain(obj, false);
 
 	/* Serialise direct access to this object with the barriers for
@@ -3486,7 +3486,10 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 					    old_write_domain);
 
 	/* And bump the LRU for this access */
-	if (i915_gem_object_is_inactive(obj))
+	vma = i915_gem_obj_to_ggtt(obj);
+	if (vma &&
+	    drm_mm_node_allocated(&vma->node) &&
+	    i915_gem_object_is_inactive(obj))
 		list_move_tail(&vma->mm_list,
 			       &to_i915(obj->base.dev)->gtt.base.inactive_list);
 
-- 
2.1.1

* [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-14 12:47 ` Chris Wilson
@ 2014-10-23 10:33   ` akash.goel
  2014-10-23 10:37     ` [PATCH] intel: New libdrm interface to create uncached CPU mapping akash.goel
                       ` (5 more replies)
  2014-11-05 12:49   ` [PATCH] drm/i915: Broaden application of set-domain(GTT) Chris Wilson
  1 sibling, 6 replies; 46+ messages in thread
From: akash.goel @ 2014-10-23 10:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

This patch provides support for creating uncached virtual mappings of a GEM
object. It intends to provide the same functionality as the 'mmap_gtt'
interface without the constraints of a limited aperture space, provided
clients handle the linear-to-tile conversion on their own.
This improves CPU write performance, as with such a mapping writes are
almost 50% faster than with mmap_gtt. It also avoids the cache flush that is
otherwise required after a CPU-side update when the object is passed on to
the GPU, which would be the case if the regular mmap ioctl interface were
used.
This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated.
To ensure cache coherency before using this mapping, the GTT domain has
been reused here. This provides the required cache flush if the object is in
the CPU domain, as well as synchronization against concurrent rendering.
Although access through an uncached mmap should automatically invalidate the
cache lines, this may not be true for non-temporal write instructions, and
not all pages of the object may be updated at any given point in time
through this mapping.
Having a call to get_pages in the set_to_gtt_domain function, as added by
Chris in the earlier patch, guarantees the clflush, so there will be no
cachelines holding data for the object before it is accessed through this
map.
A new field 'flags' has been added to the 'drm_i915_gem_mmap' structure,
used in the gem_mmap ioctl, which allows userspace to request the uncached
mapping type. Userspace can query the driver for support of this mapping
through get_params; for that a new define, I915_PARAM_HAS_UC_MMAP, has been
added.
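
Expected usage from userspace would look roughly like this (a sketch;
fd and handle setup are assumed, error handling elided):

	struct drm_i915_getparam gp;
	struct drm_i915_gem_mmap arg;
	void *ptr = NULL;
	int has_uc_mmap = 0;

	memset(&gp, 0, sizeof(gp));
	gp.param = I915_PARAM_HAS_UC_MMAP;
	gp.value = &has_uc_mmap;
	drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);

	if (has_uc_mmap) {
		memset(&arg, 0, sizeof(arg));
		arg.handle = handle;
		arg.size = size;
		arg.flags = I915_GEM_USE_UNCACHED_MMAP;
		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP, &arg) == 0)
			ptr = (void *)(uintptr_t)arg.addr_ptr;
	}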

Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c |  3 +++
 drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++++
 include/uapi/drm/i915_drm.h     |  4 ++++
 3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 1b39807..2d8191a 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_HAS_UC_MMAP:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1b192d4..16b267b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1492,6 +1492,23 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 	addr = vm_mmap(obj->filp, 0, args->size,
 		       PROT_READ | PROT_WRITE, MAP_SHARED,
 		       args->offset);
+
+	if (args->flags & I915_GEM_USE_UNCACHED_MMAP) {
+		struct mm_struct *mm = current->mm;
+		struct vm_area_struct *vma;
+		down_write(&mm->mmap_sem);
+		vma = find_vma(mm, addr);
+		if (!vma) {
+			drm_gem_object_unreference_unlocked(obj);
+			return -EINVAL;
+		}
+		/* Change the page attribute to uncached (along with
+		 * write-combining to get better performance) */
+		vma->vm_page_prot =
+			pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+		up_write(&mm->mmap_sem);
+	}
+
 	drm_gem_object_unreference_unlocked(obj);
 	if (IS_ERR((void *)addr))
 		return addr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff57f07..3d0b1c0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_UC_MMAP  	 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,9 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+#define I915_GEM_USE_UNCACHED_MMAP (1<<0)
+	__u64 flags;
 };
 
 struct drm_i915_gem_mmap_gtt {
-- 
1.9.2

* [PATCH] intel: New libdrm interface to create uncached CPU mapping
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
@ 2014-10-23 10:37     ` akash.goel
  2014-10-23 10:54     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Chris Wilson
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: akash.goel @ 2014-10-23 10:37 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_uc', is provided
by this patch. Through this interface Gfx clients can create
uncached virtual mappings of a GEM object. It provides the same
functionality as the 'mmap_gtt' interface without the constraints
of limited aperture space, provided clients handle the linear-to-tile
conversion on their own.
This patch is intended to improve CPU write performance, as with such
a mapping, writes are almost 50% faster than with mmap_gtt. It also
avoids the cache flush after a CPU-side update when the object is
passed on to the GPU, which would be the case if the regular mmap
interface were used.
This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated.
There is also support for an unsynchronized version of this interface,
named 'drm_intel_gem_bo_map_uc_unsynchronized', through which uncached,
but unsynchronized, virtual mappings of a GEM object can be created.
To ensure cache coherency before using this mapping, the GTT domain has
been reused here. This provides the required cache flush if the object is
in the CPU domain, as well as synchronization against concurrent rendering.
A new field 'flags' has been added to the 'drm_i915_gem_mmap' structure,
used in the gem_mmap ioctl, which allows userspace to request the
uncached mapping type. Userspace can query the driver for support of this
mapping through get_params; for that a new define,
I915_PARAM_HAS_UC_MMAP, has been added.
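
Expected usage mirrors the GTT-mmap path (a sketch; bo creation is
assumed, and offset/data/len are illustrative names):

	if (drm_intel_gem_bo_map_uc(bo) == 0) {
		/* sub-region update without flushing the whole object */
		memcpy((char *)bo->virtual + offset, data, len);
		drm_intel_gem_bo_unmap_uc(bo);
	}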

Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 include/drm/i915_drm.h   |   4 ++
 intel/intel_bufmgr.h     |   3 ++
 intel/intel_bufmgr_gem.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..cd97ff9 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_HAS_UC_MMAP		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,9 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+#define I915_GEM_USE_UNCACHED_MMAP (1<<0)
+	__u64 flags;
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..0a7c43a 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_uc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_uc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_uc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f2f4fea..f094ce8 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -184,6 +184,8 @@ struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_uc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1267,6 +1269,121 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_uncached(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_uc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_uc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* To indicate the uncached virtual mapping to KMD */
+		mmap_arg.flags = I915_GEM_USE_UNCACHED_MMAP;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			/* note: the caller holds and releases bufmgr_gem->lock */
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_uc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_uc_virtual;
+
+	DBG("bo_map_uc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_uc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_uc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_uncached(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are not
+	 * bound. For those, the pages are acquired first, before the
+	 * domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_uc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_uc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_uncached(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_uc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1293,6 +1410,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1553,6 +1671,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_uc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
-- 
1.9.2

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
  2014-10-23 10:37     ` [PATCH] intel: New libdrm interface to create uncached CPU mapping akash.goel
@ 2014-10-23 10:54     ` Chris Wilson
  2014-10-23 11:03     ` Chris Wilson
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 10:54 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This patch provides support to create uncached virtual mappings for a Gem
> object. It intends to provide the same funtionality of 'mmap_gtt' interface
> without the constraints of a limited aperture space, but provided clients
> handles the linear to tile conversion on their own.
> This is for improving the CPU write operation performance, as with such
> mapping, writes are almost 50% faster than with mmap_gtt.

That seems like it should be easy to demonstrate with igt/gem_gtt_speed.
You should also copy igt/gem_mmap_gtt to test the interface, and extend
igt/gem_concurrent_blt to make sure that mmap(wc)+domain vs the GPU is
correct.

> Also it avoids the
> Cache flush after update from CPU side, when object is passed onto GPU, which
> will be the case if regular mmap ioctl interface is used.
                             ^CPU

> This type of mapping is specially useful in case of sub-region update,
> i.e. when only a portion of the object is to be updated.
> To ensure the cache coherency, before using this mapping, the GTT domain has
> been reused here. This provides the required Cache flush if the object is in
> CPU domain or synchronization against the concurrent rendering. Although the
> access through an uncached mmap shall automatically invalidate the cache lines,
> but this may not be true for non temporal write instructions and also not all
> pages of the object be updated at any given point of time through this mapping.
> Having a call to get_pages in set_to_gtt_domain function, as added by Chris in
> the earlier patch, would guarantee the clflush and so there will be no cache-
> lines holding the data for the object before it is accessed through this map.
> A new field 'flags' has been added to 'drm_i915_gem_mmap' structure, used in
> gem_mmap ioctl, which allows to convey the required mapping type as uncached.
> User can query the driver for the support of this mapping through the
> get_params. For that a new define I915_PARAM_HAS_UC_MMAP has been added.

I quibble over using the term uncached here, as what we mean is
write-combining.  They are quite different in terms of memory access!

I've put together a use case in
http://cgit.freedesktop.org/~ickle/xf86-video-intel/commit/?h=wc-mmap&id=118e98a2d448469064cdcedf2013a91c3a69dab4
which I'll test shortly. It uses the new mmapping to allow direct access
to larger-than-aperture buffers, prefers the unbound access for linear
buffers, and allows manually detiled uploads on !llc platforms.

> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_dma.c |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++++
>  include/uapi/drm/i915_drm.h     |  4 ++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b39807..2d8191a 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_CMD_PARSER_VERSION:
>  		value = i915_cmd_parser_get_version();
>  		break;
> +	case I915_PARAM_HAS_UC_MMAP:
> +		value = 1;
> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1b192d4..16b267b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1492,6 +1492,23 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>  	addr = vm_mmap(obj->filp, 0, args->size,
>  		       PROT_READ | PROT_WRITE, MAP_SHARED,
>  		       args->offset);
> +
> +	if (args->flags & I915_GEM_USE_UNCACHED_MMAP) {
> +		struct mm_struct *mm = current->mm;
> +		struct vm_area_struct *vma;

Newline please between variables and code.

> +		down_write(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (!vma) {
> +			drm_gem_object_unreference_unlocked(obj);
> +			return -EINVAL;
Eek returning with mmap_sem held.

> +		}
> +		/* Change the page attribute to uncached (along with
> +		 * write-combinning to get better performance) */
> +		vma->vm_page_prot =
> +			pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +		up_write(&mm->mmap_sem);

Perhaps,

down_write(&mm->mmap_sem);
vma = find_vma(mm, addr);
if (vma && cpu_has_pat)
	vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
else
	addr = ERR_PTR(-ENODEV);
up_write(&mm->mmap_sem);

I am pretty sure we can't do this trick on really old CPUs!
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
  2014-10-23 10:37     ` [PATCH] intel: New libdrm interface to create uncached CPU mapping akash.goel
  2014-10-23 10:54     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Chris Wilson
@ 2014-10-23 11:03     ` Chris Wilson
  2014-10-23 11:56     ` Chris Wilson
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 11:03 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
>  drivers/gpu/drm/i915/i915_dma.c |  3 +++
>  drivers/gpu/drm/i915/i915_gem.c | 17 +++++++++++++++++
>  include/uapi/drm/i915_drm.h     |  4 ++++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b39807..2d8191a 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>  	case I915_PARAM_CMD_PARSER_VERSION:
>  		value = i915_cmd_parser_get_version();
>  		break;
> +	case I915_PARAM_HAS_UC_MMAP:
> +		value = 1;
> +		break;
>  	default:
>  		DRM_DEBUG("Unknown parameter %d\n", param->param);
>  		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1b192d4..16b267b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1492,6 +1492,23 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>  	addr = vm_mmap(obj->filp, 0, args->size,
>  		       PROT_READ | PROT_WRITE, MAP_SHARED,
>  		       args->offset);

Forgot to mutter about:

if (args->flags & INVALID_FLAGS)
	return -EINVAL;

and the param then makes better sense as
I915_PARAM_HAS_EXT_MMAP (an extended mmap interface, which we can then
version).

In future we can test for EXT_MMAP support and rely on
mmap(RANDOM_FLAG) returning EINVAL for any new extension
flags.
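
For instance, a userspace probe for a future extension could look like
this (a sketch; mmap_arg is set up as in the quoted diff, and the flag
value is a hypothetical placeholder):

	/* a versioned ioctl must reject any unknown flag with EINVAL */
	mmap_arg.flags = 1ULL << 63; /* hypothetical future extension */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP, &mmap_arg) == -1 &&
	    errno == EINVAL)
		; /* this extension is not supported by the kernel */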

> +
> +	if (args->flags & I915_GEM_USE_UNCACHED_MMAP) {
> +		struct mm_struct *mm = current->mm;
> +		struct vm_area_struct *vma;
> +		down_write(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (!vma) {
> +			drm_gem_object_unreference_unlocked(obj);
> +			return -EINVAL;
> +		}
> +		/* Change the page attribute to uncached (along with
> +		 * write-combinning to get better performance) */
> +		vma->vm_page_prot =
> +			pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +		up_write(&mm->mmap_sem);
> +	}
> +
>  	drm_gem_object_unreference_unlocked(obj);
>  	if (IS_ERR((void *)addr))
>  		return addr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index ff57f07..3d0b1c0 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
>  #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>  #define I915_PARAM_HAS_WT     	 	 27
>  #define I915_PARAM_CMD_PARSER_VERSION	 28
> +#define I915_PARAM_HAS_UC_MMAP  	 29
>  
>  typedef struct drm_i915_getparam {
>  	int param;
> @@ -487,6 +488,9 @@ struct drm_i915_gem_mmap {
>  	 * This is a fixed-size type for 32/64 compatibility.
>  	 */
>  	__u64 addr_ptr;
> +
> +#define I915_GEM_USE_UNCACHED_MMAP (1<<0)

#define I915_MMAP_WC is shorter
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
                       ` (2 preceding siblings ...)
  2014-10-23 11:03     ` Chris Wilson
@ 2014-10-23 11:56     ` Chris Wilson
  2014-10-23 13:23       ` Chris Wilson
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
  2014-10-24  8:40     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Daniel Vetter
  5 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 11:56 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 

> This is for improving the CPU write operation performance, as with such
> mapping, writes are almost 50% faster than with mmap_gtt. Also it avoids the

I doubt it is the actual write that is faster. For example,

   gtt       wc    Operation
--------  ------   ---------
424000.0    1.30   ShmPutImage 10x10 square 
 29500.0    1.42   ShmPutImage 100x100 square 
  1510.0    0.95   ShmPutImage 500x500 square 

It seems to reduce the overhead for small transfers (with an
already-mmapped pointer). That's certainly interesting, and probably a
Clue for further improving performance. But it looks like peak throughput is
limited by memory bandwidth, which has been my experience with the GTT
mmap thus far.

I have some doubts as to whether it is coherent with the display though,
and so whether it is truly write-combining...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 11:56     ` Chris Wilson
@ 2014-10-23 13:23       ` Chris Wilson
  2014-12-10  4:48         ` Chad Versace
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 13:23 UTC (permalink / raw)
  To: akash.goel, intel-gfx

On Thu, Oct 23, 2014 at 12:56:46PM +0100, Chris Wilson wrote:
> On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> 
> > This is for improving the CPU write operation performance, as with such
> > mapping, writes are almost 50% faster than with mmap_gtt. Also it avoids the
> 
> I doubt it is the actual write that is faster. For example,
> 
>    gtt       wc    Operation
> --------  ------   ---------
> 424000.0    1.30   ShmPutImage 10x10 square 
>  29500.0    1.42   ShmPutImage 100x100 square 
>   1510.0    0.95   ShmPutImage 500x500 square 

Oops, bugs beware. I wasn't calling the ioctl correctly.
Having re-run the test, I now get:

   gtt       wc    Operation
--------  ------   ---------
424000.0    1.30   ShmPutImage 10x10 square 
 29500.0    1.44   ShmPutImage 100x100 square 
  1510.0    1.38   ShmPutImage 500x500 square 

Supporting your claims. Very nice.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
                       ` (3 preceding siblings ...)
  2014-10-23 11:56     ` Chris Wilson
@ 2014-10-23 16:55     ` Chris Wilson
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
                         ` (4 more replies)
  2014-10-24  8:40     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Daniel Vetter
  5 siblings, 5 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 16:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Akash Goel

From: Akash Goel <akash.goel@intel.com>

This patch provides support for creating write-combining virtual mappings of
a GEM object. It intends to provide the same functionality as the 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires that clients handle the linear to tile conversion on
their own. This is for improving CPU write performance, as with such a
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the
cache flush after an update from the CPU side when the object is passed on
to the GPU. This type of mapping is especially useful for sub-region
updates, i.e. when only a portion of the object is to be updated. Using a
CPU mmap in such cases would normally incur a clflush of the whole object,
and using a GTT mmapping would likely require eviction of an active object
or fence and thus stall. The write-combining CPU mmap avoids both.

To ensure cache coherency before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in the CPU domain, as well as synchronization against concurrent
rendering. Although access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions, and also not all pages of the object may be updated at any
given point in time through this mapping. Having a call to get_pages in
the set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', guarantees the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existing users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.

v2: Fix error handling, invalid flag detection, renaming (ickle)

The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.

Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_dma.c |  3 +++
 drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++++++++++
 include/uapi/drm/i915_drm.h     |  9 +++++++++
 3 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 1b398070b230..496fb72e25c8 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_CMD_PARSER_VERSION:
 		value = i915_cmd_parser_get_version();
 		break;
+	case I915_PARAM_MMAP_VERSION:
+		value = 1;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 895f9881f0aa..7d7265fcee95 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1485,6 +1485,12 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 	struct drm_gem_object *obj;
 	unsigned long addr;
 
+	if (args->flags & ~(I915_MMAP_WC))
+		return -EINVAL;
+
+	if (args->flags & I915_MMAP_WC && !cpu_has_pat)
+		return -ENODEV;
+
 	obj = drm_gem_object_lookup(dev, file, args->handle);
 	if (obj == NULL)
 		return -ENOENT;
@@ -1500,6 +1506,19 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 	addr = vm_mmap(obj->filp, 0, args->size,
 		       PROT_READ | PROT_WRITE, MAP_SHARED,
 		       args->offset);
+	if (args->flags & I915_MMAP_WC) {
+		struct mm_struct *mm = current->mm;
+		struct vm_area_struct *vma;
+
+		down_write(&mm->mmap_sem);
+		vma = find_vma(mm, addr);
+		if (vma)
+			vma->vm_page_prot =
+				pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+		else
+			addr = -ENOMEM;
+		up_write(&mm->mmap_sem);
+	}
 	drm_gem_object_unreference_unlocked(obj);
 	if (IS_ERR((void *)addr))
 		return addr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index ff57f07c3249..9b3762724a2b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION          29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	 */
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
-- 
2.1.1

* [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
@ 2014-10-23 19:05       ` Chris Wilson
  2014-10-23 19:05         ` [PATCH 2/3] lib/core: Check for kernel error messages and WARN if any are found Chris Wilson
                           ` (2 more replies)
  2014-10-23 19:11       ` [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface Chris Wilson
                         ` (3 subsequent siblings)
  4 siblings, 3 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_aux.c                | 69 ++++++++++++++++++++++++++++++++++++++++++++
 lib/igt_aux.h                |  4 +++
 tests/gem_evict_alignment.c  | 15 ++++++++++
 tests/gem_evict_everything.c | 15 ++++++++++
 4 files changed, 103 insertions(+)

diff --git a/lib/igt_aux.c b/lib/igt_aux.c
index 0e1eeea..b78ca2f 100644
--- a/lib/igt_aux.c
+++ b/lib/igt_aux.c
@@ -53,6 +53,7 @@
 #include "intel_chipset.h"
 #include "igt_aux.h"
 #include "igt_debugfs.h"
+#include "igt_gt.h"
 #include "config.h"
 #include "intel_reg.h"
 #include "ioctl_wrappers.h"
@@ -131,6 +132,74 @@ void igt_stop_signal_helper(void)
 	sig_stat = 0;
 }
 
+/* GPU abusers */
+static struct igt_helper_process hang_helper;
+static void __attribute__((noreturn))
+hang_helper_process(pid_t pid, int fd, int gen)
+{
+	while (1) {
+		if (kill(pid, 0)) /* Parent has died, so must we. */
+			exit(0);
+
+		igt_post_hang_ring(fd,
+				   igt_hang_ring(fd, gen, I915_EXEC_DEFAULT));
+
+		sleep(1);
+	}
+}
+
+/**
+ * igt_fork_hang_helper:
+ *
+ * Fork a child process using #igt_fork_helper to hang the default engine
+ * of the GPU at regular intervals.
+ *
+ * This is useful to exercise slow running code (such as aperture placement)
+ * which needs to be robust against a GPU reset.
+ *
+ * In tests with subtests this function can be called outside of failure
+ * catching code blocks like #igt_fixture or #igt_subtest.
+ */
+int igt_fork_hang_helper(void)
+{
+	int fd, gen;
+
+	if (igt_only_list_subtests())
+		return 1;
+
+	fd = drm_open_any();
+	if (fd == -1)
+		return 0;
+
+	gen = intel_gen(intel_get_drm_devid(fd));
+	if (gen < 5) {
+		close(fd);
+		return 0;
+	}
+
+	igt_fork_helper(&hang_helper)
+		hang_helper_process(getppid(), fd, gen);
+
+	close(fd);
+	return 1;
+}
+
+/**
+ * igt_stop_hang_helper:
+ *
+ * Stops the child process spawned with igt_fork_hang_helper().
+ *
+ * In tests with subtests this function can be called outside of failure
+ * catching code blocks like #igt_fixture or #igt_subtest.
+ */
+void igt_stop_hang_helper(void)
+{
+	if (igt_only_list_subtests())
+		return;
+
+	igt_stop_helper(&hang_helper);
+}
+
 /**
  * igt_check_boolean_env_var:
  * @env_var: environment variable name
diff --git a/lib/igt_aux.h b/lib/igt_aux.h
index 9b42918..6056575 100644
--- a/lib/igt_aux.h
+++ b/lib/igt_aux.h
@@ -35,6 +35,10 @@
 /* generally useful helpers */
 void igt_fork_signal_helper(void);
 void igt_stop_signal_helper(void);
+
+int igt_fork_hang_helper(void);
+void igt_stop_hang_helper(void);
+
 void igt_exchange_int(void *array, unsigned i, unsigned j);
 void igt_permute_array(void *array, unsigned size,
 			   void (*exchange_func)(void *array,
diff --git a/tests/gem_evict_alignment.c b/tests/gem_evict_alignment.c
index e814f36..3ee23b9 100644
--- a/tests/gem_evict_alignment.c
+++ b/tests/gem_evict_alignment.c
@@ -221,6 +221,21 @@ igt_main
 		count = 4;
 		major_evictions(fd, size, count);
 	}
+
+	if (igt_fork_hang_helper()) {
+		igt_subtest("minor-hang") {
+			size = 1024 * 1024;
+			count = 3*gem_aperture_size(fd) / size / 4;
+			minor_evictions(fd, size, count);
+		}
+
+		igt_subtest("major-hang") {
+			size = 3*gem_aperture_size(fd) / 4;
+			count = 4;
+			major_evictions(fd, size, count);
+		}
+		igt_stop_hang_helper();
+	}
 	igt_stop_signal_helper();
 
 	igt_fixture
diff --git a/tests/gem_evict_everything.c b/tests/gem_evict_everything.c
index c1eb3bd..596891c 100644
--- a/tests/gem_evict_everything.c
+++ b/tests/gem_evict_everything.c
@@ -234,6 +234,21 @@ igt_main
 		test_major_evictions(fd, size, count);
 	}
 
+	if (igt_fork_hang_helper()) {
+		igt_subtest("swapping-hang")
+			test_swapping_evictions(fd, size, count);
+
+		igt_subtest("minor-hang")
+			test_minor_evictions(fd, size, count);
+
+		igt_subtest("major-hang") {
+			size = 3*gem_aperture_size(fd) / 4;
+			count = 4;
+			test_major_evictions(fd, size, count);
+		}
+
+		igt_stop_hang_helper();
+	}
 	igt_stop_signal_helper();
 
 	igt_fixture {
-- 
2.1.1

* [PATCH 2/3] lib/core: Check for kernel error messages and WARN if any are found
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
@ 2014-10-23 19:05         ` Chris Wilson
  2014-10-23 19:05         ` [PATCH 3/3] reg-read-8 Chris Wilson
  2014-10-23 19:12         ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
  2 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

At the end of a subtest, check for any WARNs or ERRORs (or worse!)
emitted since the start of our test and FAIL the subtest if any are
found. This will prevent silent false test positives due to an oops being
missed or falsely reported as a TIMEOUT.
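
For reference, /dev/kmsg records look like

	4,339,5140900,-;i915 0000:00:02.0: GPU hang detected

(the message text here is illustrative). The first comma-separated
field is the syslog priority, and the message proper follows the first
';' - which is exactly what print_kernlog() below splits on before
comparing the priority against the threshold.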

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/igt_core.c | 116 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 106 insertions(+), 10 deletions(-)

diff --git a/lib/igt_core.c b/lib/igt_core.c
index e3d5fb0..946a46a 100644
--- a/lib/igt_core.c
+++ b/lib/igt_core.c
@@ -208,6 +208,7 @@ static char *run_single_subtest = NULL;
 static bool run_single_subtest_found = false;
 static const char *in_subtest = NULL;
 static struct timespec subtest_time;
+static int subtest_kmsg = -1;
 static bool in_fixture = false;
 static bool test_with_subtests = false;
 static enum {
@@ -229,14 +230,14 @@ enum {
 
 __attribute__((format(printf, 1, 2)))
 static void kmsg(const char *format, ...)
-#define KERN_EMER	"<0>"
-#define KERN_ALERT	"<1>"
-#define KERN_CRIT	"<2>"
-#define KERN_ERR	"<3>"
-#define KERN_WARNING	"<4>"
-#define KERN_NOTICE	"<5>"
-#define KERN_INFO	"<6>"
-#define KERN_DEBUG	"<7>"
+#define KMSG_EMER	"<0>"
+#define KMSG_ALERT	"<1>"
+#define KMSG_CRIT	"<2>"
+#define KMSG_ERR	"<3>"
+#define KMSG_WARNING	"<4>"
+#define KMSG_NOTICE	"<5>"
+#define KMSG_INFO	"<6>"
+#define KMSG_DEBUG	"<7>"
 {
 	va_list ap;
 	FILE *file;
@@ -545,7 +546,7 @@ out:
 		exit(ret == -1 ? 0 : IGT_EXIT_INVALID);
 
 	if (!list_subtests) {
-		kmsg(KERN_INFO "%s: executing\n", command_str);
+		kmsg(KMSG_INFO "%s: executing\n", command_str);
 		print_version();
 
 		oom_adjust_for_doom();
@@ -679,7 +680,11 @@ bool __igt_run_subtest(const char *subtest_name)
 		return false;
 	}
 
-	kmsg(KERN_INFO "%s: starting subtest %s\n", command_str, subtest_name);
+	kmsg(KMSG_INFO "%s: starting subtest %s\n", command_str, subtest_name);
+
+	subtest_kmsg = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);
+	if (subtest_kmsg != -1)
+		lseek(subtest_kmsg, 0, SEEK_END);
 
 	gettime(&subtest_time);
 	return (in_subtest = subtest_name);
@@ -712,6 +717,82 @@ static bool succeeded_one = false;
 static bool failed_one = false;
 static int igt_exitcode = IGT_EXIT_SUCCESS;
 
+static char *readfile(int fd)
+{
+	char *buf;
+	int len, end;
+
+	end = 0;
+	len = 4096;
+	buf = malloc(4096);
+	if (buf == NULL)
+		return NULL;
+
+	do {
+		char *newbuf;
+		int rem = len - end - 1;
+		int ret = read(fd, buf + end, rem);
+		if (ret <= 0)
+			break;
+
+		end += ret;
+		if (ret < rem)
+			break;
+
+		newbuf = realloc(buf, len *= 2);
+		if (newbuf == NULL)
+			break;
+
+		buf = newbuf;
+	} while (1);
+
+	if (end == 0) {
+		free(buf);
+		return NULL;
+	}
+
+	buf[end] = '\0';
+	return buf;
+}
+
+static int print_kernlog(char *log, int prio)
+#define PRIO_EMER	0
+#define PRIO_ALERT	1
+#define PRIO_CRIT	2
+#define PRIO_ERR	3
+#define PRIO_WARNING	4
+#define PRIO_NOTICE	5
+#define PRIO_INFO	6
+#define PRIO_DEBUG	7
+{
+	int count = 0;
+
+	do {
+		char *s, *t;
+		int lvl;
+
+		s = strchr(log, ';');
+		if (s == NULL)
+			break;
+
+		t = strchr(s + 1, '\n');
+		if (t == NULL)
+			break;
+
+		*t = '\0';
+
+		lvl = atoi(log);
+		if (lvl <= prio) {
+			puts(s+1);
+			count++;
+		}
+
+		log = t + 1;
+	} while (*log);
+
+	return count;
+}
+
 static void exit_subtest(const char *) __attribute__((noreturn));
 static void exit_subtest(const char *result)
 {
@@ -722,7 +803,22 @@ static void exit_subtest(const char *result)
 	elapsed = now.tv_sec - subtest_time.tv_sec;
 	elapsed += (now.tv_nsec - subtest_time.tv_nsec) * 1e-9;
 
+	if (subtest_kmsg != -1) {
+		char *log;
+
+		log = readfile(subtest_kmsg);
+		if (log) {
+			if (print_kernlog(log, PRIO_WARNING) &&
+			    strcmp(result, "FAIL"))
+				result = "WARN";
+			free(log);
+		}
+		close(subtest_kmsg);
+		subtest_kmsg = -1;
+	}
+
 	printf("Subtest %s: %s (%.3fs)\n", in_subtest, result, elapsed);
+
 	in_subtest = NULL;
 	longjmp(igt_subtest_jmpbuf, 1);
 }
-- 
2.1.1

* [PATCH 3/3] reg-read-8
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
  2014-10-23 19:05         ` [PATCH 2/3] lib/core: Check for kernel error messages and WARN if any are found Chris Wilson
@ 2014-10-23 19:05         ` Chris Wilson
  2014-10-23 19:12         ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
  2 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

---
 tools/intel_reg_read.c | 50 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/tools/intel_reg_read.c b/tools/intel_reg_read.c
index 3b91291..77fd21b 100644
--- a/tools/intel_reg_read.c
+++ b/tools/intel_reg_read.c
@@ -46,13 +46,19 @@ static void bit_decode(uint32_t reg)
 	printf("\n");
 }
 
-static void dump_range(uint32_t start, uint32_t end)
+static void dump_range(uint32_t start, uint32_t end, int readq)
 {
 	int i;
 
-	for (i = start; i < end; i += 4)
-		printf("0x%X : 0x%X\n", i,
-		       *(volatile uint32_t *)((volatile char*)mmio + i));
+	if (readq) {
+		for (i = start; i < end; i += 8)
+			printf("0x%X : 0x%llX\n", i,
+			       (long long unsigned)*(volatile uint64_t *)((volatile char*)mmio + i));
+	} else {
+		for (i = start; i < end; i += 4)
+			printf("0x%X : 0x%X\n", i,
+			       *(volatile uint32_t *)((volatile char*)mmio + i));
+	}
 }
 
 static void usage(char *cmdname)
@@ -74,9 +80,13 @@ int main(int argc, char** argv)
 	int full_dump = 0;
 	int decode_bits = 0;
 	int dwords = 1;
+	int readq = 0;
 
-	while ((ch = getopt(argc, argv, "dfhc:")) != -1) {
+	while ((ch = getopt(argc, argv, "8dfhc:")) != -1) {
 		switch(ch) {
+		case '8':
+			readq = 1;
+			break;
 		case 'd':
 			decode_bits = 1;
 			break;
@@ -110,24 +120,24 @@ int main(int argc, char** argv)
 	intel_register_access_init(intel_get_pci_device(), 0);
 
 	if (full_dump) {
-		dump_range(0x00000, 0x00fff);   /* VGA registers */
-		dump_range(0x02000, 0x02fff);   /* instruction, memory, interrupt control registers */
-		dump_range(0x03000, 0x031ff);   /* FENCE and PPGTT control registers */
-		dump_range(0x03200, 0x03fff);   /* frame buffer compression registers */
-		dump_range(0x05000, 0x05fff);   /* I/O control registers */
-		dump_range(0x06000, 0x06fff);   /* clock control registers */
-		dump_range(0x07000, 0x07fff);   /* 3D internal debug registers */
-		dump_range(0x07400, 0x088ff);   /* GPE debug registers */
-		dump_range(0x0a000, 0x0afff);   /* display palette registers */
-		dump_range(0x10000, 0x13fff);   /* MMIO MCHBAR */
-		dump_range(0x30000, 0x3ffff);   /* overlay registers */
-		dump_range(0x60000, 0x6ffff);   /* display engine pipeline registers */
-		dump_range(0x70000, 0x72fff);   /* display and cursor registers */
-		dump_range(0x73000, 0x73fff);   /* performance counters */
+		dump_range(0x00000, 0x00fff, 0);   /* VGA registers */
+		dump_range(0x02000, 0x02fff, 0);   /* instruction, memory, interrupt control registers */
+		dump_range(0x03000, 0x031ff, 0);   /* FENCE and PPGTT control registers */
+		dump_range(0x03200, 0x03fff, 0);   /* frame buffer compression registers */
+		dump_range(0x05000, 0x05fff, 0);   /* I/O control registers */
+		dump_range(0x06000, 0x06fff, 0);   /* clock control registers */
+		dump_range(0x07000, 0x07fff, 0);   /* 3D internal debug registers */
+		dump_range(0x07400, 0x088ff, 0);   /* GPE debug registers */
+		dump_range(0x0a000, 0x0afff, 0);   /* display palette registers */
+		dump_range(0x10000, 0x13fff, 0);   /* MMIO MCHBAR */
+		dump_range(0x30000, 0x3ffff, 0);   /* overlay registers */
+		dump_range(0x60000, 0x6ffff, 0);   /* display engine pipeline registers */
+		dump_range(0x70000, 0x72fff, 0);   /* display and cursor registers */
+		dump_range(0x73000, 0x73fff, 0);   /* performance counters */
 	} else {
 		for (i=0; i < argc; i++) {
 			sscanf(argv[i], "0x%x", &reg);
-			dump_range(reg, reg + (dwords * 4));
+			dump_range(reg, reg + dwords * (readq ? 8 : 4), readq);
 
 			if (decode_bits)
 				bit_decode(*(volatile uint32_t *)((volatile char*)mmio + reg));
-- 
2.1.1

* [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
@ 2014-10-23 19:11       ` Chris Wilson
  2014-10-23 19:11         ` [PATCH 2/3] igt/gem_gtt_speed: compare against WC mmaps Chris Wilson
  2014-10-23 19:11         ` [PATCH 3/3] igt/gem_concurrent_blit: Exercise wc mappings Chris Wilson
  2014-10-25 11:51       ` [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
                         ` (2 subsequent siblings)
  4 siblings, 2 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 lib/ioctl_wrappers.c   |  76 ++++++++++
 lib/ioctl_wrappers.h   |   7 +-
 tests/.gitignore       |   1 +
 tests/Makefile.am      |   2 +
 tests/Makefile.sources |   1 +
 tests/gem_mmap_wc.c    | 365 +++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 451 insertions(+), 1 deletion(-)
 create mode 100644 tests/gem_mmap_wc.c

diff --git a/lib/ioctl_wrappers.c b/lib/ioctl_wrappers.c
index ab1a82e..ddb20d1 100644
--- a/lib/ioctl_wrappers.c
+++ b/lib/ioctl_wrappers.c
@@ -467,6 +467,82 @@ void *gem_mmap__gtt(int fd, uint32_t handle, int size, int prot)
 	return ptr;
 }
 
+struct local_i915_gem_mmap_v2 {
+	uint32_t handle;
+	uint32_t pad;
+	uint64_t offset;
+	uint64_t size;
+	uint64_t addr_ptr;
+	uint64_t flags;
+#define I915_MMAP_WC 0x1
+};
+#define LOCAL_IOCTL_I915_GEM_MMAP_v2 DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_MMAP, struct local_i915_gem_mmap_v2)
+
+bool gem_mmap__has_wc(int fd)
+{
+	static int has_wc = -1;
+
+	if (has_wc == -1) {
+		struct drm_i915_getparam gp;
+		int val = -1;
+
+		has_wc = 0;
+
+		memset(&gp, 0, sizeof(gp));
+		gp.param = 29; /* MMAP_VERSION */
+		gp.value = &val;
+
+		/* Do we have the new mmap_ioctl? */
+		ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
+		if (val >= 1) {
+			struct local_i915_gem_mmap_v2 arg;
+
+			/* Does this device support wc-mmaps ? */
+			memset(&arg, 0, sizeof(arg));
+			arg.handle = gem_create(fd, 4096);
+			arg.offset = 0;
+			arg.size = 4096;
+			arg.flags = I915_MMAP_WC;
+			has_wc = drmIoctl(fd, LOCAL_IOCTL_I915_GEM_MMAP_v2, &arg) == 0;
+			gem_close(fd, arg.handle);
+		}
+	}
+
+	return has_wc > 0;
+}
+
+/**
+ * gem_mmap__wc:
+ * @fd: open i915 drm file descriptor
+ * @handle: gem buffer object handle
+ * @size: size of the gem buffer
+ * @prot: memory protection bits as used by mmap()
+ *
+ * This function wraps up the procedure to establish a memory mapping
+ * through direct CPU access, bypassing the GPU and CPU caches completely.
+ *
+ * Returns: A pointer to the created memory mapping.
+ */
+void *gem_mmap__wc(int fd, uint32_t handle, int size, int prot)
+{
+	struct local_i915_gem_mmap_v2 arg;
+
+	errno = ENOSYS;
+	if (!gem_mmap__has_wc(fd))
+		return NULL;
+
+	memset(&arg, 0, sizeof(arg));
+	arg.handle = handle;
+	arg.offset = 0;
+	arg.size = size;
+	arg.flags = I915_MMAP_WC;
+	if (drmIoctl(fd, LOCAL_IOCTL_I915_GEM_MMAP_v2, &arg))
+		return NULL;
+
+	errno = 0;
+	return (void *)(uintptr_t)arg.addr_ptr;
+}
+
 /**
  * gem_mmap__cpu:
  * @fd: open i915 drm file descriptor
diff --git a/lib/ioctl_wrappers.h b/lib/ioctl_wrappers.h
index b1deedf..864c35a 100644
--- a/lib/ioctl_wrappers.h
+++ b/lib/ioctl_wrappers.h
@@ -60,8 +60,13 @@ uint32_t __gem_create(int fd, int size);
 uint32_t gem_create(int fd, int size);
 void gem_execbuf(int fd, struct drm_i915_gem_execbuffer2 *execbuf);
 
-void *gem_mmap__gtt(int fd, uint32_t handle, int size, int prot);
 void *gem_mmap__cpu(int fd, uint32_t handle, int size, int prot);
+void *gem_mmap__gtt(int fd, uint32_t handle, int size, int prot);
+
+bool gem_mmap__has_wc(int fd);
+void *gem_mmap__wc(int fd, uint32_t handle, int size, int prot);
+#define igt_require_mmap_wc(fd) igt_require(gem_mmap__has_wc(fd))
+
 /**
  * gem_mmap:
  *
diff --git a/tests/.gitignore b/tests/.gitignore
index 3b1fb8b..5a46f3e 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -59,6 +59,7 @@ gem_madvise
 gem_media_fill
 gem_mmap
 gem_mmap_gtt
+gem_mmap_wc
 gem_mmap_offset_exhaustion
 gem_multi_bsd_sync_loop
 gem_non_secure_batch
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 5734002..17c675f 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -58,6 +58,8 @@ gem_flink_race_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_flink_race_LDADD = $(LDADD) -lpthread
 gem_mmap_gtt_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_mmap_gtt_LDADD = $(LDADD) -lpthread
+gem_mmap_wc_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
+gem_mmap_wc_LDADD = $(LDADD) -lpthread
 gem_threaded_access_tiled_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
 gem_threaded_access_tiled_LDADD = $(LDADD) -lpthread
 gem_tiled_swapping_CFLAGS = $(AM_CFLAGS) $(THREAD_CFLAGS)
diff --git a/tests/Makefile.sources b/tests/Makefile.sources
index 88bb854..4bbe2f0 100644
--- a/tests/Makefile.sources
+++ b/tests/Makefile.sources
@@ -44,6 +44,7 @@ TESTS_progs_M = \
 	gem_madvise \
 	gem_mmap \
 	gem_mmap_gtt \
+	gem_mmap_wc \
 	gem_partial_pwrite_pread \
 	gem_persistent_relocs \
 	gem_pipe_control_store_loop \
diff --git a/tests/gem_mmap_wc.c b/tests/gem_mmap_wc.c
new file mode 100644
index 0000000..9ab58ff
--- /dev/null
+++ b/tests/gem_mmap_wc.c
@@ -0,0 +1,365 @@
+/*
+ * Copyright © 2011 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chris Wilson <chris@chris-wilson.co.uk>
+ *
+ */
+
+#define _GNU_SOURCE
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <pthread.h>
+#include <errno.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include "drm.h"
+#include "ioctl_wrappers.h"
+#include "drmtest.h"
+#include "igt_debugfs.h"
+
+static int OBJECT_SIZE = 16*1024*1024;
+
+static void set_domain(int fd, uint32_t handle)
+{
+	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
+}
+
+static void *
+mmap_bo(int fd, uint32_t handle)
+{
+	void *ptr;
+
+	ptr = gem_mmap__wc(fd, handle, OBJECT_SIZE, PROT_READ | PROT_WRITE);
+	igt_assert(ptr && ptr != MAP_FAILED);
+
+	return ptr;
+}
+
+static void *
+create_pointer(int fd)
+{
+	uint32_t handle;
+	void *ptr;
+
+	handle = gem_create(fd, OBJECT_SIZE);
+
+	ptr = mmap_bo(fd, handle);
+
+	gem_close(fd, handle);
+
+	return ptr;
+}
+
+static void
+test_copy(int fd)
+{
+	void *src, *dst;
+
+	igt_require_mmap_wc(fd);
+
+	/* copy from a fresh src to fresh dst to force pagefault on both */
+	src = create_pointer(fd);
+	dst = create_pointer(fd);
+
+	memcpy(dst, src, OBJECT_SIZE);
+	memcpy(src, dst, OBJECT_SIZE);
+
+	munmap(dst, OBJECT_SIZE);
+	munmap(src, OBJECT_SIZE);
+}
+
+enum test_read_write {
+	READ_BEFORE_WRITE,
+	READ_AFTER_WRITE,
+};
+
+static void
+test_read_write(int fd, enum test_read_write order)
+{
+	uint32_t handle;
+	void *ptr;
+	volatile uint32_t val = 0;
+
+	handle = gem_create(fd, OBJECT_SIZE);
+
+	ptr = mmap_bo(fd, handle);
+	igt_assert(ptr);
+
+	if (order == READ_BEFORE_WRITE) {
+		val = *(uint32_t *)ptr;
+		*(uint32_t *)ptr = val;
+	} else {
+		*(uint32_t *)ptr = val;
+		val = *(uint32_t *)ptr;
+	}
+
+	gem_close(fd, handle);
+	munmap(ptr, OBJECT_SIZE);
+}
+
+static void
+test_read_write2(int fd, enum test_read_write order)
+{
+	uint32_t handle;
+	void *r, *w;
+	volatile uint32_t val = 0;
+
+	igt_require_mmap_wc(fd);
+
+	handle = gem_create(fd, OBJECT_SIZE);
+
+	r = gem_mmap__wc(fd, handle, OBJECT_SIZE, PROT_READ);
+	igt_assert(r);	/* gem_mmap__wc() returns NULL on failure */
+
+	w = gem_mmap__wc(fd, handle, OBJECT_SIZE, PROT_READ | PROT_WRITE);
+	igt_assert(w);
+
+	if (order == READ_BEFORE_WRITE) {
+		val = *(uint32_t *)r;
+		*(uint32_t *)w = val;
+	} else {
+		*(uint32_t *)w = val;
+		val = *(uint32_t *)r;
+	}
+
+	gem_close(fd, handle);
+	munmap(r, OBJECT_SIZE);
+	munmap(w, OBJECT_SIZE);
+}
+
+static void
+test_write(int fd)
+{
+	void *src;
+	uint32_t dst;
+
+	igt_require_mmap_wc(fd);
+
+	/* copy from a fresh src to fresh dst to force pagefault on both */
+	src = create_pointer(fd);
+	dst = gem_create(fd, OBJECT_SIZE);
+
+	gem_write(fd, dst, 0, src, OBJECT_SIZE);
+
+	gem_close(fd, dst);
+	munmap(src, OBJECT_SIZE);
+}
+
+static void
+test_write_gtt(int fd)
+{
+	uint32_t dst;
+	char *dst_gtt;
+	void *src;
+
+	igt_require_mmap_wc(fd);
+
+	dst = gem_create(fd, OBJECT_SIZE);
+
+	/* prefault object into gtt */
+	dst_gtt = mmap_bo(fd, dst);
+	set_domain(fd, dst);
+	memset(dst_gtt, 0, OBJECT_SIZE);
+	munmap(dst_gtt, OBJECT_SIZE);
+
+	src = create_pointer(fd);
+
+	gem_write(fd, dst, 0, src, OBJECT_SIZE);
+
+	gem_close(fd, dst);
+	munmap(src, OBJECT_SIZE);
+}
+
+static void
+test_read(int fd)
+{
+	void *dst;
+	uint32_t src;
+
+	igt_require_mmap_wc(fd);
+
+	/* copy from a fresh src to fresh dst to force pagefault on both */
+	dst = create_pointer(fd);
+	src = gem_create(fd, OBJECT_SIZE);
+
+	gem_read(fd, src, 0, dst, OBJECT_SIZE);
+
+	gem_close(fd, src);
+	munmap(dst, OBJECT_SIZE);
+}
+
+static void
+test_write_cpu_read_wc(int fd)
+{
+	uint32_t handle;
+	uint32_t *src, *dst;
+
+	igt_require_mmap_wc(fd);
+
+	handle = gem_create(fd, OBJECT_SIZE);
+
+	dst = gem_mmap__wc(fd, handle, OBJECT_SIZE, PROT_READ);
+	igt_assert(dst);	/* the mmap wrappers return NULL on failure */
+
+	src = gem_mmap__cpu(fd, handle, OBJECT_SIZE, PROT_WRITE);
+	igt_assert(src);
+
+	gem_close(fd, handle);
+
+	memset(src, 0xaa, OBJECT_SIZE);
+	igt_assert(memcmp(dst, src, OBJECT_SIZE) == 0);
+
+	munmap(src, OBJECT_SIZE);
+	munmap(dst, OBJECT_SIZE);
+}
+
+static void
+test_write_gtt_read_wc(int fd)
+{
+	uint32_t handle;
+	uint32_t *src, *dst;
+
+	igt_require_mmap_wc(fd);
+
+	handle = gem_create(fd, OBJECT_SIZE);
+
+	dst = gem_mmap__wc(fd, handle, OBJECT_SIZE, PROT_READ);
+	igt_assert(dst);
+
+	src = gem_mmap__gtt(fd, handle, OBJECT_SIZE, PROT_WRITE);
+	igt_assert(src);
+
+	gem_close(fd, handle);
+
+	memset(src, 0xaa, OBJECT_SIZE);
+	igt_assert(memcmp(dst, src, OBJECT_SIZE) == 0);
+
+	munmap(src, OBJECT_SIZE);
+	munmap(dst, OBJECT_SIZE);
+}
+
+struct thread_fault_concurrent {
+	pthread_t thread;
+	int id;
+	uint32_t **ptr;
+};
+
+static void *
+thread_fault_concurrent(void *closure)
+{
+	struct thread_fault_concurrent *t = closure;
+	uint32_t val = 0;
+	int n;
+
+	for (n = 0; n < 32; n++) {
+		if (n & 1)
+			*t->ptr[(n + t->id) % 32] = val;
+		else
+			val = *t->ptr[(n + t->id) % 32];
+	}
+
+	return NULL;
+}
+
+static void
+test_fault_concurrent(int fd)
+{
+	uint32_t *ptr[32];
+	struct thread_fault_concurrent thread[64];
+	int n;
+
+	igt_require_mmap_wc(fd);
+
+	for (n = 0; n < 32; n++) {
+		ptr[n] = create_pointer(fd);
+	}
+
+	for (n = 0; n < 64; n++) {
+		thread[n].ptr = ptr;
+		thread[n].id = n;
+		pthread_create(&thread[n].thread, NULL, thread_fault_concurrent, &thread[n]);
+	}
+
+	for (n = 0; n < 64; n++)
+		pthread_join(thread[n].thread, NULL);
+
+	for (n = 0; n < 32; n++) {
+		munmap(ptr[n], OBJECT_SIZE);
+	}
+}
+
+static void
+run_without_prefault(int fd,
+			void (*func)(int fd))
+{
+	igt_disable_prefault();
+	func(fd);
+	igt_enable_prefault();
+}
+
+int fd;
+
+igt_main
+{
+	if (igt_run_in_simulation())
+		OBJECT_SIZE = 1 * 1024 * 1024;
+
+	igt_fixture
+		fd = drm_open_any();
+
+	igt_subtest("copy")
+		test_copy(fd);
+	igt_subtest("read")
+		test_read(fd);
+	igt_subtest("write")
+		test_write(fd);
+	igt_subtest("write-gtt")
+		test_write_gtt(fd);
+	igt_subtest("read-write")
+		test_read_write(fd, READ_BEFORE_WRITE);
+	igt_subtest("write-read")
+		test_read_write(fd, READ_AFTER_WRITE);
+	igt_subtest("read-write-distinct")
+		test_read_write2(fd, READ_BEFORE_WRITE);
+	igt_subtest("write-read-distinct")
+		test_read_write2(fd, READ_AFTER_WRITE);
+	igt_subtest("fault-concurrent")
+		test_fault_concurrent(fd);
+	igt_subtest("read-no-prefault")
+		run_without_prefault(fd, test_read);
+	igt_subtest("write-no-prefault")
+		run_without_prefault(fd, test_write);
+	igt_subtest("write-gtt-no-prefault")
+		run_without_prefault(fd, test_write_gtt);
+	igt_subtest("write-cpu-read-wc")
+		test_write_cpu_read_wc(fd);
+	igt_subtest("write-gtt-read-wc")
+		test_write_gtt_read_wc(fd);
+
+	igt_fixture
+		close(fd);
+}
-- 
2.1.1
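
As a quick illustration of how the new wrapper composes with the rest of
the library, a minimal coherency subtest might look like the sketch below.
The subtest body and the fill pattern are illustrative; only the helpers
added above and the existing gem_create/gem_read/gem_close wrappers are
assumed.

	static void
	test_basic_coherency(int fd)
	{
		uint32_t handle;
		uint32_t *ptr, tmp;

		igt_require_mmap_wc(fd);

		handle = gem_create(fd, 4096);
		ptr = gem_mmap__wc(fd, handle, 4096, PROT_READ | PROT_WRITE);
		igt_assert(ptr);

		/* wc writes bypass the cpu cache, so a following pread
		 * should observe them without any explicit clflush */
		ptr[0] = 0xdeadbeef;
		gem_read(fd, handle, 0, &tmp, sizeof(tmp));
		igt_assert(tmp == 0xdeadbeef);

		munmap(ptr, 4096);
		gem_close(fd, handle);
	}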



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 2/3] igt/gem_gtt_speed: compare against WC mmaps
  2014-10-23 19:11       ` [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface Chris Wilson
@ 2014-10-23 19:11         ` Chris Wilson
  2014-10-23 19:11         ` [PATCH 3/3] igt/gem_concurrent_blit: Exercise wc mappings Chris Wilson
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel


Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_gtt_speed.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/tests/gem_gtt_speed.c b/tests/gem_gtt_speed.c
index c858103..7e1e5f9 100644
--- a/tests/gem_gtt_speed.c
+++ b/tests/gem_gtt_speed.c
@@ -206,6 +206,27 @@ int main(int argc, char **argv)
 		igt_info("Time to read %dk through a GTT map:		%7.3fµs\n",
 			 size/1024, elapsed(&start, &end, loop));
 
+		if (gem_mmap__has_wc(fd)) {
+			gettimeofday(&start, NULL);
+			for (loop = 0; loop < 1000; loop++) {
+				uint32_t *base = gem_mmap__wc(fd, handle, size, PROT_READ | PROT_WRITE);
+				volatile uint32_t *ptr = base;
+				int x = 0;
+
+				for (i = 0; i < size/sizeof(*ptr); i++)
+					x += ptr[i];
+
+				/* force overtly clever gcc to actually compute x */
+				ptr[0] = x;
+
+				munmap(base, size);
+			}
+			gettimeofday(&end, NULL);
+			igt_info("Time to read %dk through a WC map:		%7.3fµs\n",
+					size/1024, elapsed(&start, &end, loop));
+		}
+
 		/* mmap write */
 		gettimeofday(&start, NULL);
 		for (loop = 0; loop < 1000; loop++) {
@@ -221,6 +242,23 @@ int main(int argc, char **argv)
 		igt_info("Time to write %dk through a GTT map:		%7.3fµs\n",
 			 size/1024, elapsed(&start, &end, loop));
 
+		if (gem_mmap__has_wc(fd)) {
+			/* mmap write */
+			gettimeofday(&start, NULL);
+			for (loop = 0; loop < 1000; loop++) {
+				uint32_t *base = gem_mmap__wc(fd, handle, size, PROT_READ | PROT_WRITE);
+				volatile uint32_t *ptr = base;
+
+				for (i = 0; i < size/sizeof(*ptr); i++)
+					ptr[i] = i;
+
+				munmap(base, size);
+			}
+			gettimeofday(&end, NULL);
+			igt_info("Time to write %dk through a WC map:		%7.3fµs\n",
+					size/1024, elapsed(&start, &end, loop));
+		}
+
 		/* mmap clear */
 		gettimeofday(&start, NULL);
 		for (loop = 0; loop < 1000; loop++) {
@@ -232,6 +270,19 @@ int main(int argc, char **argv)
 		igt_info("Time to clear %dk through a GTT map:		%7.3fµs\n",
 			 size/1024, elapsed(&start, &end, loop));
 
+		if (gem_mmap__has_wc(fd)) {
+			/* mmap clear */
+			gettimeofday(&start, NULL);
+			for (loop = 0; loop < 1000; loop++) {
+				uint32_t *base = gem_mmap__wc(fd, handle, size, PROT_READ | PROT_WRITE);
+				memset(base, 0, size);
+				munmap(base, size);
+			}
+			gettimeofday(&end, NULL);
+			igt_info("Time to clear %dk through a WC map:		%7.3fµs\n",
+					size/1024, elapsed(&start, &end, loop));
+		}
+
 		gettimeofday(&start, NULL);{
 			uint32_t *base = gem_mmap(fd, handle, size, PROT_READ | PROT_WRITE);
 			for (loop = 0; loop < 1000; loop++)
@@ -241,6 +292,17 @@ int main(int argc, char **argv)
 		igt_info("Time to clear %dk through a cached GTT map:	%7.3fµs\n",
 			 size/1024, elapsed(&start, &end, loop));
 
+		if (gem_mmap__has_wc(fd)) {
+			gettimeofday(&start, NULL);{
+				uint32_t *base = gem_mmap__wc(fd, handle, size, PROT_READ | PROT_WRITE);
+				for (loop = 0; loop < 1000; loop++)
+					memset(base, 0, size);
+				munmap(base, size);
+			} gettimeofday(&end, NULL);
+			igt_info("Time to clear %dk through a cached WC map:	%7.3fµs\n",
+					size/1024, elapsed(&start, &end, loop));
+		}
+
 		/* mmap read */
 		gettimeofday(&start, NULL);
 		for (loop = 0; loop < 1000; loop++) {
@@ -323,7 +385,6 @@ int main(int argc, char **argv)
 
 			size *= 4;
 		}
-
 	}
 
 	gem_close(fd, handle);
-- 
2.1.1
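
The numbers above are mean microseconds per loop iteration as computed by
elapsed(); the helper is defined earlier in gem_gtt_speed.c, but a sketch
consistent with its use here would be:

	static double elapsed(const struct timeval *start,
			      const struct timeval *end,
			      int loop)
	{
		/* mean time per iteration, in microseconds */
		return (1e6 * (end->tv_sec - start->tv_sec) +
			(end->tv_usec - start->tv_usec)) / loop;
	}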



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 3/3] igt/gem_concurrent_blit: Exercise wc mappings
  2014-10-23 19:11       ` [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface Chris Wilson
  2014-10-23 19:11         ` [PATCH 2/3] igt/gem_gtt_speed: compare against WC mmaps Chris Wilson
@ 2014-10-23 19:11         ` Chris Wilson
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 tests/gem_concurrent_blit.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tests/gem_concurrent_blit.c b/tests/gem_concurrent_blit.c
index 7d8d628..9a9169e 100644
--- a/tests/gem_concurrent_blit.c
+++ b/tests/gem_concurrent_blit.c
@@ -159,6 +159,18 @@ gttX_create_bo(drm_intel_bufmgr *bufmgr, int width, int height)
 }
 
 static drm_intel_bo *
+wc_create_bo(drm_intel_bufmgr *bufmgr, int width, int height)
+{
+	drm_intel_bo *bo;
+
+	igt_require_mmap_wc(fd);
+
+	bo = unmapped_create_bo(bufmgr, width, height);
+	bo->virtual = gem_mmap__wc(fd, bo->handle, bo->size, PROT_READ | PROT_WRITE);
+	return bo;
+}
+
+static drm_intel_bo *
 gpu_create_bo(drm_intel_bufmgr *bufmgr, int width, int height)
 {
 	return unmapped_create_bo(bufmgr, width, height);
@@ -287,6 +299,8 @@ struct access_mode access_modes[] = {
 		.create_bo = gtt_create_bo, .name = "gtt" },
 	{ .set_bo = gtt_set_bo, .cmp_bo = gtt_cmp_bo,
 		.create_bo = gttX_create_bo, .name = "gttX" },
+	{ .set_bo = gtt_set_bo, .cmp_bo = gtt_cmp_bo,
+		.create_bo = wc_create_bo, .name = "wc" },
 	{ .set_bo = gpu_set_bo, .cmp_bo = gpu_cmp_bo,
 		.create_bo = gpu_create_bo, .name = "gpu" },
 	{ .set_bo = gpu_set_bo, .cmp_bo = gpu_cmp_bo,
-- 
2.1.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
  2014-10-23 19:05         ` [PATCH 2/3] lib/core: Check for kernel error messages and WARN if any are found Chris Wilson
  2014-10-23 19:05         ` [PATCH 3/3] reg-read-8 Chris Wilson
@ 2014-10-23 19:12         ` Chris Wilson
  2 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-23 19:12 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

These are not the igt I was looking for.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
                       ` (4 preceding siblings ...)
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
@ 2014-10-24  8:40     ` Daniel Vetter
  2014-10-24  9:23       ` Chris Wilson
  5 siblings, 1 reply; 46+ messages in thread
From: Daniel Vetter @ 2014-10-24  8:40 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This patch provides support to create uncached virtual mappings for a Gem
> object. It intends to provide the same functionality as the 'mmap_gtt'
> interface without the constraints of a limited aperture space, provided
> clients handle the linear to tile conversion on their own.
> This is for improving CPU write performance, as with such a mapping,
> writes are almost 50% faster than with mmap_gtt. It also avoids the
> cache flush after an update from the CPU side when the object is passed
> on to the GPU, which would be the case if the regular mmap ioctl
> interface were used.
> This type of mapping is especially useful for sub-region updates,
> i.e. when only a portion of the object is to be updated.
> To ensure cache coherency, the GTT domain has been reused here before
> using this mapping. This provides the required cache flush if the object
> is in the CPU domain, as well as synchronization against concurrent
> rendering. Although access through an uncached mmap automatically
> invalidates the cache lines, this may not be true for non-temporal write
> instructions, and not all pages of the object may be updated at any
> given point of time through this mapping.
> Having a call to get_pages in the set_to_gtt_domain function, as added
> by Chris in the earlier patch, guarantees the clflush, so there will be
> no cache lines holding data for the object before it is accessed through
> this map.
> A new field 'flags' has been added to the 'drm_i915_gem_mmap' structure,
> used in the gem_mmap ioctl, which allows conveying the required mapping
> type as uncached.
> Userspace can query the driver for support of this mapping through
> get_params. For that, a new define I915_PARAM_HAS_UC_MMAP has been added.
> 
> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel@intel.com>

Yeah, I like this. And Chris has already gone ahead and added all the
functional tests. So here's what seems to still be missing:
- Reviewing Chris' patches. Akash, can you please do that? This kind of
  cross-review usually works best.
- Polish for Akash's patch according to Chris' review. The commit message
  also needs to gain the performance data. And we need a "does PAT work"
  test, I think, like Chris suggested.
- ioctl argument check tests for the cpu mmap ioctl. We currently have
  absolutely nothing here (one of the very few cases left). So Akash, can
  you please take the AR to write a new igt testcase which does all the
  usual argument-checking tests like invalid buffer, invalid flags?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
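
For the last point, a sketch of such an argument-checking subtest, reusing
the local_i915_gem_mmap_v2 definition from the igt patch earlier in this
thread; that undefined flag bits are rejected with EINVAL is an assumption
the final kernel patch needs to satisfy:

	static void
	test_bad_flags(int fd)
	{
		struct local_i915_gem_mmap_v2 arg;

		memset(&arg, 0, sizeof(arg));
		arg.handle = gem_create(fd, 4096);
		arg.size = 4096;
		arg.flags = ~0ULL;	/* undefined flag bits (assumed invalid) */

		igt_assert(drmIoctl(fd, LOCAL_IOCTL_I915_GEM_MMAP_v2, &arg) == -1);
		igt_assert(errno == EINVAL);

		gem_close(fd, arg.handle);
	}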

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-24  8:40     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Daniel Vetter
@ 2014-10-24  9:23       ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-24  9:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: akash.goel, intel-gfx

On Fri, Oct 24, 2014 at 10:40:10AM +0200, Daniel Vetter wrote:
> - Polish for Akash' patch according to Chris' review. Commit message also
>   needs to gain the performance data. And we need to have a "does PAT
>   work" test I think like Chris suggested.

Done, see the v2 I sent. The commit message contains relative performance
data, and that holds true for snb, ivb, byt and bdw. pnv didn't show an
improvement and I suspect neither will gen5-. Performance in
igt/gem_gtt_speed and x11perf shows similar trends.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
  2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
  2014-10-23 19:11       ` [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface Chris Wilson
@ 2014-10-25 11:51       ` akash.goel
  2014-10-25 12:45         ` Damien Lespiau
  2014-10-28 13:09         ` [PATCH v3] " akash.goel
  2014-11-05 12:48       ` [PATCH] drm/i915: Support creation of " Chris Wilson
  2014-12-17 12:46       ` Tvrtko Ursulin
  4 siblings, 2 replies; 46+ messages in thread
From: akash.goel @ 2014-10-25 11:51 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_wc', is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a Gem object. It provides the same functionality as
the 'mmap_gtt' interface without the constraints of limited aperture
space, provided clients handle the linear to tile conversion on their own.
This patch is intended to improve CPU write performance, as with such a
mapping, writes are almost 50% faster than with mmap_gtt.
It also avoids the cache flush after an update from the CPU side when the
object is passed on to the GPU, which would be the case if the regular
mmap interface were used.
This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated.
There is also support for an unsynchronized version of this interface,
'drm_intel_gem_bo_map_wc_unsynchronized', through which unsynchronized wc
virtual mappings of a Gem object can be created.
To ensure cache coherency, the GTT domain is reused before using this
mapping. This provides the required cache flush if the object is in the
CPU domain, as well as synchronization against concurrent rendering.

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existing users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.

v2: Aligned with the v2 of the corresponding kernel patch (Chris)

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/drm/i915_drm.h   |   9 ++++
 intel/intel_bufmgr.h     |   3 ++
 intel/intel_bufmgr_gem.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 136 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..a91a1d0 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	 */
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..bda4115 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f2f4fea..95c588f 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -184,6 +184,8 @@ struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1267,6 +1269,121 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* To indicate the uncached virtual mapping to KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are
+	 * not bound. For them, the pages are first acquired
+	 * before the domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1293,6 +1410,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1553,6 +1671,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
-- 
1.9.2
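
As a usage sketch of the new interface: the bo, offset, data and len below
are placeholders, and the fallback mirrors what existing mmap_gtt users
already do on older kernels.

	/* sub-region update through the wc mapping, with a gtt-map
	 * fallback for kernels without mmap(wc) support */
	if (drm_intel_gem_bo_map_wc(bo) == 0) {
		memcpy((char *)bo->virtual + offset, data, len);
		drm_intel_gem_bo_unmap_wc(bo);
	} else {
		drm_intel_gem_bo_map_gtt(bo);
		memcpy((char *)bo->virtual + offset, data, len);
		drm_intel_gem_bo_unmap_gtt(bo);
	}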

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-25 11:51       ` [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
@ 2014-10-25 12:45         ` Damien Lespiau
  2014-10-26  8:36           ` Akash Goel
  2014-10-28 13:09         ` [PATCH v3] " akash.goel
  1 sibling, 1 reply; 46+ messages in thread
From: Damien Lespiau @ 2014-10-25 12:45 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Sat, Oct 25, 2014 at 05:21:53PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> [commit message and sign-offs snipped]

From a quick glance at the patch:

It looks like you copy/pasted the map() implementation and changed a few
things here and there instead of adding flag to drm_intel_gem_bo_map()
and reusing the current code.

Can we expose another version of map that takes flags (_WC,
_UNSYNCHRONIZED, ...) instead of starting to have every single
combination possible?

Do we really need a separate mem_wc_virtual? Using _map() or _map_wc()
in an exclusive way makes some sense to me.

With the introduction of mem_wc_virtual, you left aside the vma cache
handling and so we'll never end up unmapping it, drm_intel_gem_bo_free()
doesn't unmap it either, ...

I'd just expect users to use _unmap().

-- 
Damien


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-25 12:45         ` Damien Lespiau
@ 2014-10-26  8:36           ` Akash Goel
  2014-10-26  8:41             ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Akash Goel @ 2014-10-26  8:36 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: intel-gfx

On Sat, 2014-10-25 at 13:45 +0100, Damien Lespiau wrote:
> On Sat, Oct 25, 2014 at 05:21:53PM +0530, akash.goel@intel.com wrote:
> > [commit message and sign-offs snipped]
> 
> From a quick glance at the patch:
> 
> It looks like you copy/pasted the map() implementation and changed a few
> things here and there instead of adding flag to drm_intel_gem_bo_map()
> and reusing the current code.
> 
> Can we expose another version of map that takes flags (_WC,
> _UNSYNCHRONIZED, ...) instead of starting to have every single
> combination possible?
> 

Thanks for the review. Yes, it's almost a copy-paste of the 'mmap_gtt'
implementation.
Adding a new flag parameter could be an alternative.

> Do we really need a separate mem_wc_virtual? Using _map() or _map_wc()
> in an exclusive way makes some sense to me.
> 

Initially I made _map() & _map_wc() exclusive to each other. After a
discussion with Chris, he suggested maintaining all 3 mappings, i.e.
allowing them to co-exist.

> With the introduction of mem_wc_virtual, you left aside the vma cache
> handling and so we'll never end up unmapping it, drm_intel_gem_bo_free()
> doesn't unmap it either, ...
> 
Sorry, I completely missed this. Thanks for spotting it. Will do the
needful.

Best Regards
Akash

> I'd just expect users to use _unmap().
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-26  8:36           ` Akash Goel
@ 2014-10-26  8:41             ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-10-26  8:41 UTC (permalink / raw)
  To: Akash Goel; +Cc: intel-gfx

On Sun, Oct 26, 2014 at 02:06:57PM +0530, Akash Goel wrote:
> On Sat, 2014-10-25 at 13:45 +0100, Damien Lespiau wrote:
> > On Sat, Oct 25, 2014 at 05:21:53PM +0530, akash.goel@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > [commit message snipped]
> > > 
> > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I don't get s-o-b for header contributions ;-)

> > From a quick glance at the patch:
> > 
> > It looks like you copy/pasted the map() implementation and changed a few
> > things here and there instead of adding flag to drm_intel_gem_bo_map()
> > and reusing the current code.
> > 
> > Can we expose another version of map that takes flags (_WC,
> > _UNSYNCHRONIZED, ...) instead of starting to have every single
> > combination possible?
> > 
> 
> Thanks for the review. Yes, it's almost a copy-paste of the 'mmap_gtt'
> implementation.
> Adding a new flag parameter could be an alternative.
> 
> > Do we really need a separate mem_wc_virtual? Using _map() or _map_wc()
> > in an exclusive way makes some sense to me.
> > 
> 
> Initially I made _map() & _map_wc() exclusive to each other. After a
> discussion with Chris, he suggested maintaining all 3 mappings, i.e.
> allowing them to co-exist.

For example, in the ddx we may want to use either GTT or WC mapping
depending on the actual path. WC if we know we can use manual detiling,
otherwise we use GTT and benefit from the linear view (or we have to
wrap all tiled access with much slower accessors). We may even mix
WC/CPU mappings - e.g. first write through the CPU mapping whilst we
know the object is still in the CPU domain, and then switch to updating
the object when it is on the GPU using WC. (Again see examples in the
ddx). I can foresee that we can benefit from keeping each of the
mmappings cached.
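
(A compressed sketch of that mixed pattern; the two helpers are
placeholders for whatever actually fills the bo:)

	/* cheap cached writes while the bo is still in the cpu domain */
	drm_intel_bo_map(bo, 1);
	write_initial_contents(bo->virtual);
	drm_intel_bo_unmap(bo);

	/* ... the bo is used by the GPU ... */

	/* later partial updates switch to wc to avoid clflushes */
	drm_intel_gem_bo_map_wc(bo);
	update_dirty_region(bo->virtual);
	drm_intel_gem_bo_unmap_wc(bo);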

> > With the introduction of mem_wc_virtual, you left aside the vma cache
> > handling and so we'll never end up unmapping it, drm_intel_gem_bo_free()
> > doesn't unmap it either, ...
> > 
> Sorry I completely missed this. Thanks for spotting this. Will do the
> needful. 

Also, before trying to use the mmap(wc), you should check that it
exists! Otherwise you just return an ordinary mmap(wb) to the user and
corruption ensues.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-25 11:51       ` [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
  2014-10-25 12:45         ` Damien Lespiau
@ 2014-10-28 13:09         ` akash.goel
  2014-10-28 16:11           ` Damien Lespiau
  2014-12-03 14:13           ` Damien Lespiau
  1 sibling, 2 replies; 46+ messages in thread
From: akash.goel @ 2014-10-28 13:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_wc', is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a Gem object. It provides the same functionality as
the 'mmap_gtt' interface without the constraints of limited aperture
space, provided clients handle the linear to tile conversion on their own.
This patch is intended to improve CPU write performance, as with such a
mapping, writes are almost 50% faster than with mmap_gtt.
It also avoids the cache flush after an update from the CPU side when the
object is passed on to the GPU, which would be the case if the regular
mmap interface were used.
This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated.
There is also support for an unsynchronized version of this interface,
'drm_intel_gem_bo_map_wc_unsynchronized', through which unsynchronized wc
virtual mappings of a Gem object can be created.
To ensure cache coherency, the GTT domain is reused before using this
mapping. This provides the required cache flush if the object is in the
CPU domain, as well as synchronization against concurrent rendering.

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existing users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/drm/i915_drm.h   |   9 +++
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 162 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..a91a1d0 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	 */
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..bda4115 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f2f4fea..7b53a23 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -130,6 +130,7 @@ typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	char *aub_filename;
@@ -184,6 +185,8 @@ struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1057,6 +1060,10 @@ drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	VG_CLEAR(close);
@@ -1081,6 +1088,9 @@ drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1148,6 +1160,11 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1160,6 +1177,8 @@ static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1172,6 +1191,8 @@ static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1267,6 +1288,124 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* To indicate the uncached virtual mapping to KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are
+	 * not bound. For them, the pages are first acquired
+	 * before the domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1293,6 +1432,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1553,6 +1693,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3538,6 +3684,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	if (bufmgr_gem->gen < 4) {
 		gp.param = I915_PARAM_NUM_FENCES_AVAIL;
 		gp.value = &bufmgr_gem->available_fences;
-- 
1.9.2
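
(As an illustration of the interface above - a minimal, hypothetical
usage sketch, not part of the patch; error handling is mostly elided and
the helper and its arguments are invented:)

#include <string.h>
#include "intel_bufmgr.h"

/* Upload a sub-region of a bo through the new wc mapping. Unlike a
 * regular CPU map, no clflush of the whole object is needed when the bo
 * is next used by the GPU; unlike map_gtt, no aperture space or fence is
 * consumed. */
static int upload_subregion(drm_intel_bo *bo, size_t offset,
			    const void *data, size_t len)
{
	/* Synchronized map: does a set-domain(GTT) internally. */
	int ret = drm_intel_gem_bo_map_wc(bo);
	if (ret)
		return ret; /* e.g. -EINVAL on kernels without the extended mmap */

	memcpy((char *)bo->virtual + offset, data, len);

	return drm_intel_gem_bo_unmap_wc(bo);
}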

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-28 13:09         ` [PATCH v3] " akash.goel
@ 2014-10-28 16:11           ` Damien Lespiau
  2014-12-03 14:13           ` Damien Lespiau
  1 sibling, 0 replies; 46+ messages in thread
From: Damien Lespiau @ 2014-10-28 16:11 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
> patch. Through this interface Gfx clients can create write combining
> virtual mappings of the Gem object. It will provide the same functionality
> as the 'mmap_gtt' interface without the constraints of limited aperture
> space, provided clients handle the linear to tile conversion on their own.
> This patch is intended to improve CPU write performance, as with such a
> mapping, writes are almost 50% faster than with mmap_gtt. It also avoids
> the cache flush after an update from the CPU side when the object is
> passed on to the GPU, which would be needed if the regular mmap interface
> were used. This type of mapping is especially useful in case of a
> sub-region update, i.e. when only a portion of the object is to be updated.
> There is also support for an unsynchronized version of this interface,
> named 'drm_intel_gem_bo_map_wc_unsynchronized', through which unsynchronized
> wc virtual mappings of the Gem object can be created.
> To ensure cache coherency, the GTT domain is reused here before this
> mapping is used. This provides the required cache flush if the object is
> in the CPU domain, as well as synchronization against concurrent rendering.
> 
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existing users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
> 
> v2: Aligned with the v2 of the corresponding kernel patch (Chris)
> v3: Added the unmap calls for the wc mapping (Damien)
>     Added the param feature check before creating the wc mapping & reduced
>     the vma limit (Chris)
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I'm still not super convinced by the full copy and paste of the function,
where all I can see changing is the pointer storing the virtual address
and the flags. Is there something I'm missing? What do others think? (I'm
not the maintainer of anything in libdrm.)

-- 
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
                         ` (2 preceding siblings ...)
  2014-10-25 11:51       ` [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
@ 2014-11-05 12:48       ` Chris Wilson
  2014-11-06 14:50         ` Daniel Vetter
  2014-12-17 12:46       ` Tvrtko Ursulin
  4 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2014-11-05 12:48 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Akash Goel

On Thu, Oct 23, 2014 at 05:55:47PM +0100, Chris Wilson wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> This patch provides support for creating write-combining virtual mappings of
> a GEM object. It intends to provide the same functionality as the 'mmap_gtt'
> interface without the constraints and contention of a limited aperture
> space, but requires that clients handle the linear to tile conversion on
> their own. This is for improving CPU write performance, as with such a
> mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
> to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the
> cache flush after an update from the CPU side when the object is passed on
> to the GPU. This type of mapping is especially useful in case of a sub-region update,
> i.e. when only a portion of the object is to be updated. Using a CPU mmap
> in such cases would normally incur a clflush of the whole object, and
> using a GTT mmapping would likely require eviction of an active object or
> fence and thus stall. The write-combining CPU mmap avoids both.
> 
> To ensure cache coherency, the GTT domain has been reused here before this
> mapping is used. This provides the required cache flush if the object is in
> the CPU domain, as well as synchronization against concurrent rendering.
> Although access through an uncached mmap should automatically invalidate
> the cache lines, this may not be true for non-temporal write instructions,
> and not all pages of the object may be updated at any given point in time
> through this mapping. Having a call to get_pages in the
> set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
> Broaden application of set-domain(GTT)', would guarantee the clflush and
> so there will be no cachelines holding the data for the object before it
> is accessed through this map.
> 
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existing users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
> 
> v2: Fix error handling, invalid flag detection, renaming (ickle)
> 
> The new mmapping is exercised by igt/gem_mmap_wc,
> igt/gem_concurrent_blit and igt/gem_gtt_speed.
> 
> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Daniel, since both Akash and I developed this, we need a third body
for the review. Tag.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Broaden application of set-domain(GTT)
  2014-10-14 12:47 ` Chris Wilson
  2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
@ 2014-11-05 12:49   ` Chris Wilson
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-11-05 12:49 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Akash Goel

On Tue, Oct 14, 2014 at 01:47:36PM +0100, Chris Wilson wrote:
> Previously, this was restricted to only operate on bound objects - to
> make pointer access through the GTT to the object coherent with writes
> to and from the GPU. A second usecase is drm_intel_bo_wait_rendering()
> which at present does not function unless the object also happens to
> be bound into the GGTT (on current systems that is becoming increasingly
> rare, especially for the typical requests from mesa). A third usecase is
> a future patch wishing to extend the coverage of the GTT domain to
> include objects not bound into the GGTT but still in its coherent cache
> domain. For the latter pair of requests, we need to operate on the
> object regardless of its bind state.
> 
> v2: After discussion with Akash, we came to the conclusion that the
> get-pages was required in order for accurate domain tracking in the
> corner cases (like the shrinker) and also useful for ensuring memory
> coherency with earlier cached CPU mmaps in case userspace uses exotic
> cache bypass (non-temporal) instructions.
> 
> Cc: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Akash, are you happy with this? Could you bless this with an r-b?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
  2014-11-05 12:48       ` [PATCH] drm/i915: Support creation of " Chris Wilson
@ 2014-11-06 14:50         ` Daniel Vetter
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Vetter @ 2014-11-06 14:50 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, Akash Goel, Daniel Vetter

On Wed, Nov 05, 2014 at 12:48:35PM +0000, Chris Wilson wrote:
> On Thu, Oct 23, 2014 at 05:55:47PM +0100, Chris Wilson wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > This patch provides support for creating write-combining virtual mappings of
> > a GEM object. It intends to provide the same functionality as the 'mmap_gtt'
> > interface without the constraints and contention of a limited aperture
> > space, but requires that clients handle the linear to tile conversion on
> > their own. This is for improving CPU write performance, as with such a
> > mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
> > to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the
> > cache flush after an update from the CPU side when the object is passed on
> > to the GPU. This type of mapping is especially useful in case of a sub-region update,
> > i.e. when only a portion of the object is to be updated. Using a CPU mmap
> > in such cases would normally incur a clflush of the whole object, and
> > using a GTT mmapping would likely require eviction of an active object or
> > fence and thus stall. The write-combining CPU mmap avoids both.
> > 
> > To ensure cache coherency, the GTT domain has been reused here before this
> > mapping is used. This provides the required cache flush if the object is in
> > the CPU domain, as well as synchronization against concurrent rendering.
> > Although access through an uncached mmap should automatically invalidate
> > the cache lines, this may not be true for non-temporal write instructions,
> > and not all pages of the object may be updated at any given point in time
> > through this mapping. Having a call to get_pages in the
> > set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
> > Broaden application of set-domain(GTT)', would guarantee the clflush and
> > so there will be no cachelines holding the data for the object before it
> > is accessed through this map.
> > 
> > The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> > extended with a new flags field (defaulting to 0 for existing users). In
> > order for userspace to detect the extended ioctl, a new parameter
> > I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
> > 
> > v2: Fix error handling, invalid flag detection, renaming (ickle)
> > 
> > The new mmapping is exercised by igt/gem_mmap_wc,
> > igt/gem_concurrent_blit and igt/gem_gtt_speed.
> > 
> > Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Daniel, since both Akash and I developed this, we need a third body
> for the review. Tag.

Oh, the patch itself looks good imo. It's the testcases that need review
(which Akash could do since you've written them), plus the missing bits
(like nasty invalid args checking for the mmap ioctl), which Akash or
you could also supply. Once that's done (plus the review from Akash on your
patch) I'll vacuum up all the kernel parts.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-10-28 13:09         ` [PATCH v3] " akash.goel
  2014-10-28 16:11           ` Damien Lespiau
@ 2014-12-03 14:13           ` Damien Lespiau
  2014-12-03 18:18             ` Chris Wilson
  2014-12-09 10:54             ` Chris Wilson
  1 sibling, 2 replies; 46+ messages in thread
From: Damien Lespiau @ 2014-12-03 14:13 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
>  
>  	/* We may need to evict a few entries in order to create new mmaps */
>  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> +	if (bufmgr_gem->has_ext_mmap)
> +		limit -= bufmgr_gem->vma_open;
>  	if (limit < 0)
>  		limit = 0;
>  

Aren't we being overly aggressive in purging the vma cache with this new
code? I guess it depends on the workload, but I don't think the WC
mapping is likely to be used in addition to the other two often enough
to warrant a max - 3 * open limit.
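
(For concreteness, with purely hypothetical numbers: if vma_max were 128
and vma_open were 20, the old limit would be 128 - 2*20 = 88 cached
mappings, while with has_ext_mmap it becomes 128 - 3*20 = 68 - i.e. the
cache is sized as if every open bo could hold all three mappings at once.)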

> +static int
> +map_wc(drm_intel_bo *bo)
> +{
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +	int ret;
> +
> +	if (bo_gem->is_userptr)
> +		return -EINVAL;
> +
> +	if (!bufmgr_gem->has_ext_mmap)
> +		return -EINVAL;
> +
> +	if (bo_gem->map_count++ == 0)
> +		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
> +
> +	/* Get a mapping of the buffer if we haven't before. */
> +	if (bo_gem->mem_wc_virtual == NULL) {
> +		struct drm_i915_gem_mmap mmap_arg;
> +
> +		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
> +		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
> +
> +		VG_CLEAR(mmap_arg);
> +		mmap_arg.handle = bo_gem->gem_handle;
> +		/* To indicate the uncached virtual mapping to KMD */
> +		mmap_arg.flags = I915_MMAP_WC;
> +		mmap_arg.offset = 0;
> +		mmap_arg.size = bo->size;
> +		ret = drmIoctl(bufmgr_gem->fd,
> +			       DRM_IOCTL_I915_GEM_MMAP,
> +			       &mmap_arg);
> +		if (ret != 0) {
> +			ret = -errno;
> +			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
> +			    __FILE__, __LINE__, bo_gem->gem_handle,
> +			    bo_gem->name, strerror(errno));
> +			if (--bo_gem->map_count == 0)
> +				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
> +			pthread_mutex_unlock(&bufmgr_gem->lock);


There's an unlock() here (and no lock()!), but the locking is handled by
the higher-level functions.

> +			return ret;
> +		}
> +		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
> +		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
> +	}
> +
> +	bo->virtual = bo_gem->mem_wc_virtual;
> +
> +	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
> +	    bo_gem->mem_wc_virtual);
> +
> +	return 0;
> +}
> +
> +/* To be used in a similar way to mmap_gtt */
> +drm_public int
> +drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +	struct drm_i915_gem_set_domain set_domain;
> +	int ret;
> +
> +	pthread_mutex_lock(&bufmgr_gem->lock);
> +
> +	ret = map_wc(bo);
> +	if (ret) {
> +		pthread_mutex_unlock(&bufmgr_gem->lock);
> +		return ret;
> +	}
> +
> +	/* Now move it to the GTT domain so that the GPU and CPU
> +	 * caches are flushed and the GPU isn't actively using the
> +	 * buffer.
> +	 *
> +	 * The domain change is done even for the objects which
> +	 * are not bounded. For them first the pages are acquired,
> +	 * before the domain change.
> +	 */
> +	VG_CLEAR(set_domain);
> +	set_domain.handle = bo_gem->gem_handle;
> +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> +	ret = drmIoctl(bufmgr_gem->fd,
> +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> +		       &set_domain);
> +	if (ret != 0) {
> +		DBG("%s:%d: Error setting domain %d: %s\n",
> +		    __FILE__, __LINE__, bo_gem->gem_handle,
> +		    strerror(errno));
> +	}

Why move the buffer to the GTT domain and not the CPU domain here?

> +	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
> +	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
> +	pthread_mutex_unlock(&bufmgr_gem->lock);
> +
> +	return 0;
> +}
> +
> +drm_public int
> +drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +#ifdef HAVE_VALGRIND
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +#endif
> +	int ret;
> +
> +	pthread_mutex_lock(&bufmgr_gem->lock);
> +
> +	ret = map_wc(bo);
> +	if (ret == 0) {
> +		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
> +		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
> +	}
> +
> +	pthread_mutex_unlock(&bufmgr_gem->lock);
> +
> +	return ret;
> +}
> +
>  static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
>  {
>  	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> @@ -1293,6 +1432,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
>  
>  		VG_CLEAR(mmap_arg);
>  		mmap_arg.handle = bo_gem->gem_handle;
> +		mmap_arg.flags = 0;
>  		mmap_arg.offset = 0;
>  		mmap_arg.size = bo->size;
>  		ret = drmIoctl(bufmgr_gem->fd,
> @@ -1553,6 +1693,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
>  }
>  
>  drm_public int
> +drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
> +{
> +	return drm_intel_gem_bo_unmap(bo);
> +}
> +
> +drm_public int
>  drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
>  {
>  	return drm_intel_gem_bo_unmap(bo);
> @@ -3538,6 +3684,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
>  	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
>  	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
>  
> +	gp.param = I915_PARAM_MMAP_VERSION;
> +	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
> +	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);

It looks like this works, but can we have && instead?
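
(Both operands here - "ret == 0" and "*gp.value > 0" - already evaluate
to 0 or 1, so the bitwise & happens to compute the same result as &&;
&& would state the intent more clearly and short-circuit the second
test.)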

-- 
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-03 14:13           ` Damien Lespiau
@ 2014-12-03 18:18             ` Chris Wilson
  2014-12-09 10:54             ` Chris Wilson
  1 sibling, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-12-03 18:18 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: akash.goel, intel-gfx

On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> > +/* To be used in a similar way to mmap_gtt */
> > +drm_public int
> > +drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
> > +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> > +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> > +	struct drm_i915_gem_set_domain set_domain;
> > +	int ret;
> > +
> > +	pthread_mutex_lock(&bufmgr_gem->lock);
> > +
> > +	ret = map_wc(bo);
> > +	if (ret) {
> > +		pthread_mutex_unlock(&bufmgr_gem->lock);
> > +		return ret;
> > +	}
> > +
> > +	/* Now move it to the GTT domain so that the GPU and CPU
> > +	 * caches are flushed and the GPU isn't actively using the
> > +	 * buffer.
> > +	 *
> > +	 * The domain change is done even for the objects which
> > +	 * are not bounded. For them first the pages are acquired,
> > +	 * before the domain change.
> > +	 */
> > +	VG_CLEAR(set_domain);
> > +	set_domain.handle = bo_gem->gem_handle;
> > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > +	ret = drmIoctl(bufmgr_gem->fd,
> > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > +		       &set_domain);
> > +	if (ret != 0) {
> > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > +		    strerror(errno));
> > +	}
> 
> Why move the buffer to the GTT domain and not the CPU domain here?

The entire point of the mmap(wc) interface is to access the buffer
outside of the CPU cache domain, and not using the GTT indirection.

We have 3 cache domains: in GPU, in CPU, neither (aka GTT).
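
(In terms of the uapi flags, roughly: I915_GEM_DOMAIN_CPU means the data
may still sit in the CPU cache and needs a clflush before any uncached
access; GPU domains such as I915_GEM_DOMAIN_RENDER mean it may sit in GPU
caches; I915_GEM_DOMAIN_GTT is the "neither" case, which is why the wc
map is preceded by set-domain(GTT).)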
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-03 14:13           ` Damien Lespiau
  2014-12-03 18:18             ` Chris Wilson
@ 2014-12-09 10:54             ` Chris Wilson
  2014-12-09 16:53               ` Damien Lespiau
  1 sibling, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2014-12-09 10:54 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: akash.goel, intel-gfx

On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> > @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
> >  
> >  	/* We may need to evict a few entries in order to create new mmaps */
> >  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> > +	if (bufmgr_gem->has_ext_mmap)
> > +		limit -= bufmgr_gem->vma_open;
> >  	if (limit < 0)
> >  		limit = 0;
> >  
> 
> Aren't we being overly aggressive in purging the vma cache with this new
> code? I guess it depends on the workload, but I don't think the WC
> mapping is likely to be used in addition to the other two often enough
> to warrant a max - 3 * open limit.

And if they do, you've just exceeded their limit. And it is conceivable
that you will have a bo with all 3 vma mappings, though I only expect 2 to
be active in any single lifetime. The patch is correct regarding the code
as it exists; you may want to argue for finer tracking of vma_open and
vma_cached, but that would require a different approach.
 
> > +	/* Now move it to the GTT domain so that the GPU and CPU
> > +	 * caches are flushed and the GPU isn't actively using the
> > +	 * buffer.
> > +	 *
> > +	 * The domain change is done even for the objects which
> > +	 * are not bounded. For them first the pages are acquired,
> > +	 * before the domain change.
> > +	 */
> > +	VG_CLEAR(set_domain);
> > +	set_domain.handle = bo_gem->gem_handle;
> > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > +	ret = drmIoctl(bufmgr_gem->fd,
> > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > +		       &set_domain);
> > +	if (ret != 0) {
> > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > +		    strerror(errno));
> > +	}
> 
> Why move the buffer to the GTT domain and not the CPU domain here?

There are 3 cache domains: GPU, CPU and NONE (aka GTT). The purpose of
mmap(wc) is to be able to access the buffer without going through the CPU
cache (and it should be out of the GPU cache to maintain coherency). The
NONE cache domain is exactly what we want, or else you end up in clflush
misery.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-09 10:54             ` Chris Wilson
@ 2014-12-09 16:53               ` Damien Lespiau
  2014-12-14  8:41                 ` [PATCH v4] " akash.goel
  2015-03-03 14:20                 ` [PATCH v3] " Damien Lespiau
  0 siblings, 2 replies; 46+ messages in thread
From: Damien Lespiau @ 2014-12-09 16:53 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Dec 09, 2014 at 10:54:25AM +0000, Chris Wilson wrote:
> On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> > On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> > > @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
> > >  
> > >  	/* We may need to evict a few entries in order to create new mmaps */
> > >  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> > > +	if (bufmgr_gem->has_ext_mmap)
> > > +		limit -= bufmgr_gem->vma_open;
> > >  	if (limit < 0)
> > >  		limit = 0;
> > >  
> > 
> > Aren't we being overly aggressive in purging the vma cache with this new
> > code? I guess it depends on the workload, but I don't think the WC
> > mapping is likely to be used in addition to the other two often enough
> > to warrant a max - 3 * open limit.
> 
> And if they do, you've just exceeded their limit. And it is conceivable
> that you will have a bo with all 3 vma mappings, though I only expect 2 to
> be active in any single lifetime. The patch is correct regarding the code
> as it exists; you may want to argue for finer tracking of vma_open and
> vma_cached, but that would require a different approach.
>  
> > > +	/* Now move it to the GTT domain so that the GPU and CPU
> > > +	 * caches are flushed and the GPU isn't actively using the
> > > +	 * buffer.
> > > +	 *
> > > +	 * The domain change is done even for the objects which
> > > +	 * are not bounded. For them first the pages are acquired,
> > > +	 * before the domain change.
> > > +	 */
> > > +	VG_CLEAR(set_domain);
> > > +	set_domain.handle = bo_gem->gem_handle;
> > > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > > +	ret = drmIoctl(bufmgr_gem->fd,
> > > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > > +		       &set_domain);
> > > +	if (ret != 0) {
> > > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > > +		    strerror(errno));
> > > +	}
> > 
> > Why move the buffer to the GTT domain and not the CPU domain here?
> 
> There are 3 cache domains: GPU, CPU and NONE (aka GTT). The purpose of
> mmap(wc) is to be able to access the buffer without going through the CPU
> cache (and it should be out of the GPU cache to maintain coherency). The
> NONE cache domain is exactly what we want, or else you end up in clflush
> misery.

Right that leaves the last point in my answer to v3. With that addressed
this is:

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

--
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-10-23 13:23       ` Chris Wilson
@ 2014-12-10  4:48         ` Chad Versace
  2014-12-10  8:02           ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Chad Versace @ 2014-12-10  4:48 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 804 bytes --]



On 10/23/2014 06:23 AM, Chris Wilson wrote:
> On Thu, Oct 23, 2014 at 12:56:46PM +0100, Chris Wilson wrote:
>> On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
>>> From: Akash Goel <akash.goel@intel.com>
>>>
>>
>>> This is for improving the CPU write operation performance, as with such
>>> mapping, writes are almost 50% faster than with mmap_gtt. Also it avoids the


> Oops, bugs beware. I wasn't calling the ioctl correctly.
> Having re-run the test, I now get:
> 
>    gtt       wc    Operation
> --------  ------   ---------
> 424000.0    1.30   ShmPutImage 10x10 square 
>  29500.0    1.44   ShmPutImage 100x100 square 
>   1510.0    1.38   ShmPutImage 500x500 square 
> 
> Supporting your claims. Very nice.

What are the units for these numbers?


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support to create uncached user mapping for a Gem object
  2014-12-10  4:48         ` Chad Versace
@ 2014-12-10  8:02           ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-12-10  8:02 UTC (permalink / raw)
  To: Chad Versace; +Cc: akash.goel, intel-gfx

On Tue, Dec 09, 2014 at 08:48:38PM -0800, Chad Versace wrote:
> 
> 
> On 10/23/2014 06:23 AM, Chris Wilson wrote:
> > On Thu, Oct 23, 2014 at 12:56:46PM +0100, Chris Wilson wrote:
> >> On Thu, Oct 23, 2014 at 04:03:56PM +0530, akash.goel@intel.com wrote:
> >>> From: Akash Goel <akash.goel@intel.com>
> >>>
> >>
> >>> This is for improving the CPU write operation performance, as with such
> >>> mapping, writes are almost 50% faster than with mmap_gtt. Also it avoids the
> 
> 
> > Oops, bugs beware. I wasn't calling the ioctl correctly.
> > Having re-run the test, I now get:
> > 
> >    gtt       wc    Operation
> > --------  ------   ---------
> > 424000.0    1.30   ShmPutImage 10x10 square 
> >  29500.0    1.44   ShmPutImage 100x100 square 
> >   1510.0    1.38   ShmPutImage 500x500 square 
> > 
> > Supporting your claims. Very nice.
> 
> What are the units for these numbers?

ops/s for the gtt column; the wc column is the relative speed-up factor.
Measured on a baby BYT.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-09 16:53               ` Damien Lespiau
@ 2014-12-14  8:41                 ` akash.goel
  2016-03-09  9:09                   ` [PATCH v5] " akash.goel
  2015-03-03 14:20                 ` [PATCH v3] " Damien Lespiau
  1 sibling, 1 reply; 46+ messages in thread
From: akash.goel @ 2014-12-14  8:41 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
patch. Through this interface Gfx clients can create write combining
virtual mappings of the Gem object. It will provide the same funtionality
of 'mmap_gtt' interface without the constraints of limited aperture space,
but provided clients handles the linear to tile conversion on their own.
This patch is intended for improving the CPU write operation performance,
as with such mapping, writes are almost 50% faster than with mmap_gtt.
Also it avoids the Cache flush after update from CPU side, when object is
passed on to GPU, which will be the case if regular mmap interface is used.
This type of mapping is specially useful in case of sub-region
update, i.e. when only a portion of the object is to be updated.
Also there is a support for the unsynchronized version of this interface
named 'drm_intel_gem_bo_map_wc_unsynchronized', through which wc virtual
mappings, but unsynchronized one, can be created of the Gem object.
To ensure the cache coherency, before using this mapping, the GTT domain has
been reused here. This provides the required Cache flush if the object is in
CPU domain or synchronization against the concurrent rendering

The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)
v4: Removed the extraneous unlock call from map_wc function (Damien)

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
---
 include/drm/i915_drm.h   |   9 +++
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..a91a1d0 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	*/
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..bda4115 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 14e92c9..9109325 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -130,6 +130,7 @@ typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	char *aub_filename;
@@ -184,6 +185,8 @@ struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1058,6 +1061,10 @@ drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	VG_CLEAR(close);
@@ -1082,6 +1089,9 @@ drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1127,6 +1137,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1149,6 +1161,11 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1161,6 +1178,8 @@ static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1173,6 +1192,8 @@ static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1268,6 +1289,123 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* To indicate the uncached virtual mapping to KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for the objects which
+	 * are not bounded. For them first the pages are acquired,
+	 * before the domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1294,6 +1432,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1554,6 +1693,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3542,6 +3687,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	if (bufmgr_gem->gen < 4) {
 		gp.param = I915_PARAM_NUM_FENCES_AVAIL;
 		gp.value = &bufmgr_gem->available_fences;
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
  2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
                         ` (3 preceding siblings ...)
  2014-11-05 12:48       ` [PATCH] drm/i915: Support creation of " Chris Wilson
@ 2014-12-17 12:46       ` Tvrtko Ursulin
  2014-12-17 12:52         ` Chris Wilson
  4 siblings, 1 reply; 46+ messages in thread
From: Tvrtko Ursulin @ 2014-12-17 12:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, Akash Goel


Hi,

On 10/23/2014 05:55 PM, Chris Wilson wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> This patch provides support for creating write-combining virtual mappings of
> a GEM object. It intends to provide the same functionality as the 'mmap_gtt'
> interface without the constraints and contention of a limited aperture
> space, but requires that clients handle the linear to tile conversion on
> their own. This is for improving CPU write performance, as with such a
> mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
> to the GTT mmapping, and unlike the regular CPU mmapping, it avoids the
> cache flush after an update from the CPU side when the object is passed on
> to the GPU. This type of mapping is especially useful in case of a sub-region update,
> i.e. when only a portion of the object is to be updated. Using a CPU mmap
> in such cases would normally incur a clflush of the whole object, and
> using a GTT mmapping would likely require eviction of an active object or
> fence and thus stall. The write-combining CPU mmap avoids both.
>
> To ensure cache coherency, the GTT domain has been reused here before this
> mapping is used. This provides the required cache flush if the object is in
> the CPU domain, as well as synchronization against concurrent rendering.
> Although access through an uncached mmap should automatically invalidate
> the cache lines, this may not be true for non-temporal write instructions,
> and not all pages of the object may be updated at any given point in time
> through this mapping. Having a call to get_pages in the
> set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
> Broaden application of set-domain(GTT)', would guarantee the clflush and
> so there will be no cachelines holding the data for the object before it
> is accessed through this map.
>
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existing users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
>
> v2: Fix error handling, invalid flag detection, renaming (ickle)
>
> The new mmapping is exercised by igt/gem_mmap_wc,
> igt/gem_concurrent_blit and igt/gem_gtt_speed.
>
> Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   drivers/gpu/drm/i915/i915_dma.c |  3 +++
>   drivers/gpu/drm/i915/i915_gem.c | 19 +++++++++++++++++++
>   include/uapi/drm/i915_drm.h     |  9 +++++++++
>   3 files changed, 31 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 1b398070b230..496fb72e25c8 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1027,6 +1027,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   	case I915_PARAM_CMD_PARSER_VERSION:
>   		value = i915_cmd_parser_get_version();
>   		break;
> +	case I915_PARAM_MMAP_VERSION:
> +		value = 1;
> +		break;
>   	default:
>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 895f9881f0aa..7d7265fcee95 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1485,6 +1485,12 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>   	struct drm_gem_object *obj;
>   	unsigned long addr;
>
> +	if (args->flags & ~(I915_MMAP_WC))
> +		return -EINVAL;
> +
> +	if (args->flags & I915_MMAP_WC && !cpu_has_pat)
> +		return -ENODEV;
> +
>   	obj = drm_gem_object_lookup(dev, file, args->handle);
>   	if (obj == NULL)
>   		return -ENOENT;
> @@ -1500,6 +1506,19 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
>   	addr = vm_mmap(obj->filp, 0, args->size,
>   		       PROT_READ | PROT_WRITE, MAP_SHARED,
>   		       args->offset);
> +	if (args->flags & I915_MMAP_WC) {
> +		struct mm_struct *mm = current->mm;
> +		struct vm_area_struct *vma;
> +
> +		down_write(&mm->mmap_sem);
> +		vma = find_vma(mm, addr);
> +		if (vma)
> +			vma->vm_page_prot =
> +				pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
> +		else
> +			addr = -ENOMEM;
> +		up_write(&mm->mmap_sem);
> +	}
>   	drm_gem_object_unreference_unlocked(obj);
>   	if (IS_ERR((void *)addr))
>   		return addr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index ff57f07c3249..9b3762724a2b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
>   #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
>   #define I915_PARAM_HAS_WT     	 	 27
>   #define I915_PARAM_CMD_PARSER_VERSION	 28
> +#define I915_PARAM_MMAP_VERSION          29
>
>   typedef struct drm_i915_getparam {
>   	int param;
> @@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
>   	 * This is a fixed-size type for 32/64 compatibility.
>   	 */
>   	__u64 addr_ptr;
> +
> +	/**
> +	 * Flags for extended behaviour.
> +	 *
> +	 * Added in version 2.
> +	 */
> +	__u64 flags;
> +#define I915_MMAP_WC 0x1
>   };
>
>   struct drm_i915_gem_mmap_gtt {

I had to spend some time reading about stuff but eventually this looks 
good to me. So:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Just one thing to clarify - the same write-combining speed-up is
already possible using the mentioned non-temporal instructions; this
just makes it easier to use for a wider group of users?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] drm/i915: Support creation of unbound wc user mappings for objects
  2014-12-17 12:46       ` Tvrtko Ursulin
@ 2014-12-17 12:52         ` Chris Wilson
  0 siblings, 0 replies; 46+ messages in thread
From: Chris Wilson @ 2014-12-17 12:52 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, intel-gfx, Akash Goel

On Wed, Dec 17, 2014 at 12:46:44PM +0000, Tvrtko Ursulin wrote:
> Just one thing to clarify - the same write-combining speed-up is
> already possible using the mentioned non-temporal instructions; this
> just makes it easier to use for a wider group of users?

Kind of. We did identify one corner case in the cache tracking that
required the previous patch to "broaden set-domain(GTT)" and for
userspace to remember to use set-domain(GTT) alongside movntps.
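
(For illustration, a non-temporal write loop of the kind meant here - a
sketch using SSE2 intrinsics, not code from any of these patches; dst and
n are assumed to be 16-byte aligned:)

#include <stddef.h>
#include <emmintrin.h>

static void copy_nt(void *dst, const void *src, size_t n)
{
	__m128i *d = dst;
	const __m128i *s = src;
	size_t i;

	/* Streaming stores bypass the CPU cache entirely. */
	for (i = 0; i < n / 16; i++)
		_mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

	_mm_sfence();	/* order the WC writes before any handoff flag */
}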
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-09 16:53               ` Damien Lespiau
  2014-12-14  8:41                 ` [PATCH v4] " akash.goel
@ 2015-03-03 14:20                 ` Damien Lespiau
  2015-03-03 17:05                   ` Chris Wilson
  1 sibling, 1 reply; 46+ messages in thread
From: Damien Lespiau @ 2015-03-03 14:20 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> Right that leaves the last point in my answer to v3. With that addressed
> this is:
> 
> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

Hum it seems that we've upstreamed the kernel part without the libdrm
API. Is it time to fix this? Akash, Chris, any objection if I push this
v3? Is there a more up-to-date patch that was exercised by some driver?

Thanks,

-- 
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2015-03-03 14:20                 ` [PATCH v3] " Damien Lespiau
@ 2015-03-03 17:05                   ` Chris Wilson
  2015-03-03 17:11                     ` Damien Lespiau
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2015-03-03 17:05 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: akash.goel, intel-gfx

On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > Right that leaves the last point in my answer to v3. With that addressed
> > this is:
> > 
> > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> 
> Hum it seems that we've upstreamed the kernel part without the libdrm
> API. Is it time to fix this? Akash, Chris, any objection if I push this
> v3? Is there a more up-to-date patch that was exercised by some driver?

There's no actual user of the proposed libdrm API afaik. It should make
a nice drop-in improvement for mesa, but we may need to better expose
bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
with old CPUs if it only focuses on mesa/i965).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2015-03-03 17:05                   ` Chris Wilson
@ 2015-03-03 17:11                     ` Damien Lespiau
  2015-03-03 17:13                       ` Chris Wilson
  0 siblings, 1 reply; 46+ messages in thread
From: Damien Lespiau @ 2015-03-03 17:11 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > Right that leaves the last point in my answer to v3. With that addressed
> > > this is:
> > > 
> > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > 
> > Hum it seems that we've upstreamed the kernel part without the libdrm
> > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > v3? Is there a more up-to-date patch that was exercised by some driver?
> 
> There's no actual user of the proposed libdrm API afaik. It should make
> a nice drop-in improvement for mesa, but we may need to better expose
> bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> with old CPUs if it only focuses on mesa/i965).

We've cleared that with Dave last time and if there's the kernel API in
place (because the DDX uses it) then we can upstream the libdrm one, for
the sake of completeness.

-- 
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2015-03-03 17:11                     ` Damien Lespiau
@ 2015-03-03 17:13                       ` Chris Wilson
  2015-03-03 17:16                         ` Damien Lespiau
  0 siblings, 1 reply; 46+ messages in thread
From: Chris Wilson @ 2015-03-03 17:13 UTC (permalink / raw)
  To: Damien Lespiau; +Cc: akash.goel, intel-gfx

On Tue, Mar 03, 2015 at 05:11:12PM +0000, Damien Lespiau wrote:
> On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> > On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > > Right that leaves the last point in my answer to v3. With that addressed
> > > > this is:
> > > > 
> > > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > > 
> > > Hum it seems that we've upstreamed the kernel part without the libdrm
> > > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > > v3? Is there a more up-to-date patch that was exercised by some driver?
> > 
> > There's no actual user of the proposed libdrm API afaik. It should make
> > a nice drop-in improvement for mesa, but we may need to better expose
> > bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> > with old CPUs if it only focuses on mesa/i965).
> 
> We've cleared that with Dave last time and if there's the kernel API in
> place (because the DDX uses it) then we can upstream the libdrm one, for
> the sake of completeness.

I meant in terms of whether the API is sufficient for mesa.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] intel: New libdrm interface to create unbound wc user mappings for objects
  2015-03-03 17:13                       ` Chris Wilson
@ 2015-03-03 17:16                         ` Damien Lespiau
  0 siblings, 0 replies; 46+ messages in thread
From: Damien Lespiau @ 2015-03-03 17:16 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Mar 03, 2015 at 05:13:41PM +0000, Chris Wilson wrote:
> On Tue, Mar 03, 2015 at 05:11:12PM +0000, Damien Lespiau wrote:
> > On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> > > On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > > > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > > > Right that leaves the last point in my answer to v3. With that addressed
> > > > > this is:
> > > > > 
> > > > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > > > 
> > > > Hum it seems that we've upstreamed the kernel part without the libdrm
> > > > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > > > v3? is there a more up-to-date patch that was exercised by some driver?
> > > 
> > > There's no actual user of the proposed libdrm API afaik. It should make
> > > a nice drop-in improvement for mesa, but we may need to better expose
> > > bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> > > with old CPUs if it only focuses on mesa/i965).
> > 
> > We've cleared that with Dave last time and if there's the kernel API in
> > place (because the DDX uses it) then we can upstream the libdrm one, for
> > the sake of completeness.
> 
> I meant in terms of whether the API is sufficient for mesa.

Next time, I'll actually read your comment :)

-- 
Damien
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v5] intel: New libdrm interface to create unbound wc user mappings for objects
  2014-12-14  8:41                 ` [PATCH v4] " akash.goel
@ 2016-03-09  9:09                   ` akash.goel
  2016-03-10  8:39                     ` Martin Peres
  0 siblings, 1 reply; 46+ messages in thread
From: akash.goel @ 2016-03-09  9:09 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel, martin.peres

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_wc', is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a Gem object. It provides the same functionality as
the 'mmap_gtt' interface without the constraints of limited aperture
space, provided clients handle the linear-to-tile conversion on their
own. This patch is intended to improve CPU write performance, as with
such a mapping writes are almost 50% faster than with mmap_gtt. It also
avoids the cache flush otherwise needed when an object updated from the
CPU side is passed on to the GPU, which is the case if the regular mmap
interface is used. This type of mapping is especially useful for
sub-region updates, i.e. when only a portion of the object is to be
updated. There is also support for an unsynchronized version of this
interface, 'drm_intel_gem_bo_map_wc_unsynchronized', through which
unsynchronized wc virtual mappings of a Gem object can be created.
To ensure cache coherency before using this mapping, the GTT domain is
reused here. This provides the required cache flush if the object is in
the CPU domain, as well as synchronization against concurrent rendering.

The drm_i915_gem_mmap structure (for the DRM_IOCTL_I915_GEM_MMAP ioctl) has
been extended with a new flags field (defaulting to 0 for existing users).
In order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)
v4: Removed the extraneous unlock call from map_wc function (Damien)
v5: Rebased.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
---
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+)

diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index a1abbcd..061b500 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -183,6 +183,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index dc28200..ca2f9ed 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -146,6 +146,7 @@ typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	struct {
@@ -209,6 +210,8 @@ struct _drm_intel_bo_gem {
 
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached (WC) mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1192,6 +1195,10 @@ drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	memclear(close);
@@ -1215,6 +1222,9 @@ drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1260,6 +1270,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1282,6 +1294,11 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1294,6 +1311,8 @@ static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1306,6 +1325,8 @@ static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1410,6 +1431,122 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		memclear(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* Request an uncached (WC) virtual mapping from the KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are
+	 * not bound. For them, the pages are acquired first,
+	 * before the domain change.
+	 */
+	memclear(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1695,6 +1832,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3425,6 +3568,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	gp.param = I915_PARAM_HAS_EXEC_SOFTPIN;
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	if (ret == 0 && *gp.value > 0)
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread
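
As a reference for prospective users of the interface above, here is a
minimal usage sketch. The helper name upload_subregion is illustrative
only (it is not part of libdrm); it assumes an untiled bo and falls
back to the GTT mapping on kernels where map_wc returns -EINVAL:

	#include <string.h>
	#include "intel_bufmgr.h"

	/* Illustrative helper: update a sub-region of an untiled bo
	 * through the wc mapping, falling back to the GTT mapping when
	 * the kernel lacks the extended mmap ioctl. */
	static int
	upload_subregion(drm_intel_bo *bo, size_t offset,
			 const void *data, size_t len)
	{
		if (drm_intel_gem_bo_map_wc(bo) == 0) {
			/* wc writes avoid the cache flush on handover to the GPU */
			memcpy((char *)bo->virtual + offset, data, len);
			return drm_intel_gem_bo_unmap_wc(bo);
		}

		/* Fallback for kernels without the extended mmap ioctl */
		if (drm_intel_gem_bo_map_gtt(bo) != 0)
			return -1;
		memcpy((char *)bo->virtual + offset, data, len);
		return drm_intel_gem_bo_unmap_gtt(bo);
	}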

* Re: [PATCH v5] intel: New libdrm interface to create unbound wc user mappings for objects
  2016-03-09  9:09                   ` [PATCH v5] " akash.goel
@ 2016-03-10  8:39                     ` Martin Peres
  2016-03-14 16:51                       ` Martin Peres
  0 siblings, 1 reply; 46+ messages in thread
From: Martin Peres @ 2016-03-10  8:39 UTC (permalink / raw)
  To: akash.goel, intel-gfx

On 09/03/16 11:09, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
> patch. Through this interface Gfx clients can create write combining
> virtual mappings of the Gem object. It will provide the same funtionality
> of 'mmap_gtt' interface without the constraints of limited aperture space,
> but provided clients handles the linear to tile conversion on their own.
> This patch is intended for improving the CPU write operation performance,
> as with such mapping, writes are almost 50% faster than with mmap_gtt.
> Also it avoids the Cache flush after update from CPU side, when object is
> passed on to GPU, which will be the case if regular mmap interface is used.
> This type of mapping is specially useful in case of sub-region
> update, i.e. when only a portion of the object is to be updated.
> Also there is a support for the unsynchronized version of this interface
> named 'drm_intel_gem_bo_map_wc_unsynchronized', through which wc virtual
> mappings, but unsynchronized one, can be created of the Gem object.
> To ensure the cache coherency, before using this mapping, the GTT domain has
> been reused here. This provides the required Cache flush if the object is in
> CPU domain or synchronization against the concurrent rendering
>
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existent users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
>
> v2: Aligned with the v2 of the corresponding kernel patch (Chris)
> v3: Added the unmap calls for the wc mapping (Damien)
>      Added the param feature check before creating the wc mapping & reduced
>      the vma limit (Chris)
> v4: Removed the extraneous unlock call from map_wc function (Damien)
> v5: Rebased.
>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

Chris, Damien,

Do you still stand by your S-o-b and R-b? If so, I can push the patch in 
the coming days.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v5] intel: New libdrm interface to create unbound wc user mappings for objects
  2016-03-10  8:39                     ` Martin Peres
@ 2016-03-14 16:51                       ` Martin Peres
  2016-03-15  8:41                         ` Daniel Vetter
  0 siblings, 1 reply; 46+ messages in thread
From: Martin Peres @ 2016-03-14 16:51 UTC (permalink / raw)
  To: akash.goel, intel-gfx

On 10/03/16 10:39, Martin Peres wrote:
> On 09/03/16 11:09, akash.goel@intel.com wrote:
>> From: Akash Goel <akash.goel@intel.com>
>>
>> A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
>> patch. Through this interface Gfx clients can create write combining
>> virtual mappings of the Gem object. It will provide the same funtionality
>> of 'mmap_gtt' interface without the constraints of limited aperture
>> space,
>> but provided clients handles the linear to tile conversion on their own.
>> This patch is intended for improving the CPU write operation performance,
>> as with such mapping, writes are almost 50% faster than with mmap_gtt.
>> Also it avoids the Cache flush after update from CPU side, when object is
>> passed on to GPU, which will be the case if regular mmap interface is
>> used.
>> This type of mapping is specially useful in case of sub-region
>> update, i.e. when only a portion of the object is to be updated.
>> Also there is a support for the unsynchronized version of this interface
>> named 'drm_intel_gem_bo_map_wc_unsynchronized', through which wc virtual
>> mappings, but unsynchronized one, can be created of the Gem object.
>> To ensure the cache coherency, before using this mapping, the GTT
>> domain has
>> been reused here. This provides the required Cache flush if the object
>> is in
>> CPU domain or synchronization against the concurrent rendering
>>
>> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has
>> been
>> extended with a new flags field (defaulting to 0 for existent users). In
>> order for userspace to detect the extended ioctl, a new parameter
>> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl
>> interface.
>>
>> v2: Aligned with the v2 of the corresponding kernel patch (Chris)
>> v3: Added the unmap calls for the wc mapping (Damien)
>>      Added the param feature check before creating the wc mapping &
>> reduced
>>      the vma limit (Chris)
>> v4: Removed the extraneous unlock call from map_wc function (Damien)
>> v5: Rebased.
>>
>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
>
> Chris, Damien,
>
> Do you still stand by your S-o-b and R-b? If so, I can push the patch in
> the coming days.

Another ping, with both Chris and Damien CCed this time.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v5] intel: New libdrm interface to create unbound wc user mappings for objects
  2016-03-14 16:51                       ` Martin Peres
@ 2016-03-15  8:41                         ` Daniel Vetter
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Vetter @ 2016-03-15  8:41 UTC (permalink / raw)
  To: Martin Peres; +Cc: akash.goel, intel-gfx

On Mon, Mar 14, 2016 at 06:51:54PM +0200, Martin Peres wrote:
> On 10/03/16 10:39, Martin Peres wrote:
> >On 09/03/16 11:09, akash.goel@intel.com wrote:
> >>From: Akash Goel <akash.goel@intel.com>
> >>
> >>A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
> >>patch. Through this interface Gfx clients can create write combining
> >>virtual mappings of the Gem object. It will provide the same funtionality
> >>of 'mmap_gtt' interface without the constraints of limited aperture
> >>space,
> >>but provided clients handles the linear to tile conversion on their own.
> >>This patch is intended for improving the CPU write operation performance,
> >>as with such mapping, writes are almost 50% faster than with mmap_gtt.
> >>Also it avoids the Cache flush after update from CPU side, when object is
> >>passed on to GPU, which will be the case if regular mmap interface is
> >>used.
> >>This type of mapping is specially useful in case of sub-region
> >>update, i.e. when only a portion of the object is to be updated.
> >>Also there is a support for the unsynchronized version of this interface
> >>named 'drm_intel_gem_bo_map_wc_unsynchronized', through which wc virtual
> >>mappings, but unsynchronized one, can be created of the Gem object.
> >>To ensure the cache coherency, before using this mapping, the GTT
> >>domain has
> >>been reused here. This provides the required Cache flush if the object
> >>is in
> >>CPU domain or synchronization against the concurrent rendering
> >>
> >>The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has
> >>been
> >>extended with a new flags field (defaulting to 0 for existent users). In
> >>order for userspace to detect the extended ioctl, a new parameter
> >>I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl
> >>interface.
> >>
> >>v2: Aligned with the v2 of the corresponding kernel patch (Chris)
> >>v3: Added the unmap calls for the wc mapping (Damien)
> >>     Added the param feature check before creating the wc mapping &
> >>reduced
> >>     the vma limit (Chris)
> >>v4: Removed the extraneous unlock call from map_wc function (Damien)
> >>v5: Rebased.
> >>
> >>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> >
> >Chris, Damien,
> >
> >Do you still stand by your S-o-b and R-b? If so, I can push the patch in
> >the coming days.
> 
> Another ping, with both Chris and Damien CCed this time.

Not entirely sure, but we tended to not push libdrm patches if the open
source userspace wasn't there yet. That's why this one was hanging in the
air forever iirc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects
@ 2015-01-02 11:06 akash.goel
  0 siblings, 0 replies; 46+ messages in thread
From: akash.goel @ 2015-01-02 11:06 UTC (permalink / raw)
  To: intel-gfx; +Cc: daniel.vetter, Akash Goel

From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_wc', is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a Gem object. It provides the same functionality as
the 'mmap_gtt' interface without the constraints of limited aperture
space, provided clients handle the linear-to-tile conversion on their
own. This patch is intended to improve CPU write performance, as with
such a mapping writes are almost 50% faster than with mmap_gtt. It also
avoids the cache flush otherwise needed when an object updated from the
CPU side is passed on to the GPU, which is the case if the regular mmap
interface is used. This type of mapping is especially useful for
sub-region updates, i.e. when only a portion of the object is to be
updated. There is also support for an unsynchronized version of this
interface, 'drm_intel_gem_bo_map_wc_unsynchronized', through which
unsynchronized wc virtual mappings of a Gem object can be created.
To ensure cache coherency before using this mapping, the GTT domain is
reused here. This provides the required cache flush if the object is in
the CPU domain, as well as synchronization against concurrent rendering.

The drm_i915_gem_mmap structure (for the DRM_IOCTL_I915_GEM_MMAP ioctl) has
been extended with a new flags field (defaulting to 0 for existing users).
In order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)
v4: Removed the extraneous unlock call from map_wc function (Damien)

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
---
 include/drm/i915_drm.h   |   9 +++
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 149 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+)

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..a91a1d0 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@ struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	*/
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..bda4115 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@ void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 14e92c9..9109325 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -130,6 +130,7 @@ typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	char *aub_filename;
@@ -184,6 +185,8 @@ struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached (WC) mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1058,6 +1061,10 @@ drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	VG_CLEAR(close);
@@ -1082,6 +1089,9 @@ drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1127,6 +1137,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1149,6 +1161,11 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1161,6 +1178,8 @@ static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1173,6 +1192,8 @@ static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1268,6 +1289,123 @@ static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* Request an uncached (WC) virtual mapping from the KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are
+	 * not bound. For them, the pages are acquired first,
+	 * before the domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1294,6 +1432,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1554,6 +1693,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3542,6 +3687,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	if (bufmgr_gem->gen < 4) {
 		gp.param = I915_PARAM_NUM_FENCES_AVAIL;
 		gp.value = &bufmgr_gem->available_fences;
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 46+ messages in thread
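
For completeness, the kernel contract this patch relies on can be
exercised without libdrm. The sketch below (gem_mmap_wc is a
hypothetical helper, not an existing API) probes I915_PARAM_MMAP_VERSION
and then requests a wc mapping directly through DRM_IOCTL_I915_GEM_MMAP
using the new flags field:

	#include <stdint.h>
	#include <string.h>
	#include <xf86drm.h>
	#include <i915_drm.h>

	/* Hypothetical helper: returns NULL on kernels without the
	 * extended mmap ioctl or on mapping failure, otherwise a wc
	 * mapping of the whole object. */
	static void *
	gem_mmap_wc(int fd, uint32_t handle, uint64_t size)
	{
		struct drm_i915_getparam gp;
		struct drm_i915_gem_mmap arg;
		int version = 0;

		memset(&gp, 0, sizeof(gp));
		gp.param = I915_PARAM_MMAP_VERSION;
		gp.value = &version;
		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) != 0 || version < 1)
			return NULL;

		memset(&arg, 0, sizeof(arg));
		arg.handle = handle;
		arg.offset = 0;
		arg.size = size;
		arg.flags = I915_MMAP_WC;
		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP, &arg) != 0)
			return NULL;

		return (void *)(uintptr_t)arg.addr_ptr;
	}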

end of thread, other threads:[~2016-03-15  8:40 UTC | newest]

Thread overview: 46+ messages
2014-10-13 13:20 [PATCH] drm/i915: Broaden application of set-domain(GTT) Chris Wilson
2014-10-14 12:47 ` Chris Wilson
2014-10-23 10:33   ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object akash.goel
2014-10-23 10:37     ` [PATCH] intel: New libdrm interface to create uncached CPU mapping akash.goel
2014-10-23 10:54     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Chris Wilson
2014-10-23 11:03     ` Chris Wilson
2014-10-23 11:56     ` Chris Wilson
2014-10-23 13:23       ` Chris Wilson
2014-12-10  4:48         ` Chad Versace
2014-12-10  8:02           ` Chris Wilson
2014-10-23 16:55     ` [PATCH] drm/i915: Support creation of unbound wc user mappings for objects Chris Wilson
2014-10-23 19:05       ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
2014-10-23 19:05         ` [PATCH 2/3] lib/core: Check for kernel error messages and WARN if any are found Chris Wilson
2014-10-23 19:05         ` [PATCH 3/3] reg-read-8 Chris Wilson
2014-10-23 19:12         ` [PATCH 1/3] igt/gem_evict_(alignment|everything): contend with GPU hangs Chris Wilson
2014-10-23 19:11       ` [PATCH 1/3] igt/gem_mmap_wc: Exercise mmap(wc) interface Chris Wilson
2014-10-23 19:11         ` [PATCH 2/3] igt/gem_gtt_speed: compare against WC mmaps Chris Wilson
2014-10-23 19:11         ` [PATCH 3/3] igt/gem_concurrent_blit: Exercise wc mappings Chris Wilson
2014-10-25 11:51       ` [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
2014-10-25 12:45         ` Damien Lespiau
2014-10-26  8:36           ` Akash Goel
2014-10-26  8:41             ` Chris Wilson
2014-10-28 13:09         ` [PATCH v3] " akash.goel
2014-10-28 16:11           ` Damien Lespiau
2014-12-03 14:13           ` Damien Lespiau
2014-12-03 18:18             ` Chris Wilson
2014-12-09 10:54             ` Chris Wilson
2014-12-09 16:53               ` Damien Lespiau
2014-12-14  8:41                 ` [PATCH v4] " akash.goel
2016-03-09  9:09                   ` [PATCH v5] " akash.goel
2016-03-10  8:39                     ` Martin Peres
2016-03-14 16:51                       ` Martin Peres
2016-03-15  8:41                         ` Daniel Vetter
2015-03-03 14:20                 ` [PATCH v3] " Damien Lespiau
2015-03-03 17:05                   ` Chris Wilson
2015-03-03 17:11                     ` Damien Lespiau
2015-03-03 17:13                       ` Chris Wilson
2015-03-03 17:16                         ` Damien Lespiau
2014-11-05 12:48       ` [PATCH] drm/i915: Support creation of " Chris Wilson
2014-11-06 14:50         ` Daniel Vetter
2014-12-17 12:46       ` Tvrtko Ursulin
2014-12-17 12:52         ` Chris Wilson
2014-10-24  8:40     ` [PATCH] drm/i915: Support to create uncached user mapping for a Gem object Daniel Vetter
2014-10-24  9:23       ` Chris Wilson
2014-11-05 12:49   ` [PATCH] drm/i915: Broaden application of set-domain(GTT) Chris Wilson
2015-01-02 11:06 [PATCH] intel: New libdrm interface to create unbound wc user mappings for objects akash.goel
