* [PATCH v10 0/6] Support for creating/using Stolen memory backed objects
@ 2015-12-09 12:46 ankitprasad.r.sharma
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
                   ` (5 more replies)
  0 siblings, 6 replies; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch series adds support for creating/using Stolen memory backed
objects.

Despite the platform being a unified memory architecture (UMA), some bits
of memory are more equal than others. In particular we have the thorny
issue of stolen memory: memory stolen from the system by the BIOS and
reserved for igfx use. Stolen memory is required for some functions of the
GPU and display engine, but in general it goes wasted. Whilst we cannot
return it to the system, we need to find some other method for utilising
it. As we do not support direct access to the physical addresses in the
stolen region, it behaves like a different class of memory, closer in kind
to local GPU memory. This strongly suggests that we need a placement model
like TTM if we are to fully utilise these discrete chunks of differing
memory.

To add support for creating Stolen memory backed objects, we extend the
drm_i915_gem_create structure with a new flag through which userspace can
request that the object be allocated from stolen memory. If the flag is
set, an attempt is made to allocate the object from stolen memory, subject
to the availability of free space in the stolen region.
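
As a rough usage sketch (based on the flag and ioctl structure added in
patch 2 of this series; includes and error handling trimmed), userspace
would request a stolen backed object like so:

    struct drm_i915_gem_create create = {
        .size  = 4096,
        .flags = I915_CREATE_PLACEMENT_STOLEN,
    };

    /* On success create.handle names a stolen backed object. Note that
     * CPU mmaps of such objects are not allowed, so access goes through
     * pread/pwrite or a GTT mapping.
     */
    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
        /* errno is e.g. ENODEV (no stolen) or ENOMEM/ENOSPC (region full) */
        return -errno;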

This patch series also adds support for clearing buffer objects via the
CPU/GTT. This is particularly useful for clearing out memory from the
stolen region, but it can also be used for other shmem allocated objects;
at the moment it is used only for buffers allocated in the stolen region.
Finally, the series adds support for stealing purgeable stolen pages when
we run out of stolen memory while trying to allocate an object.

v2: Added support for read/write from/to objects not backed by
shmem using the pread/pwrite interface.
Also extended the current get_aperture ioctl to retrieve the
total and available size of the stolen region.

v3: Removed the extended get_aperture ioctl patch 5 (to be submitted as
part of another patch series), addressed comments by Chris about
pread/pwrite for non-shmem backed objects.

v4: Rebased to the latest drm-intel-nightly.

v5: Addressed comments, replaced patch 1/4 "Clearing buffers via blitter
engine" by "Clearing buffers via CPU/GTT".

v6: Rebased to the latest drm-intel-nightly, addressed comments, updated
the stolen memory purging logic by maintaining a list of purgeable stolen
memory objects, enabled pread/pwrite for all non-shmem backed objects
without tiling restrictions.

v7: Addressed comments, compiler optimization, added a new patch for
correct error code propagation to userspace.

v8: Added a new patch to the series to migrate stolen objects before
hibernation, as stolen memory is not preserved across hibernation. Added
correct error propagation for shmem as well as non-shmem backed object
allocation.

v9: Addressed comments, used the insert_page helper function to map the
object page by page, which can be helpful when aperture space is scarce.

v10: Addressed comments, use insert_page for clearing out the stolen memory
buffer contents so as not to thrash the GTT.

This can be verified using IGT tests: igt/gem_stolen, igt/gem_create

Ankitprasad Sharma (4):
  drm/i915: Clearing buffer objects via CPU/GTT
  drm/i915: Support for creating Stolen memory backed objects
  drm/i915: Propagating correct error codes to the userspace
  drm/i915: Support for pread/pwrite from/to non shmem backed objects

Chris Wilson (2):
  drm/i915: Add support for stealing purgable stolen pages
  drm/i915: Migrate stolen objects before hibernation

 drivers/gpu/drm/i915/i915_debugfs.c          |   6 +-
 drivers/gpu/drm/i915/i915_dma.c              |   3 +
 drivers/gpu/drm/i915/i915_drv.c              |  17 +-
 drivers/gpu/drm/i915/i915_drv.h              |  27 +-
 drivers/gpu/drm/i915/i915_gem.c              | 520 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |   4 +-
 drivers/gpu/drm/i915/i915_gem_context.c      |   4 +-
 drivers/gpu/drm/i915/i915_gem_render_state.c |   7 +-
 drivers/gpu/drm/i915/i915_gem_stolen.c       | 211 +++++++++--
 drivers/gpu/drm/i915/i915_guc_submission.c   |  45 ++-
 drivers/gpu/drm/i915/intel_display.c         |   5 +-
 drivers/gpu/drm/i915/intel_fbdev.c           |  12 +-
 drivers/gpu/drm/i915/intel_lrc.c             |  10 +-
 drivers/gpu/drm/i915/intel_overlay.c         |   4 +-
 drivers/gpu/drm/i915/intel_pm.c              |   8 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  27 +-
 include/uapi/drm/i915_drm.h                  |  16 +
 17 files changed, 795 insertions(+), 131 deletions(-)

-- 
1.9.1


* [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 13:26   ` Dave Gordon
                     ` (3 more replies)
  2015-12-09 12:46 ` [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects ankitprasad.r.sharma
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch adds support for clearing buffer objects via the CPU/GTT. This
is particularly useful for clearing out non-shmem backed objects. We
currently intend to use this only for buffers allocated from the stolen
region.
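
As a sketch of the intended call site (mirroring the hunk added in the next
patch of this series; the caller holds struct_mutex), a freshly allocated
stolen object is cleared before being exposed to userspace:

	obj = i915_gem_object_create_stolen(dev, size);
	if (obj == NULL)
		return -ENOMEM;

	/* Stolen memory may contain stale data from its previous user,
	 * so always scrub fresh buffers before handing them out.
	 */
	ret = i915_gem_object_clear(obj);
	if (ret) {
		drm_gem_object_unreference(&obj->base);
		return ret;
	}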

v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
variable assignments (Tvrtko)

v3: Map the object page by page to the gtt if pinning of the whole object
to the ggtt fails, corrected function name (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 548a0eb..8e554d3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 				    int *needs_clflush);
 
 int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
+int i915_gem_object_clear(struct drm_i915_gem_object *obj);
 
 static inline int __sg_page_count(struct scatterlist *sg)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9d2e6e3..d57e850 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5244,3 +5244,82 @@ fail:
 	drm_gem_object_unreference(&obj->base);
 	return ERR_PTR(ret);
 }
+
+/**
+ * i915_gem_object_clear() - Clear buffer object via CPU/GTT
+ * @obj: Buffer object to be cleared
+ *
+ * Return: 0 - success, non-zero - failure
+ */
+int i915_gem_object_clear(struct drm_i915_gem_object *obj)
+{
+	int ret, i;
+	char __iomem *base;
+	size_t size = obj->base.size;
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct drm_mm_node node;
+
+	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
+	if (ret) {
+		memset(&node, 0, sizeof(node));
+		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
+							  &node, 4096, 0,
+							  I915_CACHE_NONE, 0,
+							  i915->gtt.mappable_end,
+							  DRM_MM_SEARCH_DEFAULT,
+							  DRM_MM_CREATE_DEFAULT);
+		if (ret)
+			goto out;
+
+		i915_gem_object_pin_pages(obj);
+	} else {
+		node.start = i915_gem_obj_ggtt_offset(obj);
+		node.allocated = false;
+	}
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto unpin;
+
+	if (node.allocated) {
+		for (i = 0; i < size/PAGE_SIZE; i++) {
+			wmb();
+			i915->gtt.base.insert_page(&i915->gtt.base,
+					i915_gem_object_get_dma_address(obj, i),
+					node.start,
+					I915_CACHE_NONE,
+					0);
+			wmb();
+			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
+			memset_io(base, 0, 4096);
+			iounmap(base);
+		}
+	} else {
+		/* Get the CPU virtual address of the buffer */
+		base = ioremap_wc(i915->gtt.mappable_base +
+				  node.start, size);
+		if (base == NULL) {
+			DRM_ERROR("Mapping of gem object to CPU failed!\n");
+			ret = -ENOSPC;
+			goto unpin;
+		}
+
+		memset_io(base, 0, size);
+		iounmap(base);
+	}
+unpin:
+	if (node.allocated) {
+		wmb();
+		i915->gtt.base.clear_range(&i915->gtt.base,
+				node.start, node.size,
+				true);
+		drm_mm_remove_node(&node);
+		i915_gem_object_unpin_pages(obj);
+	}
+	else {
+		i915_gem_object_ggtt_unpin(obj);
+	}
+out:
+	return ret;
+}
-- 
1.9.1


* [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 14:06   ` Tvrtko Ursulin
  2015-12-09 12:46 ` [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace ankitprasad.r.sharma
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

Extend the drm_i915_gem_create structure to add support for creating
Stolen memory backed objects. A new flag is added through which userspace
can request that the object be allocated from stolen memory. If the flag
is set, an attempt is made to allocate the object from stolen memory,
subject to the availability of free space in the stolen region.
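
Userspace can detect whether the kernel accepts the new flags field by
querying the create version exposed below (a sketch only, using the
I915_PARAM_CREATE_VERSION parameter added by this patch):

	int version = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_CREATE_VERSION,
		.value = &version,
	};

	/* version >= 2 means drm_i915_gem_create.flags (and hence
	 * I915_CREATE_PLACEMENT_STOLEN) is understood by the kernel.
	 */
	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && version >= 2)
		use_stolen_placement = true;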

v2: Rebased to the latest drm-intel-nightly (Ankit)

v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)

v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
Corrected function arguments ordering (Chris)

v5: Corrected function name (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c        |  3 +++
 drivers/gpu/drm/i915/i915_drv.h        |  2 +-
 drivers/gpu/drm/i915/i915_gem.c        | 30 +++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
 include/uapi/drm/i915_drm.h            | 16 ++++++++++++++++
 5 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index ffcb9c6..6927c7e 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_RESOURCE_STREAMER:
 		value = HAS_RESOURCE_STREAMER(dev);
 		break;
+	case I915_PARAM_CREATE_VERSION:
+		value = 2;
+		break;
 	default:
 		DRM_DEBUG("Unknown parameter %d\n", param->param);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8e554d3..d45274e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
 int i915_gem_init_stolen(struct drm_device *dev);
 void i915_gem_cleanup_stolen(struct drm_device *dev);
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
 struct drm_i915_gem_object *
 i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 					       u32 stolen_offset,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d57e850..296e63f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -375,6 +375,7 @@ static int
 i915_gem_create(struct drm_file *file,
 		struct drm_device *dev,
 		uint64_t size,
+		uint32_t flags,
 		uint32_t *handle_p)
 {
 	struct drm_i915_gem_object *obj;
@@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
 	if (size == 0)
 		return -EINVAL;
 
+	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
+		return -EINVAL;
+
 	/* Allocate the new object */
-	obj = i915_gem_alloc_object(dev, size);
+	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
+		mutex_lock(&dev->struct_mutex);
+		obj = i915_gem_object_create_stolen(dev, size);
+		if (!obj) {
+			mutex_unlock(&dev->struct_mutex);
+			return -ENOMEM;
+		}
+
+		/* Always clear fresh buffers before handing to userspace */
+		ret = i915_gem_object_clear(obj);
+		if (ret) {
+			drm_gem_object_unreference(&obj->base);
+			mutex_unlock(&dev->struct_mutex);
+			return ret;
+		}
+
+		mutex_unlock(&dev->struct_mutex);
+	} else {
+		obj = i915_gem_alloc_object(dev, size);
+	}
+
 	if (obj == NULL)
 		return -ENOMEM;
 
@@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
 	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
 	args->size = args->pitch * args->height;
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, 0, &args->handle);
 }
 
 /**
@@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_create *args = data;
 
 	return i915_gem_create(file, dev,
-			       args->size, &args->handle);
+			       args->size, args->flags, &args->handle);
 }
 
 static inline int
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 598ed2f..b98a3bf 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -583,7 +583,7 @@ cleanup:
 }
 
 struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
@@ -593,7 +593,7 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
 	if (!drm_mm_initialized(&dev_priv->mm.stolen))
 		return NULL;
 
-	DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
+	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
 	if (size == 0)
 		return NULL;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 67cebe6..8e7e3a4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_EU_TOTAL		 34
 #define I915_PARAM_HAS_GPU_RESET	 35
 #define I915_PARAM_HAS_RESOURCE_STREAMER 36
+#define I915_PARAM_CREATE_VERSION	 37
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -455,6 +456,21 @@ struct drm_i915_gem_create {
 	 */
 	__u32 handle;
 	__u32 pad;
+	/**
+	 * Requested flags (currently used for placement
+	 * (which memory domain))
+	 *
+	 * You can request that the object be created from special memory
+	 * rather than regular system pages using this parameter. Such
+	 * irregular objects may have certain restrictions (such as CPU
+	 * access to a stolen object is verboten).
+	 *
+	 * This can be used in the future for other purposes too
+	 * e.g. specifying tiling/caching/madvise
+	 */
+	__u32 flags;
+#define I915_CREATE_PLACEMENT_STOLEN 	(1<<0) /* Cannot use CPU mmaps */
+#define __I915_CREATE_UNKNOWN_FLAGS	-(I915_CREATE_PLACEMENT_STOLEN << 1)
 };
 
 struct drm_i915_gem_pread {
-- 
1.9.1


* [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
  2015-12-09 12:46 ` [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 15:10   ` Tvrtko Ursulin
  2015-12-09 12:46 ` [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages ankitprasad.r.sharma
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

Propagate correct error codes to userspace by using the ERR_PTR and
PTR_ERR macros for stolen memory backed object allocation. We currently
return -ENOMEM to the user whenever object allocation fails; this patch
helps the user identify the actual reason for the failure rather than
just -ENOMEM every time.
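
The conversion follows the usual kernel ERR_PTR idiom; a minimal sketch of
the pattern applied throughout the patch:

	/* callee: encode the errno in the returned pointer instead of NULL */
	if (!drm_mm_initialized(&dev_priv->mm.stolen))
		return ERR_PTR(-ENODEV);

	/* caller: test and unwrap */
	obj = i915_gem_object_create_stolen(dev, size);
	if (IS_ERR(obj))
		return PTR_ERR(obj); /* the real reason, not a blanket -ENOMEM */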

v2: Moved the patch up in the series, added error propagation for
i915_gem_alloc_object too (Chris)

v3: Removed storing of error pointer inside structs, Corrected error
propagation in caller functions (Chris)

v4: Remove assignments inside the predicate (Chris)

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c              | 16 +++++-----
 drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  4 +--
 drivers/gpu/drm/i915/i915_gem_context.c      |  4 +--
 drivers/gpu/drm/i915/i915_gem_render_state.c |  7 +++--
 drivers/gpu/drm/i915/i915_gem_stolen.c       | 43 ++++++++++++++------------
 drivers/gpu/drm/i915/i915_guc_submission.c   | 45 ++++++++++++++++++----------
 drivers/gpu/drm/i915/intel_display.c         |  2 +-
 drivers/gpu/drm/i915/intel_fbdev.c           |  6 ++--
 drivers/gpu/drm/i915/intel_lrc.c             | 10 ++++---
 drivers/gpu/drm/i915/intel_overlay.c         |  4 +--
 drivers/gpu/drm/i915/intel_pm.c              |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 21 ++++++-------
 12 files changed, 95 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 296e63f..5812748 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -393,9 +393,9 @@ i915_gem_create(struct drm_file *file,
 	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
 		mutex_lock(&dev->struct_mutex);
 		obj = i915_gem_object_create_stolen(dev, size);
-		if (!obj) {
+		if (IS_ERR(obj)) {
 			mutex_unlock(&dev->struct_mutex);
-			return -ENOMEM;
+			return PTR_ERR(obj);
 		}
 
 		/* Always clear fresh buffers before handing to userspace */
@@ -411,8 +411,8 @@ i915_gem_create(struct drm_file *file,
 		obj = i915_gem_alloc_object(dev, size);
 	}
 
-	if (obj == NULL)
-		return -ENOMEM;
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
 
 	ret = drm_gem_handle_create(file, &obj->base, &handle);
 	/* drop reference from allocate - handle holds it now */
@@ -4399,14 +4399,16 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 	struct drm_i915_gem_object *obj;
 	struct address_space *mapping;
 	gfp_t mask;
+	int ret;
 
 	obj = i915_gem_object_alloc(dev);
 	if (obj == NULL)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
-	if (drm_gem_object_init(dev, &obj->base, size) != 0) {
+	ret = drm_gem_object_init(dev, &obj->base, size);
+	if (ret) {
 		i915_gem_object_free(obj);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
index 7bf2f3f..d79caa2 100644
--- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
+++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
@@ -135,8 +135,8 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
 		int ret;
 
 		obj = i915_gem_alloc_object(pool->dev, size);
-		if (obj == NULL)
-			return ERR_PTR(-ENOMEM);
+		if (IS_ERR(obj))
+			return obj;
 
 		ret = i915_gem_object_get_pages(obj);
 		if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 204dc7c..4d24cfc 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -181,8 +181,8 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
 	int ret;
 
 	obj = i915_gem_alloc_object(dev, size);
-	if (obj == NULL)
-		return ERR_PTR(-ENOMEM);
+	if (IS_ERR(obj))
+		return obj;
 
 	/*
 	 * Try to make the context utilize L3 as well as LLC.
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 5026a62..2bfdd49 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -58,8 +58,11 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
 		return -EINVAL;
 
 	so->obj = i915_gem_alloc_object(dev, 4096);
-	if (so->obj == NULL)
-		return -ENOMEM;
+	if (IS_ERR(so->obj)) {
+		ret = PTR_ERR(so->obj);
+		so->obj = NULL;
+		return ret;
+	}
 
 	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
 	if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index b98a3bf..0b0ce11 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -492,6 +492,7 @@ i915_pages_create_for_stolen(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct sg_table *st;
 	struct scatterlist *sg;
+	int ret;
 
 	DRM_DEBUG_DRIVER("offset=0x%x, size=%d\n", offset, size);
 	BUG_ON(offset > dev_priv->gtt.stolen_size - size);
@@ -503,11 +504,12 @@ i915_pages_create_for_stolen(struct drm_device *dev,
 
 	st = kmalloc(sizeof(*st), GFP_KERNEL);
 	if (st == NULL)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
-	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
+	ret = sg_alloc_table(st, 1, GFP_KERNEL);
+	if (ret) {
 		kfree(st);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	sg = st->sgl;
@@ -556,18 +558,21 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
 			       struct drm_mm_node *stolen)
 {
 	struct drm_i915_gem_object *obj;
+	int ret = 0;
 
 	obj = i915_gem_object_alloc(dev);
 	if (obj == NULL)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	drm_gem_private_object_init(dev, &obj->base, stolen->size);
 	i915_gem_object_init(obj, &i915_gem_object_stolen_ops);
 
 	obj->pages = i915_pages_create_for_stolen(dev,
 						  stolen->start, stolen->size);
-	if (obj->pages == NULL)
+	if (IS_ERR(obj->pages)) {
+		ret = PTR_ERR(obj->pages);
 		goto cleanup;
+	}
 
 	i915_gem_object_pin_pages(obj);
 	obj->stolen = stolen;
@@ -579,7 +584,7 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
 
 cleanup:
 	i915_gem_object_free(obj);
-	return NULL;
+	return ERR_PTR(ret);
 }
 
 struct drm_i915_gem_object *
@@ -591,29 +596,29 @@ i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
 	int ret;
 
 	if (!drm_mm_initialized(&dev_priv->mm.stolen))
-		return NULL;
+		return ERR_PTR(-ENODEV);
 
 	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
 	if (size == 0)
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
 	if (!stolen)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
 	if (ret) {
 		kfree(stolen);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	obj = _i915_gem_object_create_stolen(dev, stolen);
-	if (obj)
+	if (!IS_ERR(obj))
 		return obj;
 
 	i915_gem_stolen_remove_node(dev_priv, stolen);
 	kfree(stolen);
-	return NULL;
+	return obj;
 }
 
 struct drm_i915_gem_object *
@@ -630,7 +635,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	int ret;
 
 	if (!drm_mm_initialized(&dev_priv->mm.stolen))
-		return NULL;
+		return ERR_PTR(-ENODEV);
 
 	DRM_DEBUG_KMS("creating preallocated stolen object: stolen_offset=%x, gtt_offset=%x, size=%x\n",
 			stolen_offset, gtt_offset, size);
@@ -638,11 +643,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	/* KISS and expect everything to be page-aligned */
 	if (WARN_ON(size == 0) || WARN_ON(size & 4095) ||
 	    WARN_ON(stolen_offset & 4095))
-		return NULL;
+		return ERR_PTR(-EINVAL);
 
 	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
 	if (!stolen)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	stolen->start = stolen_offset;
 	stolen->size = size;
@@ -652,15 +657,15 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (ret) {
 		DRM_DEBUG_KMS("failed to allocate stolen space\n");
 		kfree(stolen);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	obj = _i915_gem_object_create_stolen(dev, stolen);
-	if (obj == NULL) {
+	if (IS_ERR(obj)) {
 		DRM_DEBUG_KMS("failed to allocate stolen object\n");
 		i915_gem_stolen_remove_node(dev_priv, stolen);
 		kfree(stolen);
-		return NULL;
+		return obj;
 	}
 
 	/* Some objects just need physical mem from stolen space */
@@ -698,5 +703,5 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 
 err:
 	drm_gem_object_unreference(&obj->base);
-	return NULL;
+	return ERR_PTR(ret);
 }
diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
index 4ac8867..aa38ae4 100644
--- a/drivers/gpu/drm/i915/i915_guc_submission.c
+++ b/drivers/gpu/drm/i915/i915_guc_submission.c
@@ -645,22 +645,24 @@ int i915_guc_submit(struct i915_guc_client *client,
  * object needs to be pinned lifetime. Also we must pin it to gtt space other
  * than [0, GUC_WOPCM_TOP) because this range is reserved inside GuC.
  *
- * Return:	A drm_i915_gem_object if successful, otherwise NULL.
+ * Return:	A drm_i915_gem_object if successful, otherwise error pointer.
  */
 static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
 							u32 size)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
+	int ret;
 
 	obj = i915_gem_alloc_object(dev, size);
-	if (!obj)
-		return NULL;
+	if (IS_ERR(obj))
+		return obj;
 
-	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
-			PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
+	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
+				    PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
+	if (ret) {
 		drm_gem_object_unreference(&obj->base);
-		return NULL;
+		return ERR_PTR(ret);
 	}
 
 	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
@@ -738,10 +740,11 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_guc *guc = &dev_priv->guc;
 	struct drm_i915_gem_object *obj;
+	int ret;
 
 	client = kzalloc(sizeof(*client), GFP_KERNEL);
 	if (!client)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	client->doorbell_id = GUC_INVALID_DOORBELL_ID;
 	client->priority = priority;
@@ -752,13 +755,16 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 			GUC_MAX_GPU_CONTEXTS, GFP_KERNEL);
 	if (client->ctx_index >= GUC_MAX_GPU_CONTEXTS) {
 		client->ctx_index = GUC_INVALID_CTX_ID;
+		ret = -EINVAL;
 		goto err;
 	}
 
 	/* The first page is doorbell/proc_desc. Two followed pages are wq. */
 	obj = gem_allocate_guc_obj(dev, GUC_DB_SIZE + GUC_WQ_SIZE);
-	if (!obj)
+	if (IS_ERR(obj)) {
+		ret = PTR_ERR(obj);
 		goto err;
+	}
 
 	client->client_obj = obj;
 	client->wq_offset = GUC_DB_SIZE;
@@ -778,9 +784,11 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 		client->proc_desc_offset = (GUC_DB_SIZE / 2);
 
 	client->doorbell_id = assign_doorbell(guc, client->priority);
-	if (client->doorbell_id == GUC_INVALID_DOORBELL_ID)
+	if (client->doorbell_id == GUC_INVALID_DOORBELL_ID) {
 		/* XXX: evict a doorbell instead */
+		ret = -EINVAL;
 		goto err;
+	}
 
 	guc_init_proc_desc(guc, client);
 	guc_init_ctx_desc(guc, client);
@@ -788,7 +796,8 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
 
 	/* XXX: Any cache flushes needed? General domain mgmt calls? */
 
-	if (host2guc_allocate_doorbell(guc, client))
+	ret = host2guc_allocate_doorbell(guc, client);
+	if (ret)
 		goto err;
 
 	DRM_DEBUG_DRIVER("new priority %u client %p: ctx_index %u db_id %u\n",
@@ -800,7 +809,7 @@ err:
 	DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
 
 	guc_client_free(dev, client);
-	return NULL;
+	return ERR_PTR(ret);
 }
 
 static void guc_create_log(struct intel_guc *guc)
@@ -825,7 +834,7 @@ static void guc_create_log(struct intel_guc *guc)
 	obj = guc->log_obj;
 	if (!obj) {
 		obj = gem_allocate_guc_obj(dev_priv->dev, size);
-		if (!obj) {
+		if (IS_ERR(obj)) {
 			/* logging will be off */
 			i915.guc_log_level = -1;
 			return;
@@ -855,6 +864,7 @@ int i915_guc_submission_init(struct drm_device *dev)
 	const size_t poolsize = GUC_MAX_GPU_CONTEXTS * ctxsize;
 	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
 	struct intel_guc *guc = &dev_priv->guc;
+	int ret = 0;
 
 	if (!i915.enable_guc_submission)
 		return 0; /* not enabled  */
@@ -863,8 +873,11 @@ int i915_guc_submission_init(struct drm_device *dev)
 		return 0; /* already allocated */
 
 	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
-	if (!guc->ctx_pool_obj)
-		return -ENOMEM;
+	if (IS_ERR(guc->ctx_pool_obj)) {
+		ret = PTR_ERR(guc->ctx_pool_obj);
+		guc->ctx_pool_obj = NULL;
+		return ret;
+	}
 
 	spin_lock_init(&dev_priv->guc.host2guc_lock);
 
@@ -884,9 +897,9 @@ int i915_guc_submission_enable(struct drm_device *dev)
 
 	/* client for execbuf submission */
 	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_KMD_NORMAL, ctx);
-	if (!client) {
+	if (IS_ERR(client)) {
 		DRM_ERROR("Failed to create execbuf guc_client\n");
-		return -ENOMEM;
+		return PTR_ERR(client);
 	}
 
 	guc->execbuf_client = client;
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 77979ed..f281e0b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2546,7 +2546,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 							     base_aligned,
 							     base_aligned,
 							     size_aligned);
-	if (!obj)
+	if (IS_ERR(obj))
 		return false;
 
 	obj->tiling_mode = plane_config->tiling;
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 840d6bf..f43681e 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -146,11 +146,11 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 	 * features. */
 	if (size * 2 < dev_priv->gtt.stolen_usable_size)
 		obj = i915_gem_object_create_stolen(dev, size);
-	if (obj == NULL)
+	if (IS_ERR_OR_NULL(obj))
 		obj = i915_gem_alloc_object(dev, size);
-	if (!obj) {
+	if (IS_ERR(obj)) {
 		DRM_ERROR("failed to allocate framebuffer\n");
-		ret = -ENOMEM;
+		ret = PTR_ERR(obj);
 		goto out;
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06180dc..4539cc6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1364,9 +1364,11 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
 	int ret;
 
 	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
-	if (!ring->wa_ctx.obj) {
+	if (IS_ERR(ring->wa_ctx.obj)) {
 		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
-		return -ENOMEM;
+		ret = PTR_ERR(ring->wa_ctx.obj);
+		ring->wa_ctx.obj = NULL;
+		return ret;
 	}
 
 	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
@@ -2471,9 +2473,9 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
 
 	ctx_obj = i915_gem_alloc_object(dev, context_size);
-	if (!ctx_obj) {
+	if (IS_ERR(ctx_obj)) {
 		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
-		return -ENOMEM;
+		return PTR_ERR(ctx_obj);
 	}
 
 	ringbuf = intel_engine_create_ringbuffer(ring, 4 * PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index 76f1980..3a65858 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -1392,9 +1392,9 @@ void intel_setup_overlay(struct drm_device *dev)
 	reg_bo = NULL;
 	if (!OVERLAY_NEEDS_PHYSICAL(dev))
 		reg_bo = i915_gem_object_create_stolen(dev, PAGE_SIZE);
-	if (reg_bo == NULL)
+	if (IS_ERR_OR_NULL(reg_bo))
 		reg_bo = i915_gem_alloc_object(dev, PAGE_SIZE);
-	if (reg_bo == NULL)
+	if (IS_ERR(reg_bo))
 		goto out_free;
 	overlay->reg_bo = reg_bo;
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 647c0ff..6dee908 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5172,7 +5172,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 	 * memory, or any other relevant ranges.
 	 */
 	pctx = i915_gem_object_create_stolen(dev, pctx_size);
-	if (!pctx) {
+	if (IS_ERR(pctx)) {
 		DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
 		return;
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c9b081f..5eabaf6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -678,9 +678,10 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
 	WARN_ON(ring->scratch.obj);
 
 	ring->scratch.obj = i915_gem_alloc_object(ring->dev, 4096);
-	if (ring->scratch.obj == NULL) {
+	if (IS_ERR(ring->scratch.obj)) {
 		DRM_ERROR("Failed to allocate seqno page\n");
-		ret = -ENOMEM;
+		ret = PTR_ERR(ring->scratch.obj);
+		ring->scratch.obj = NULL;
 		goto err;
 	}
 
@@ -1935,9 +1936,9 @@ static int init_status_page(struct intel_engine_cs *ring)
 		int ret;
 
 		obj = i915_gem_alloc_object(ring->dev, 4096);
-		if (obj == NULL) {
+		if (IS_ERR(obj)) {
 			DRM_ERROR("Failed to allocate status page\n");
-			return -ENOMEM;
+			return PTR_ERR(obj);
 		}
 
 		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
@@ -2084,10 +2085,10 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	obj = NULL;
 	if (!HAS_LLC(dev))
 		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
-	if (obj == NULL)
+	if (IS_ERR_OR_NULL(obj))
 		obj = i915_gem_alloc_object(dev, ringbuf->size);
-	if (obj == NULL)
-		return -ENOMEM;
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
 
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
@@ -2678,7 +2679,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	if (INTEL_INFO(dev)->gen >= 8) {
 		if (i915_semaphore_is_enabled(dev)) {
 			obj = i915_gem_alloc_object(dev, 4096);
-			if (obj == NULL) {
+			if (IS_ERR(obj)) {
 				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
 				i915.semaphores = 0;
 			} else {
@@ -2785,9 +2786,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
 	/* Workaround batchbuffer to combat CS tlb bug. */
 	if (HAS_BROKEN_CS_TLB(dev)) {
 		obj = i915_gem_alloc_object(dev, I830_WA_SIZE);
-		if (obj == NULL) {
+		if (IS_ERR(obj)) {
 			DRM_ERROR("Failed to allocate batch bo\n");
-			return -ENOMEM;
+			return PTR_ERR(obj);
 		}
 
 		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
-- 
1.9.1


* [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
                   ` (2 preceding siblings ...)
  2015-12-09 12:46 ` [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 15:40   ` Tvrtko Ursulin
  2015-12-09 12:46 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects ankitprasad.r.sharma
  2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
  5 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Chris Wilson <chris@chris-wilson.co.uk>

If we run out of stolen memory when trying to allocate an object, see if
we can reap enough purgeable objects to free up enough contiguous free
space for the allocation. This is in principle very much like evicting
objects to free up enough contiguous space in the vma when binding
a new object - and you will be forgiven for thinking that the code looks
very similar.

At the moment, we do not allow userspace to allocate objects in stolen,
so there is neither the memory pressure to trigger stolen eviction nor
any purgeable objects inside the stolen arena. However, this will change
in the near future, and so better management and defragmentation of
stolen memory will become a real issue.
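
For reference (a sketch only; it relies on the existing gem_madvise ioctl
together with the stolen handling this patch adds to
i915_gem_madvise_ioctl), userspace marks a stolen object as reapable like
so:

	struct drm_i915_gem_madvise madv = {
		.handle = handle,             /* stolen backed object */
		.madv   = I915_MADV_DONTNEED, /* contents may be discarded */
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
	/* The object now sits on the purgeable list and may be evicted to
	 * make room for a later stolen allocation; madv.retained reports
	 * whether its backing store was still present at the time of the
	 * call.
	 */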

v2: Remember to remove the drm_mm_node.

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: corrected if-else braces format (Tvrtko/kerneldoc)

v5: Rebased to the latest drm-intel-nightly (Ankit)
Added a separate list to maintain purgeable objects from the stolen memory
region (Chris/Daniel)

v6: Compiler optimization (merging 2 single loops into one for() loop),
corrected code for object eviction, retire_requests before starting
object eviction (Chris)

v7: Added kernel doc for i915_gem_object_create_stolen()

v8: Check for struct_mutex lock before creating object from stolen
region (Tvrtko)

v9: Renamed variables to make usage clear, added a comment, removed a
macro that was only used once (Tvrtko)

Testcase: igt/gem_stolen

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c    |   6 +-
 drivers/gpu/drm/i915/i915_drv.h        |  17 +++-
 drivers/gpu/drm/i915/i915_gem.c        |  16 ++++
 drivers/gpu/drm/i915/i915_gem_stolen.c | 170 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_pm.c        |   4 +-
 5 files changed, 188 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 5659d4c..89b0fec 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -174,7 +174,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 			seq_puts(m, ")");
 	}
 	if (obj->stolen)
-		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
+		seq_printf(m, " (stolen: %08llx)", obj->stolen->base.start);
 	if (obj->pin_display || obj->fault_mappable) {
 		char s[3], *t = s;
 		if (obj->pin_display)
@@ -253,9 +253,9 @@ static int obj_rank_by_stolen(void *priv,
 	struct drm_i915_gem_object *b =
 		container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
-	if (a->stolen->start < b->stolen->start)
+	if (a->stolen->base.start < b->stolen->base.start)
 		return -1;
-	if (a->stolen->start > b->stolen->start)
+	if (a->stolen->base.start > b->stolen->base.start)
 		return 1;
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d45274e..e0b09b0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -841,6 +841,12 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_stolen_node {
+	struct drm_mm_node base;
+	struct list_head mm_link;
+	struct drm_i915_gem_object *obj;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -1252,6 +1258,13 @@ struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
+	/**
+	 * List of stolen objects that have been marked as purgeable and
+	 * thus available for reaping if we need more space for a new
+	 * allocation. Ordered by time of marking purgeable.
+	 */
+	struct list_head stolen_list;
+
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2032,7 +2045,7 @@ struct drm_i915_gem_object {
 	struct list_head vma_list;
 
 	/** Stolen memory for this object, instead of being backed by shmem. */
-	struct drm_mm_node *stolen;
+	struct i915_stolen_node *stolen;
 	struct list_head global_list;
 
 	struct list_head ring_list[I915_NUM_RINGS];
@@ -2040,6 +2053,8 @@ struct drm_i915_gem_object {
 	struct list_head obj_exec_link;
 
 	struct list_head batch_pool_link;
+	/** Used during stolen memory allocations to temporarily hold a ref */
+	struct list_head stolen_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5812748..ed97de6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4359,6 +4359,20 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 	if (obj->madv == I915_MADV_DONTNEED && obj->pages == NULL)
 		i915_gem_object_truncate(obj);
 
+	if (obj->stolen) {
+		switch (obj->madv) {
+		case I915_MADV_WILLNEED:
+			list_del_init(&obj->stolen->mm_link);
+			break;
+		case I915_MADV_DONTNEED:
+			list_move(&obj->stolen->mm_link,
+				  &dev_priv->mm.stolen_list);
+			break;
+		default:
+			break;
+		}
+	}
+
 	args->retained = obj->madv != __I915_MADV_PURGED;
 
 out:
@@ -4379,6 +4393,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
 	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 	INIT_LIST_HEAD(&obj->batch_pool_link);
+	INIT_LIST_HEAD(&obj->stolen_link);
 
 	obj->ops = ops;
 
@@ -4997,6 +5012,7 @@ i915_gem_load(struct drm_device *dev)
 	INIT_LIST_HEAD(&dev_priv->context_list);
 	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
 	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
+	INIT_LIST_HEAD(&dev_priv->mm.stolen_list);
 	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
 	for (i = 0; i < I915_NUM_RINGS; i++)
 		init_ring_lists(&dev_priv->ring[i]);
diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index 0b0ce11..9d6ac67 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -542,7 +542,8 @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
 	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
 
 	if (obj->stolen) {
-		i915_gem_stolen_remove_node(dev_priv, obj->stolen);
+		list_del(&obj->stolen->mm_link);
+		i915_gem_stolen_remove_node(dev_priv, &obj->stolen->base);
 		kfree(obj->stolen);
 		obj->stolen = NULL;
 	}
@@ -555,7 +556,7 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
 
 static struct drm_i915_gem_object *
 _i915_gem_object_create_stolen(struct drm_device *dev,
-			       struct drm_mm_node *stolen)
+			       struct i915_stolen_node *stolen)
 {
 	struct drm_i915_gem_object *obj;
 	int ret = 0;
@@ -564,11 +565,12 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
 	if (obj == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	drm_gem_private_object_init(dev, &obj->base, stolen->size);
+	drm_gem_private_object_init(dev, &obj->base, stolen->base.size);
 	i915_gem_object_init(obj, &i915_gem_object_stolen_ops);
 
 	obj->pages = i915_pages_create_for_stolen(dev,
-						  stolen->start, stolen->size);
+						  stolen->base.start,
+						  stolen->base.size);
 	if (IS_ERR(obj->pages)) {
 		ret = PTR_ERR(obj->pages);
 		goto cleanup;
@@ -577,6 +579,9 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
 	i915_gem_object_pin_pages(obj);
 	obj->stolen = stolen;
 
+	stolen->obj = obj;
+	INIT_LIST_HEAD(&stolen->mm_link);
+
 	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
 	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
 
@@ -587,18 +592,102 @@ cleanup:
 	return ERR_PTR(ret);
 }
 
-struct drm_i915_gem_object *
-i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
+static bool
+mark_free(struct drm_i915_gem_object *obj, struct list_head *unwind)
+{
+	BUG_ON(obj->stolen == NULL);
+
+	if (obj->madv != I915_MADV_DONTNEED)
+		return false;
+
+	if (obj->pin_display)
+		return false;
+
+	list_add(&obj->stolen_link, unwind);
+	return drm_mm_scan_add_block(&obj->stolen->base);
+}
+
+static int
+stolen_evict(struct drm_i915_private *dev_priv, u64 size)
 {
-	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
-	struct drm_mm_node *stolen;
-	int ret;
+	struct list_head unwind, evict;
+	struct i915_stolen_node *iter;
+	int ret, active;
 
-	if (!drm_mm_initialized(&dev_priv->mm.stolen))
-		return ERR_PTR(-ENODEV);
+	drm_mm_init_scan(&dev_priv->mm.stolen, size, 0, 0);
+	INIT_LIST_HEAD(&unwind);
+
+	/* Retire all requests before creating the evict list */
+	i915_gem_retire_requests(dev_priv->dev);
+
+	for (active = 0; active <= 1; active++) {
+		list_for_each_entry(iter, &dev_priv->mm.stolen_list, mm_link) {
+			if (iter->obj->active != active)
+				continue;
+
+			if (mark_free(iter->obj, &unwind))
+				goto found;
+		}
+	}
+
+found:
+	INIT_LIST_HEAD(&evict);
+	while (!list_empty(&unwind)) {
+		obj = list_first_entry(&unwind,
+				       struct drm_i915_gem_object,
+				       stolen_link);
+		list_del(&obj->stolen_link);
+
+		if (drm_mm_scan_remove_block(&obj->stolen->base)) {
+			list_add(&obj->stolen_link, &evict);
+			drm_gem_object_reference(&obj->base);
+		}
+	}
+
+	ret = 0;
+	while (!list_empty(&evict)) {
+		obj = list_first_entry(&evict,
+				       struct drm_i915_gem_object,
+				       stolen_link);
+		list_del(&obj->stolen_link);
+
+		if (ret == 0) {
+			struct i915_vma *vma, *vma_next;
+
+			list_for_each_entry_safe(vma, vma_next,
+						 &obj->vma_list,
+						 vma_link)
+				if (i915_vma_unbind(vma))
+					break;
+
+			/* Stolen pins its pages to prevent the
+			 * normal shrinker from processing stolen
+			 * objects.
+			 */
+			i915_gem_object_unpin_pages(obj);
+
+			ret = i915_gem_object_put_pages(obj);
+			if (ret == 0) {
+				i915_gem_object_release_stolen(obj);
+				obj->madv = __I915_MADV_PURGED;
+			} else {
+				i915_gem_object_pin_pages(obj);
+			}
+		}
+
+		drm_gem_object_unreference(&obj->base);
+	}
+
+	return ret;
+}
+
+static struct i915_stolen_node *
+stolen_alloc(struct drm_i915_private *dev_priv, u64 size)
+{
+	struct i915_stolen_node *stolen;
+	int ret;
 
-	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
 	if (size == 0)
 		return ERR_PTR(-EINVAL);
 
@@ -606,17 +695,60 @@ i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
 	if (!stolen)
 		return ERR_PTR(-ENOMEM);
 
-	ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
+	ret = i915_gem_stolen_insert_node(dev_priv, &stolen->base, size, 4096);
+	if (ret == 0)
+		goto out;
+
+	/* No more stolen memory available, or too fragmented.
+	 * Try evicting purgeable objects and search again.
+	 */
+	ret = stolen_evict(dev_priv, size);
+	if (ret == 0)
+		ret = i915_gem_stolen_insert_node(dev_priv, &stolen->base,
+						  size, 4096);
+out:
 	if (ret) {
 		kfree(stolen);
 		return ERR_PTR(ret);
 	}
 
+	return stolen;
+}
+
+/**
+ * i915_gem_object_create_stolen() - creates object using the stolen memory
+ * @dev:	drm device
+ * @size:	size of the object requested
+ *
+ * i915_gem_object_create_stolen() tries to allocate memory for the object
+ * from the stolen memory region. If not enough memory is found, it tries
+ * evicting purgeable objects and searching again.
+ *
+ * Returns: Object pointer - success and error pointer - failure
+ */
+struct drm_i915_gem_object *
+i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_object *obj;
+	struct i915_stolen_node *stolen;
+
+	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	if (!drm_mm_initialized(&dev_priv->mm.stolen))
+		return ERR_PTR(-ENODEV);
+
+	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
+
+	stolen = stolen_alloc(dev_priv, size);
+	if (IS_ERR(stolen))
+		return ERR_CAST(stolen);
+
 	obj = _i915_gem_object_create_stolen(dev, stolen);
 	if (!IS_ERR(obj))
 		return obj;
 
-	i915_gem_stolen_remove_node(dev_priv, stolen);
+	i915_gem_stolen_remove_node(dev_priv, &stolen->base);
 	kfree(stolen);
 	return obj;
 }
@@ -630,7 +762,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct i915_address_space *ggtt = &dev_priv->gtt.base;
 	struct drm_i915_gem_object *obj;
-	struct drm_mm_node *stolen;
+	struct i915_stolen_node *stolen;
 	struct i915_vma *vma;
 	int ret;
 
@@ -649,10 +781,10 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (!stolen)
 		return ERR_PTR(-ENOMEM);
 
-	stolen->start = stolen_offset;
-	stolen->size = size;
+	stolen->base.start = stolen_offset;
+	stolen->base.size = size;
 	mutex_lock(&dev_priv->mm.stolen_lock);
-	ret = drm_mm_reserve_node(&dev_priv->mm.stolen, stolen);
+	ret = drm_mm_reserve_node(&dev_priv->mm.stolen, &stolen->base);
 	mutex_unlock(&dev_priv->mm.stolen_lock);
 	if (ret) {
 		DRM_DEBUG_KMS("failed to allocate stolen space\n");
@@ -663,7 +795,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	obj = _i915_gem_object_create_stolen(dev, stolen);
 	if (IS_ERR(obj)) {
 		DRM_DEBUG_KMS("failed to allocate stolen object\n");
-		i915_gem_stolen_remove_node(dev_priv, stolen);
+		i915_gem_stolen_remove_node(dev_priv, &stolen->base);
 		kfree(stolen);
 		return obj;
 	}
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 6dee908..03ad276 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5103,7 +5103,7 @@ static void valleyview_check_pctx(struct drm_i915_private *dev_priv)
 	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
 
 	WARN_ON(pctx_addr != dev_priv->mm.stolen_base +
-			     dev_priv->vlv_pctx->stolen->start);
+			     dev_priv->vlv_pctx->stolen->base.start);
 }
 
 
@@ -5177,7 +5177,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 		return;
 	}
 
-	pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->start;
+	pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->base.start;
 	I915_WRITE(VLV_PCBR, pctx_paddr);
 
 out:
-- 
1.9.1


* [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
                   ` (3 preceding siblings ...)
  2015-12-09 12:46 ` [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 16:15   ` Tvrtko Ursulin
  2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
  5 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch adds support for extending the pread/pwrite functionality to
objects not backed by shmem. The access is made through the GTT interface.
This covers objects backed by stolen memory as well as other non-shmem
backed objects.
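
As a usage sketch (assuming a handle to an untiled, stolen backed object;
error handling omitted), userspace can now read such an object back with
the existing pread ioctl:

	char buf[4096];
	struct drm_i915_gem_pread pread = {
		.handle   = handle,           /* non-shmem (e.g. stolen) object */
		.offset   = 0,
		.size     = sizeof(buf),
		.data_ptr = (uintptr_t)buf,
	};

	/* Previously this returned -EINVAL for objects without a backing
	 * filp; with this patch the copy is serviced through the GTT.
	 */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_PREAD, &pread);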

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)

v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 151 +++++++++++++++++++++++++++++++++-------
 1 file changed, 127 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ed97de6..68ed67a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
 	return ret ? - EFAULT : 0;
 }
 
+static inline uint64_t
+slow_user_access(struct io_mapping *mapping,
+		 uint64_t page_base, int page_offset,
+		 char __user *user_data,
+		 int length, bool pwrite)
+{
+	void __iomem *vaddr_inatomic;
+	void *vaddr;
+	uint64_t unwritten;
+
+	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
+	/* We can use the cpu mem copy function because this is X86. */
+	vaddr = (void __force *)vaddr_inatomic + page_offset;
+	if (pwrite)
+		unwritten = __copy_from_user(vaddr, user_data, length);
+	else
+		unwritten = __copy_to_user(user_data, vaddr, length);
+
+	io_mapping_unmap(vaddr_inatomic);
+	return unwritten;
+}
+
+static int
+i915_gem_gtt_copy(struct drm_device *dev,
+		   struct drm_i915_gem_object *obj, uint64_t size,
+		   uint64_t data_offset, uint64_t data_ptr)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	char __user *user_data;
+	uint64_t remain;
+	uint64_t offset, page_base;
+	int page_offset, page_length, ret = 0;
+
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+	if (ret)
+		goto out;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (ret)
+		goto out_unpin;
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto out_unpin;
+
+	user_data = to_user_ptr(data_ptr);
+	remain = size;
+	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
+
+	mutex_unlock(&dev->struct_mutex);
+	if (likely(!i915.prefault_disable))
+		ret = fault_in_multipages_writeable(user_data, remain);
+
+	/*
+	 * page_offset = offset within page
+	 * page_base = page offset within aperture
+	 */
+	page_offset = offset_in_page(offset);
+	page_base = offset & PAGE_MASK;
+
+	while (remain > 0) {
+		/* page_length = bytes to copy for this page */
+		page_length = remain;
+		if ((page_offset + remain) > PAGE_SIZE)
+			page_length = PAGE_SIZE - page_offset;
+
+		/* This is a slow read/write as it tries to read from
+		 * and write to user memory which may result into page
+		 * faults
+		 */
+		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+				       page_offset, user_data,
+				       page_length, false);
+
+		if (ret) {
+			ret = -EFAULT;
+			break;
+		}
+
+		remain -= page_length;
+		user_data += page_length;
+		page_base += page_length;
+		page_offset = 0;
+	}
+
+	mutex_lock(&dev->struct_mutex);
+
+out_unpin:
+	i915_gem_object_ggtt_unpin(obj);
+out:
+	return ret;
+}
+
 static int
 i915_gem_shmem_pread(struct drm_device *dev,
 		     struct drm_i915_gem_object *obj,
@@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = i915_gem_shmem_pread(dev, obj, args, file);
+	/* pread for non shmem backed objects */
+	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
+		ret = i915_gem_gtt_copy(dev, obj, args->size,
+					args->offset, args->data_ptr);
+	else
+		ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
@@ -789,10 +879,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 			 struct drm_i915_gem_pwrite *args,
 			 struct drm_file *file)
 {
+	struct drm_device *dev = obj->base.dev;
 	struct drm_mm_node node;
 	uint64_t remain, offset;
 	char __user *user_data;
 	int ret;
+	bool faulted = false;
 
 	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
 	if (ret) {
@@ -851,11 +943,29 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 		/* If we get a fault while copying data, then (presumably) our
 		 * source page isn't available.  Return the error and we'll
 		 * retry in the slow path.
+		 * If the object is non-shmem backed, we retry with the
+		 * path that handles page faults.
 		 */
-		if (fast_user_write(i915->gtt.mappable, page_base,
-				    page_offset, user_data, page_length)) {
-			ret = -EFAULT;
-			goto out_flush;
+		if (faulted || fast_user_write(i915->gtt.mappable,
+						page_base, page_offset,
+						user_data, page_length)) {
+			if (!obj->base.filp) {
+				faulted = true;
+				mutex_unlock(&dev->struct_mutex);
+				if (slow_user_access(i915->gtt.mappable,
+						     page_base,
+						     page_offset, user_data,
+						     page_length, true)) {
+					ret = -EFAULT;
+					mutex_lock(&dev->struct_mutex);
+					goto out_flush;
+				}
+
+				mutex_lock(&dev->struct_mutex);
+			} else {
+				ret = -EFAULT;
+				goto out_flush;
+			}
 		}
 
 		remain -= page_length;
@@ -1121,14 +1231,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
 	ret = -EFAULT;
@@ -1139,8 +1241,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 * perspective, requiring manual detiling by the client.
 	 */
 	if (obj->tiling_mode == I915_TILING_NONE &&
-	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-	    cpu_write_needs_clflush(obj)) {
+	    (!obj->base.filp ||
+	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+	    cpu_write_needs_clflush(obj)))) {
 		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
@@ -1150,7 +1253,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (ret == -EFAULT || ret == -ENOSPC) {
 		if (obj->phys_handle)
 			ret = i915_gem_phys_pwrite(obj, args, file);
-		else
+		else if (obj->base.filp)
 			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
                   ` (4 preceding siblings ...)
  2015-12-09 12:46 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects ankitprasad.r.sharma
@ 2015-12-09 12:46 ` ankitprasad.r.sharma
  2015-12-09 17:25   ` Tvrtko Ursulin
                     ` (2 more replies)
  5 siblings, 3 replies; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-12-09 12:46 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Chris Wilson <chris@chris-wilson.co.uk>

Ville reminded us that stolen memory is not preserved across
hibernation, and a result of this was that context objects now being
allocated from stolen were being corrupted on S4 and promptly hanging
the GPU on resume.

We want to utilise stolen for as much as possible (nothing else will use
that wasted memory otherwise), so we need a strategy for handling
general objects allocated from stolen and hibernation. A simple solution
is to do a CPU copy through the GTT of the stolen object into a fresh
shmemfs backing store and thenceforth treat it as a normal object. This
can be refined in future either to use a GPU copy to avoid the slow
uncached reads (though it's hibernation!) or to recreate stolen objects
upon resume/first-use. For now, a simple approach should suffice for
testing the object migration.

v2:
Swap the PTEs for pinned bindings over to the shmemfs. This adds a
complicated dance, but is required as many stolen objects are likely to
be pinned for use by the hardware. Swapping the PTEs should not result
in externally visible behaviour, as each PTE update should be atomic and
the two pages identical. (danvet)

Be safe-by-default, following the principle of least surprise: we need a
new flag to mark objects that we can wilfully discard and recreate
across hibernation. (danvet)

Just use the global_list rather than invent a new stolen_list. This is
the slowpath hibernate and so adding a new list and the associated
complexity isn't worth it.

v3: Rebased on drm-intel-nightly (Ankit)

v4: Use insert_page to map stolen memory backed pages for migration to
shmem (Chris)

v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
 drivers/gpu/drm/i915/i915_drv.h         |   7 +
 drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_display.c    |   3 +
 drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
 drivers/gpu/drm/i915/intel_pm.c         |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
 7 files changed, 261 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9f55209..2bb9e9e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
 	return i915_drm_suspend(drm_dev);
 }
 
+static int i915_pm_freeze(struct device *dev)
+{
+	int ret;
+
+	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
+	if (ret)
+		return ret;
+
+	ret = i915_pm_suspend(dev);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 static int i915_pm_suspend_late(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
@@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
 	 * @restore, @restore_early : called after rebooting and restoring the
 	 *                            hibernation image [PMSG_RESTORE]
 	 */
-	.freeze = i915_pm_suspend,
+	.freeze = i915_pm_freeze,
 	.freeze_late = i915_pm_suspend_late,
 	.thaw_early = i915_pm_resume_early,
 	.thaw = i915_pm_resume,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e0b09b0..0d18b07 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
 	 * Advice: are the backing pages purgeable?
 	 */
 	unsigned int madv:2;
+	/**
+	 * Whereas madv is for userspace, there are certain situations
+	 * where we want I915_MADV_DONTNEED behaviour on internal objects
+	 * without conflating the userspace setting.
+	 */
+	unsigned int internal_volatile:1;
 
 	/**
 	 * Current tiling mode for the object.
@@ -3006,6 +3012,7 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
 void i915_gem_init_swizzling(struct drm_device *dev);
 void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
 int __must_check i915_gpu_idle(struct drm_device *dev);
+int __must_check i915_gem_freeze(struct drm_device *dev);
 int __must_check i915_gem_suspend(struct drm_device *dev);
 void __i915_add_request(struct drm_i915_gem_request *req,
 			struct drm_i915_gem_object *batch_obj,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 68ed67a..1f134b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4511,12 +4511,27 @@ static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
 	.put_pages = i915_gem_object_put_pages_gtt,
 };
 
+static struct address_space *
+i915_gem_set_inode_gfp(struct drm_device *dev, struct file *file)
+{
+	struct address_space *mapping = file_inode(file)->i_mapping;
+	gfp_t mask;
+
+	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
+	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
+		/* 965gm cannot relocate objects above 4GiB. */
+		mask &= ~__GFP_HIGHMEM;
+		mask |= __GFP_DMA32;
+	}
+	mapping_set_gfp_mask(mapping, mask);
+
+	return mapping;
+}
+
 struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 						  size_t size)
 {
 	struct drm_i915_gem_object *obj;
-	struct address_space *mapping;
-	gfp_t mask;
 	int ret;
 
 	obj = i915_gem_object_alloc(dev);
@@ -4529,15 +4544,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
 		return ERR_PTR(ret);
 	}
 
-	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
-	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
-		/* 965gm cannot relocate objects above 4GiB. */
-		mask &= ~__GFP_HIGHMEM;
-		mask |= __GFP_DMA32;
-	}
-
-	mapping = file_inode(obj->base.filp)->i_mapping;
-	mapping_set_gfp_mask(mapping, mask);
+	i915_gem_set_inode_gfp(dev, obj->base.filp);
 
 	i915_gem_object_init(obj, &i915_gem_object_ops);
 
@@ -4714,6 +4721,209 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
 		dev_priv->gt.stop_ring(ring);
 }
 
+static int
+i915_gem_object_migrate_stolen_to_shmemfs(struct drm_i915_gem_object *obj)
+{
+	struct drm_i915_private *i915 = to_i915(obj->base.dev);
+	struct i915_vma *vma, *vn;
+	struct drm_mm_node node;
+	struct file *file;
+	struct address_space *mapping;
+	struct sg_table *stolen_pages, *shmemfs_pages;
+	int ret, i;
+
+	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
+		return -EINVAL;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (ret)
+		return ret;
+
+	file = shmem_file_setup("drm mm object", obj->base.size, VM_NORESERVE);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+	mapping = i915_gem_set_inode_gfp(obj->base.dev, file);
+
+	list_for_each_entry_safe(vma, vn, &obj->vma_list, vma_link)
+		if (i915_vma_unbind(vma))
+			continue;
+
+	if (obj->madv != I915_MADV_WILLNEED && list_empty(&obj->vma_list)) {
+		/* Discard the stolen reservation, and replace with
+		 * an unpopulated shmemfs object.
+		 */
+		obj->madv = __I915_MADV_PURGED;
+		goto swap_pages;
+	}
+
+	/* stolen objects are already pinned to prevent shrinkage */
+	memset(&node, 0, sizeof(node));
+	ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
+						  &node,
+						  4096, 0, I915_CACHE_NONE,
+						  0, i915->gtt.mappable_end,
+						  DRM_MM_SEARCH_DEFAULT,
+						  DRM_MM_CREATE_DEFAULT);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
+		struct page *page;
+		void __iomem *src;
+		void *dst;
+
+		wmb();
+		i915->gtt.base.insert_page(&i915->gtt.base,
+					   i915_gem_object_get_dma_address(obj, i),
+					   node.start,
+					   I915_CACHE_NONE,
+					   0);
+		wmb();
+
+		page = shmem_read_mapping_page(mapping, i);
+		if (IS_ERR(page)) {
+			ret = PTR_ERR(page);
+			goto err_node;
+		}
+
+		src = io_mapping_map_atomic_wc(i915->gtt.mappable, node.start + PAGE_SIZE * i);
+		dst = kmap_atomic(page);
+		memcpy_fromio(dst, src, PAGE_SIZE);
+		kunmap_atomic(dst);
+		io_mapping_unmap_atomic(src);
+
+		page_cache_release(page);
+	}
+
+	wmb();
+	i915->gtt.base.clear_range(&i915->gtt.base,
+				   node.start, node.size,
+				   true);
+	drm_mm_remove_node(&node);
+
+swap_pages:
+	stolen_pages = obj->pages;
+	obj->pages = NULL;
+
+	obj->base.filp = file;
+	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
+	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
+
+	/* Recreate any pinned binding with pointers to the new storage */
+	if (!list_empty(&obj->vma_list)) {
+		ret = i915_gem_object_get_pages_gtt(obj);
+		if (ret) {
+			obj->pages = stolen_pages;
+			goto err_file;
+		}
+
+		ret = i915_gem_object_set_to_gtt_domain(obj, true);
+		if (ret) {
+			i915_gem_object_put_pages_gtt(obj);
+			obj->pages = stolen_pages;
+			goto err_file;
+		}
+
+		obj->get_page.sg = obj->pages->sgl;
+		obj->get_page.last = 0;
+
+		list_for_each_entry(vma, &obj->vma_list, vma_link) {
+			if (!drm_mm_node_allocated(&vma->node))
+				continue;
+
+			WARN_ON(i915_vma_bind(vma,
+					      obj->cache_level,
+					      PIN_UPDATE));
+		}
+	} else
+		list_del(&obj->global_list);
+
+	/* drop the stolen pin and backing */
+	shmemfs_pages = obj->pages;
+	obj->pages = stolen_pages;
+
+	i915_gem_object_unpin_pages(obj);
+	obj->ops->put_pages(obj);
+	if (obj->ops->release)
+		obj->ops->release(obj);
+
+	obj->ops = &i915_gem_object_ops;
+	obj->pages = shmemfs_pages;
+
+	return 0;
+
+err_node:
+	wmb();
+	i915->gtt.base.clear_range(&i915->gtt.base,
+				   node.start, node.size,
+				   true);
+	drm_mm_remove_node(&node);
+err_file:
+	fput(file);
+	obj->base.filp = NULL;
+	return ret;
+}
+
+int
+i915_gem_freeze(struct drm_device *dev)
+{
+	/* Called before i915_gem_suspend() when hibernating */
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_object *obj, *tmp;
+	struct list_head *phase[] = {
+		&i915->mm.unbound_list, &i915->mm.bound_list, NULL
+	}, **p;
+	int ret;
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+	/* Across hibernation, the stolen area is not preserved.
+	 * Anything inside stolen must be copied back to normal
+	 * memory if we wish to preserve it.
+	 */
+	for (p = phase; *p; p++) {
+		struct list_head migrate;
+		int ret;
+
+		INIT_LIST_HEAD(&migrate);
+		list_for_each_entry_safe(obj, tmp, *p, global_list) {
+			if (obj->stolen == NULL)
+				continue;
+
+			if (obj->internal_volatile)
+				continue;
+
+			/* In the general case, this object may only be alive
+			 * due to an active reference, and that may disappear
+			 * when we unbind any of the objects (and so wait upon
+			 * the GPU and retire requests). To prevent one of the
+			 * objects from disappearing beneath us, we need to
+			 * take a reference to each as we build the migration
+			 * list.
+			 *
+			 * This is similar to the strategy required whilst
+			 * shrinking or evicting objects (for the same reason).
+			 */
+			drm_gem_object_reference(&obj->base);
+			list_move(&obj->global_list, &migrate);
+		}
+
+		ret = 0;
+		list_for_each_entry_safe(obj, tmp, &migrate, global_list) {
+			if (ret == 0)
+				ret = i915_gem_object_migrate_stolen_to_shmemfs(obj);
+			drm_gem_object_unreference(&obj->base);
+		}
+		list_splice(&migrate, *p);
+		if (ret)
+			break;
+	}
+
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
 int
 i915_gem_suspend(struct drm_device *dev)
 {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f281e0b..0803922 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2549,6 +2549,9 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	if (IS_ERR(obj))
 		return false;
 
+	/* Not to be preserved across hibernation */
+	obj->internal_volatile = true;
+
 	obj->tiling_mode = plane_config->tiling;
 	if (obj->tiling_mode == I915_TILING_X)
 		obj->stride = fb->pitches[0];
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index f43681e..1d89253 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -154,6 +154,12 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 		goto out;
 	}
 
+	/* Discard the contents of the BIOS fb across hibernation.
+	 * We really want to completely throw away the earlier fbdev
+	 * and reconfigure it anyway.
+	 */
+	obj->internal_volatile = true;
+
 	fb = __intel_framebuffer_create(dev, &mode_cmd, obj);
 	if (IS_ERR(fb)) {
 		ret = PTR_ERR(fb);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 03ad276..6ddc20a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5181,6 +5181,8 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 	I915_WRITE(VLV_PCBR, pctx_paddr);
 
 out:
+	/* The power context need not be preserved across hibernation */
+	pctx->internal_volatile = true;
 	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
 	dev_priv->vlv_pctx = pctx;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 5eabaf6..370d96a 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2090,6 +2090,12 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	/* Ringbuffer objects are by definition volatile - only the commands
+	 * between HEAD and TAIL need to be preserved and whilst there are
+	 * any commands there, the ringbuffer is pinned by activity.
+	 */
+	obj->internal_volatile = true;
+
 	/* mark ring buffers as read-only from GPU side by default */
 	obj->gt_ro = 1;
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
@ 2015-12-09 13:26   ` Dave Gordon
  2015-12-10 10:02     ` Ankitprasad Sharma
  2015-12-09 13:30   ` Tvrtko Ursulin
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Dave Gordon @ 2015-12-09 13:26 UTC (permalink / raw)
  To: intel-gfx, Sharma, Ankitprasad R

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for clearing buffer objects via CPU/GTT. This
> is particularly useful for clearing out the non shmem backed objects.
> Currently intend to use this only for buffers allocated from stolen
> region.
>
> v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> variable assignments (Tvrtko)
>
> v3: Map object page by page to the gtt if the pinning of the whole object
> to the ggtt fails, Corrected function name (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h |  1 +
>   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 548a0eb..8e554d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   				    int *needs_clflush);
>
>   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
>
>   static inline int __sg_page_count(struct scatterlist *sg)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9d2e6e3..d57e850 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5244,3 +5244,82 @@ fail:
>   	drm_gem_object_unreference(&obj->base);
>   	return ERR_PTR(ret);
>   }
> +
> +/**
> + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> + * @obj: Buffer object to be cleared
> + *
> + * Return: 0 - success, non-zero - failure
> + */
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> +{
> +	int ret, i;
> +	char __iomem *base;
> +	size_t size = obj->base.size;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_mm_node node;
> +
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);
> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;
> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;
> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();
> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);
> +		if (base == NULL) {
> +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> +			ret = -ENOSPC;
> +			goto unpin;
> +		}
> +
> +		memset_io(base, 0, size);
> +		iounmap(base);
> +	}
> +unpin:
> +	if (node.allocated) {
> +		wmb();
> +		i915->gtt.base.clear_range(&i915->gtt.base,
> +				node.start, node.size,
> +				true);
> +		drm_mm_remove_node(&node);
> +		i915_gem_object_unpin_pages(obj);
> +	}
> +	else {
> +		i915_gem_object_ggtt_unpin(obj);
> +	}
> +out:
> +	return ret;
> +}

This is effectively two functions interleaved, as shown by the repeated 
if (node.allocated) tests. Would it not be clearer to have the mainline 
function deal only with the GTT-pinned case, and a separate function for 
the page-by-page version, called as a fallback if pinning fails?

int i915_gem_object_clear(struct drm_i915_gem_object *obj)
{
	int ret, i;
	char __iomem *base;
	size_t size = obj->base.size;
	struct drm_i915_private *i915 = to_i915(obj->base.dev);

	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE|PIN_NONBLOCK);
	if (ret)
		return __i915_obj_clear_by_pages(...);

	... mainline (fast) code here ...

	return ret;
}

static int __i915_obj_clear_by_pages(...);
{
	... complicated page-by-page fallback code here ...
}

.Dave.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
  2015-12-09 13:26   ` Dave Gordon
@ 2015-12-09 13:30   ` Tvrtko Ursulin
  2015-12-09 13:57   ` Tvrtko Ursulin
  2015-12-09 13:57   ` Chris Wilson
  3 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 13:30 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for clearing buffer objects via CPU/GTT. This
> is particularly useful for clearing out the non shmem backed objects.
> Currently intend to use this only for buffers allocated from stolen
> region.
>
> v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> variable assignments (Tvrtko)
>
> v3: Map object page by page to the gtt if the pinning of the whole object
> to the ggtt fails, Corrected function name (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I gave my r-b on v2, and v3 is sufficiently different that it no longer
applies.

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/i915_drv.h |  1 +
>   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 548a0eb..8e554d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   				    int *needs_clflush);
>
>   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
>
>   static inline int __sg_page_count(struct scatterlist *sg)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9d2e6e3..d57e850 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5244,3 +5244,82 @@ fail:
>   	drm_gem_object_unreference(&obj->base);
>   	return ERR_PTR(ret);
>   }
> +
> +/**
> + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> + * @obj: Buffer object to be cleared
> + *
> + * Return: 0 - success, non-zero - failure
> + */
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> +{
> +	int ret, i;
> +	char __iomem *base;
> +	size_t size = obj->base.size;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_mm_node node;
> +
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);
> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;
> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;
> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();
> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);
> +		if (base == NULL) {
> +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> +			ret = -ENOSPC;
> +			goto unpin;
> +		}
> +
> +		memset_io(base, 0, size);
> +		iounmap(base);
> +	}
> +unpin:
> +	if (node.allocated) {
> +		wmb();
> +		i915->gtt.base.clear_range(&i915->gtt.base,
> +				node.start, node.size,
> +				true);
> +		drm_mm_remove_node(&node);
> +		i915_gem_object_unpin_pages(obj);
> +	}
> +	else {
> +		i915_gem_object_ggtt_unpin(obj);
> +	}
> +out:
> +	return ret;
> +}
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
  2015-12-09 13:26   ` Dave Gordon
  2015-12-09 13:30   ` Tvrtko Ursulin
@ 2015-12-09 13:57   ` Tvrtko Ursulin
  2015-12-10 10:23     ` Ankitprasad Sharma
  2015-12-09 13:57   ` Chris Wilson
  3 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 13:57 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for clearing buffer objects via CPU/GTT. This
> is particularly useful for clearing out the non shmem backed objects.
> Currently intend to use this only for buffers allocated from stolen
> region.
>
> v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> variable assignments (Tvrtko)
>
> v3: Map object page by page to the gtt if the pinning of the whole object
> to the ggtt fails, Corrected function name (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h |  1 +
>   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 80 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 548a0eb..8e554d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   				    int *needs_clflush);
>
>   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
>
>   static inline int __sg_page_count(struct scatterlist *sg)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9d2e6e3..d57e850 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5244,3 +5244,82 @@ fail:
>   	drm_gem_object_unreference(&obj->base);
>   	return ERR_PTR(ret);
>   }
> +
> +/**
> + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> + * @obj: Buffer object to be cleared
> + *
> + * Return: 0 - success, non-zero - failure
> + */
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> +{
> +	int ret, i;
> +	char __iomem *base;
> +	size_t size = obj->base.size;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_mm_node node;
> +
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);

Hm, I thought Chris's suggestion was not to even try mapping all of it
into the GTT but just to go page by page?

If I misunderstood that, then I agree with Dave's comment that it should
be split into two helper functions.

> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);
> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;

This looks very hacky anyway and I would not recommend it.

> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;
> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();

What is this barrier for? Shouldn't the one after writing out the PTEs
and before remapping be enough?

> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);
> +		if (base == NULL) {
> +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> +			ret = -ENOSPC;
> +			goto unpin;
> +		}
> +
> +		memset_io(base, 0, size);
> +		iounmap(base);
> +	}
> +unpin:
> +	if (node.allocated) {
> +		wmb();

I don't understand this one either?

> +		i915->gtt.base.clear_range(&i915->gtt.base,
> +				node.start, node.size,
> +				true);
> +		drm_mm_remove_node(&node);
> +		i915_gem_object_unpin_pages(obj);
> +	}
> +	else {
> +		i915_gem_object_ggtt_unpin(obj);
> +	}
> +out:
> +	return ret;
> +}
>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
                     ` (2 preceding siblings ...)
  2015-12-09 13:57   ` Tvrtko Ursulin
@ 2015-12-09 13:57   ` Chris Wilson
  2015-12-10 10:27     ` Ankitprasad Sharma
  3 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2015-12-09 13:57 UTC (permalink / raw)
  To: ankitprasad.r.sharma; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, Dec 09, 2015 at 06:16:17PM +0530, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> 
> This patch adds support for clearing buffer objects via CPU/GTT. This
> is particularly useful for clearing out the non shmem backed objects.
> Currently intend to use this only for buffers allocated from stolen
> region.
> 
> v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> variable assignments (Tvrtko)
> 
> v3: Map object page by page to the gtt if the pinning of the whole object
> to the ggtt fails, Corrected function name (Chris)
> 
> Testcase: igt/gem_stolen
> 
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  1 +
>  drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 548a0eb..8e554d3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>  				    int *needs_clflush);
>  
>  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
>  
>  static inline int __sg_page_count(struct scatterlist *sg)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9d2e6e3..d57e850 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -5244,3 +5244,82 @@ fail:
>  	drm_gem_object_unreference(&obj->base);
>  	return ERR_PTR(ret);
>  }
> +
> +/**
> + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> + * @obj: Buffer object to be cleared
> + *
> + * Return: 0 - success, non-zero - failure
> + */
> +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> +{
> +	int ret, i;
> +	char __iomem *base;
> +	size_t size = obj->base.size;
> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct drm_mm_node node;
> +
> +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));

Just lockdep_assert_held.

> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);

Would be nice to get the PIN_NOFAULT patches in to give preference to
userspace mappings....

> +	if (ret) {
> +		memset(&node, 0, sizeof(node));
> +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +							  &node, 4096, 0,
> +							  I915_CACHE_NONE, 0,
> +							  i915->gtt.mappable_end,
> +							  DRM_MM_SEARCH_DEFAULT,
> +							  DRM_MM_CREATE_DEFAULT);

We use this often enough to merit a little helper.
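
Something along these lines, perhaps (just a sketch; the name
insert_mappable_node is made up here, and the arguments simply mirror
the call quoted above):

static int
insert_mappable_node(struct drm_i915_private *i915,
		     struct drm_mm_node *node, u32 size)
{
	/* Find a free slot of the requested size in the mappable aperture */
	memset(node, 0, sizeof(*node));
	return drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
						   node, size, 0,
						   I915_CACHE_NONE,
						   0, i915->gtt.mappable_end,
						   DRM_MM_SEARCH_DEFAULT,
						   DRM_MM_CREATE_DEFAULT);
}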

> +		if (ret)
> +			goto out;
> +
> +		i915_gem_object_pin_pages(obj);
> +	} else {
> +		node.start = i915_gem_obj_ggtt_offset(obj);
> +		node.allocated = false;
> +	}
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto unpin;

You only need to drop the fence when using the whole-object GGTT
mapping.

> +
> +	if (node.allocated) {
> +		for (i = 0; i < size/PAGE_SIZE; i++) {
> +			wmb();
> +			i915->gtt.base.insert_page(&i915->gtt.base,
> +					i915_gem_object_get_dma_address(obj, i),
> +					node.start,
> +					I915_CACHE_NONE,
> +					0);
> +			wmb();
> +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> +			memset_io(base, 0, 4096);
> +			iounmap(base);
> +		}
> +	} else {
> +		/* Get the CPU virtual address of the buffer */
> +		base = ioremap_wc(i915->gtt.mappable_base +
> +				  node.start, size);

You should not use ioremap_wc() as it is easy to exhaust the kernel
address space on 32bit.

If you did a page by page approach for both paths, you could do this with
much less code...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-09 12:46 ` [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects ankitprasad.r.sharma
@ 2015-12-09 14:06   ` Tvrtko Ursulin
  2015-12-11 11:22     ` Ankitprasad Sharma
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 14:06 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> Extend the drm_i915_gem_create structure to add support for
> creating Stolen memory backed objects. Added a new flag through
> which user can specify the preference to allocate the object from
> stolen memory, which if set, an attempt will be made to allocate
> the object from stolen memory subject to the availability of
> free space in the stolen region.
>
> v2: Rebased to the latest drm-intel-nightly (Ankit)
>
> v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)
>
> v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
> Corrected function arguments ordering (Chris)
>
> v5: Corrected function name (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_dma.c        |  3 +++
>   drivers/gpu/drm/i915/i915_drv.h        |  2 +-
>   drivers/gpu/drm/i915/i915_gem.c        | 30 +++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
>   include/uapi/drm/i915_drm.h            | 16 ++++++++++++++++
>   5 files changed, 49 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index ffcb9c6..6927c7e 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>   	case I915_PARAM_HAS_RESOURCE_STREAMER:
>   		value = HAS_RESOURCE_STREAMER(dev);
>   		break;
> +	case I915_PARAM_CREATE_VERSION:
> +		value = 2;
> +		break;
>   	default:
>   		DRM_DEBUG("Unknown parameter %d\n", param->param);
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8e554d3..d45274e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
>   int i915_gem_init_stolen(struct drm_device *dev);
>   void i915_gem_cleanup_stolen(struct drm_device *dev);
>   struct drm_i915_gem_object *
> -i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
> +i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
>   struct drm_i915_gem_object *
>   i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   					       u32 stolen_offset,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d57e850..296e63f 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -375,6 +375,7 @@ static int
>   i915_gem_create(struct drm_file *file,
>   		struct drm_device *dev,
>   		uint64_t size,
> +		uint32_t flags,
>   		uint32_t *handle_p)
>   {
>   	struct drm_i915_gem_object *obj;
> @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
>   	if (size == 0)
>   		return -EINVAL;
>
> +	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
> +		return -EINVAL;
> +
>   	/* Allocate the new object */
> -	obj = i915_gem_alloc_object(dev, size);
> +	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> +		mutex_lock(&dev->struct_mutex);
> +		obj = i915_gem_object_create_stolen(dev, size);
> +		if (!obj) {
> +			mutex_unlock(&dev->struct_mutex);
> +			return -ENOMEM;
> +		}
> +
> +		/* Always clear fresh buffers before handing to userspace */
> +		ret = i915_gem_object_clear(obj);
> +		if (ret) {
> +			drm_gem_object_unreference(&obj->base);
> +			mutex_unlock(&dev->struct_mutex);
> +			return ret;
> +		}
> +
> +		mutex_unlock(&dev->struct_mutex);
> +	} else {
> +		obj = i915_gem_alloc_object(dev, size);
> +	}
> +
>   	if (obj == NULL)
>   		return -ENOMEM;
>
> @@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
>   	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
>   	args->size = args->pitch * args->height;
>   	return i915_gem_create(file, dev,
> -			       args->size, &args->handle);
> +			       args->size, 0, &args->handle);
>   }
>
>   /**
> @@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
>   	struct drm_i915_gem_create *args = data;
>
>   	return i915_gem_create(file, dev,
> -			       args->size, &args->handle);
> +			       args->size, args->flags, &args->handle);
>   }
>
>   static inline int
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 598ed2f..b98a3bf 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -583,7 +583,7 @@ cleanup:
>   }
>
>   struct drm_i915_gem_object *
> -i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
> +i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_gem_object *obj;
> @@ -593,7 +593,7 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>   	if (!drm_mm_initialized(&dev_priv->mm.stolen))
>   		return NULL;
>
> -	DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
> +	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
>   	if (size == 0)
>   		return NULL;
>
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 67cebe6..8e7e3a4 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait {
>   #define I915_PARAM_EU_TOTAL		 34
>   #define I915_PARAM_HAS_GPU_RESET	 35
>   #define I915_PARAM_HAS_RESOURCE_STREAMER 36
> +#define I915_PARAM_CREATE_VERSION	 37
>
>   typedef struct drm_i915_getparam {
>   	__s32 param;
> @@ -455,6 +456,21 @@ struct drm_i915_gem_create {
>   	 */
>   	__u32 handle;
>   	__u32 pad;
> +	/**
> +	 * Requested flags (currently used for placement
> +	 * (which memory domain))
> +	 *
> +	 * You can request that the object be created from special memory
> +	 * rather than regular system pages using this parameter. Such
> +	 * irregular objects may have certain restrictions (such as CPU
> +	 * access to a stolen object is verboten).
> +	 *
> +	 * This can be used in the future for other purposes too
> +	 * e.g. specifying tiling/caching/madvise
> +	 */
> +	__u32 flags;
> +#define I915_CREATE_PLACEMENT_STOLEN 	(1<<0) /* Cannot use CPU mmaps */
> +#define __I915_CREATE_UNKNOWN_FLAGS	-(I915_CREATE_PLACEMENT_STOLEN << 1)

I've asked in another reply: now that userspace can create a stolen
object, what happens if it tries to use it for a batch buffer?

Can it end up in the relocate_entry_cpu with a batch buffer allocated 
from stolen, which would then call i915_gem_object_get_page and crash?

>   };
>
>   struct drm_i915_gem_pread {
>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace
  2015-12-09 12:46 ` [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace ankitprasad.r.sharma
@ 2015-12-09 15:10   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 15:10 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> Propagating correct error codes to userspace by using ERR_PTR and
> PTR_ERR macros for stolen memory based object allocation. We generally
> return -ENOMEM to the user whenever there is a failure in object
> allocation. This patch helps user to identify the correct reason for the
> failure and not just -ENOMEM each time.
>
> v2: Moved the patch up in the series, added error propagation for
> i915_gem_alloc_object too (Chris)
>
> v3: Removed storing of error pointer inside structs, Corrected error
> propagation in caller functions (Chris)
>
> v4: Remove assignments inside the predicate (Chris)
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c              | 16 +++++-----
>   drivers/gpu/drm/i915/i915_gem_batch_pool.c   |  4 +--
>   drivers/gpu/drm/i915/i915_gem_context.c      |  4 +--
>   drivers/gpu/drm/i915/i915_gem_render_state.c |  7 +++--
>   drivers/gpu/drm/i915/i915_gem_stolen.c       | 43 ++++++++++++++------------
>   drivers/gpu/drm/i915/i915_guc_submission.c   | 45 ++++++++++++++++++----------
>   drivers/gpu/drm/i915/intel_display.c         |  2 +-
>   drivers/gpu/drm/i915/intel_fbdev.c           |  6 ++--
>   drivers/gpu/drm/i915/intel_lrc.c             | 10 ++++---
>   drivers/gpu/drm/i915/intel_overlay.c         |  4 +--
>   drivers/gpu/drm/i915/intel_pm.c              |  2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c      | 21 ++++++-------
>   12 files changed, 95 insertions(+), 69 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 296e63f..5812748 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -393,9 +393,9 @@ i915_gem_create(struct drm_file *file,
>   	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
>   		mutex_lock(&dev->struct_mutex);
>   		obj = i915_gem_object_create_stolen(dev, size);
> -		if (!obj) {
> +		if (IS_ERR(obj)) {
>   			mutex_unlock(&dev->struct_mutex);
> -			return -ENOMEM;
> +			return PTR_ERR(obj);
>   		}
>
>   		/* Always clear fresh buffers before handing to userspace */
> @@ -411,8 +411,8 @@ i915_gem_create(struct drm_file *file,
>   		obj = i915_gem_alloc_object(dev, size);
>   	}
>
> -	if (obj == NULL)
> -		return -ENOMEM;
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
>
>   	ret = drm_gem_handle_create(file, &obj->base, &handle);
>   	/* drop reference from allocate - handle holds it now */
> @@ -4399,14 +4399,16 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>   	struct drm_i915_gem_object *obj;
>   	struct address_space *mapping;
>   	gfp_t mask;
> +	int ret;
>
>   	obj = i915_gem_object_alloc(dev);
>   	if (obj == NULL)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
> -	if (drm_gem_object_init(dev, &obj->base, size) != 0) {
> +	ret = drm_gem_object_init(dev, &obj->base, size);
> +	if (ret) {
>   		i915_gem_object_free(obj);
> -		return NULL;
> +		return ERR_PTR(ret);
>   	}
>
>   	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> index 7bf2f3f..d79caa2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c
> @@ -135,8 +135,8 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool,
>   		int ret;
>
>   		obj = i915_gem_alloc_object(pool->dev, size);
> -		if (obj == NULL)
> -			return ERR_PTR(-ENOMEM);
> +		if (IS_ERR(obj))
> +			return obj;
>
>   		ret = i915_gem_object_get_pages(obj);
>   		if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 204dc7c..4d24cfc 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -181,8 +181,8 @@ i915_gem_alloc_context_obj(struct drm_device *dev, size_t size)
>   	int ret;
>
>   	obj = i915_gem_alloc_object(dev, size);
> -	if (obj == NULL)
> -		return ERR_PTR(-ENOMEM);
> +	if (IS_ERR(obj))
> +		return obj;
>
>   	/*
>   	 * Try to make the context utilize L3 as well as LLC.
> diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
> index 5026a62..2bfdd49 100644
> --- a/drivers/gpu/drm/i915/i915_gem_render_state.c
> +++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
> @@ -58,8 +58,11 @@ static int render_state_init(struct render_state *so, struct drm_device *dev)
>   		return -EINVAL;
>
>   	so->obj = i915_gem_alloc_object(dev, 4096);
> -	if (so->obj == NULL)
> -		return -ENOMEM;
> +	if (IS_ERR(so->obj)) {
> +		ret = PTR_ERR(so->obj);
> +		so->obj = NULL;
> +		return ret;
> +	}
>
>   	ret = i915_gem_obj_ggtt_pin(so->obj, 4096, 0);
>   	if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index b98a3bf..0b0ce11 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -492,6 +492,7 @@ i915_pages_create_for_stolen(struct drm_device *dev,
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct sg_table *st;
>   	struct scatterlist *sg;
> +	int ret;
>
>   	DRM_DEBUG_DRIVER("offset=0x%x, size=%d\n", offset, size);
>   	BUG_ON(offset > dev_priv->gtt.stolen_size - size);
> @@ -503,11 +504,12 @@ i915_pages_create_for_stolen(struct drm_device *dev,
>
>   	st = kmalloc(sizeof(*st), GFP_KERNEL);
>   	if (st == NULL)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
> -	if (sg_alloc_table(st, 1, GFP_KERNEL)) {
> +	ret = sg_alloc_table(st, 1, GFP_KERNEL);
> +	if (ret) {
>   		kfree(st);
> -		return NULL;
> +		return ERR_PTR(ret);
>   	}
>
>   	sg = st->sgl;
> @@ -556,18 +558,21 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
>   			       struct drm_mm_node *stolen)
>   {
>   	struct drm_i915_gem_object *obj;
> +	int ret = 0;

ret looks only to be used on the cleanup path, so there is no need to
initialise it to zero. Or is the compiler complaining? In which case it
might be simpler to remove the cleanup path and just do it from the "if
(IS_ERR(obj->pages))" block?
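
I.e. roughly (sketch only, untested):

	obj->pages = i915_pages_create_for_stolen(dev,
						  stolen->start, stolen->size);
	if (IS_ERR(obj->pages)) {
		/* Grab the error before freeing the object it lives in */
		struct drm_i915_gem_object *err = ERR_CAST(obj->pages);

		i915_gem_object_free(obj);
		return err;
	}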

>
>   	obj = i915_gem_object_alloc(dev);
>   	if (obj == NULL)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
>   	drm_gem_private_object_init(dev, &obj->base, stolen->size);
>   	i915_gem_object_init(obj, &i915_gem_object_stolen_ops);
>
>   	obj->pages = i915_pages_create_for_stolen(dev,
>   						  stolen->start, stolen->size);
> -	if (obj->pages == NULL)
> +	if (IS_ERR(obj->pages)) {
> +		ret = PTR_ERR(obj->pages);
>   		goto cleanup;
> +	}
>
>   	i915_gem_object_pin_pages(obj);
>   	obj->stolen = stolen;
> @@ -579,7 +584,7 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
>
>   cleanup:
>   	i915_gem_object_free(obj);
> -	return NULL;
> +	return ERR_PTR(ret);
>   }
>
>   struct drm_i915_gem_object *
> @@ -591,29 +596,29 @@ i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
>   	int ret;
>
>   	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> -		return NULL;
> +		return ERR_PTR(-ENODEV);
>
>   	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
>   	if (size == 0)
> -		return NULL;
> +		return ERR_PTR(-EINVAL);
>
>   	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
>   	if (!stolen)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
>   	ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
>   	if (ret) {
>   		kfree(stolen);
> -		return NULL;
> +		return ERR_PTR(ret);
>   	}
>
>   	obj = _i915_gem_object_create_stolen(dev, stolen);
> -	if (obj)
> +	if (!IS_ERR(obj))
>   		return obj;
>
>   	i915_gem_stolen_remove_node(dev_priv, stolen);
>   	kfree(stolen);
> -	return NULL;
> +	return obj;
>   }
>
>   struct drm_i915_gem_object *
> @@ -630,7 +635,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	int ret;
>
>   	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> -		return NULL;
> +		return ERR_PTR(-ENODEV);

What about the callers of this function? They don't seem to handle ERR_PTR.

>
>   	DRM_DEBUG_KMS("creating preallocated stolen object: stolen_offset=%x, gtt_offset=%x, size=%x\n",
>   			stolen_offset, gtt_offset, size);
> @@ -638,11 +643,11 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	/* KISS and expect everything to be page-aligned */
>   	if (WARN_ON(size == 0) || WARN_ON(size & 4095) ||
>   	    WARN_ON(stolen_offset & 4095))
> -		return NULL;
> +		return ERR_PTR(-EINVAL);
>
>   	stolen = kzalloc(sizeof(*stolen), GFP_KERNEL);
>   	if (!stolen)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
>   	stolen->start = stolen_offset;
>   	stolen->size = size;
> @@ -652,15 +657,15 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	if (ret) {
>   		DRM_DEBUG_KMS("failed to allocate stolen space\n");
>   		kfree(stolen);
> -		return NULL;
> +		return ERR_PTR(ret);
>   	}
>
>   	obj = _i915_gem_object_create_stolen(dev, stolen);
> -	if (obj == NULL) {
> +	if (IS_ERR(obj)) {
>   		DRM_DEBUG_KMS("failed to allocate stolen object\n");
>   		i915_gem_stolen_remove_node(dev_priv, stolen);
>   		kfree(stolen);
> -		return NULL;
> +		return obj;
>   	}
>
>   	/* Some objects just need physical mem from stolen space */
> @@ -698,5 +703,5 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>
>   err:
>   	drm_gem_object_unreference(&obj->base);
> -	return NULL;
> +	return ERR_PTR(ret);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c b/drivers/gpu/drm/i915/i915_guc_submission.c
> index 4ac8867..aa38ae4 100644
> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
> @@ -645,22 +645,24 @@ int i915_guc_submit(struct i915_guc_client *client,
>    * object needs to be pinned lifetime. Also we must pin it to gtt space other
>    * than [0, GUC_WOPCM_TOP) because this range is reserved inside GuC.
>    *
> - * Return:	A drm_i915_gem_object if successful, otherwise NULL.
> + * Return:	A drm_i915_gem_object if successful, otherwise error pointer.
>    */
>   static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
>   							u32 size)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_gem_object *obj;
> +	int ret;
>
>   	obj = i915_gem_alloc_object(dev, size);
> -	if (!obj)
> -		return NULL;
> +	if (IS_ERR(obj))
> +		return obj;
>
> -	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
> -			PIN_OFFSET_BIAS | GUC_WOPCM_TOP)) {
> +	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
> +				    PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
> +	if (ret) {
>   		drm_gem_object_unreference(&obj->base);
> -		return NULL;
> +		return ERR_PTR(ret);
>   	}
>
>   	/* Invalidate GuC TLB to let GuC take the latest updates to GTT. */
> @@ -738,10 +740,11 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,

Need to update kerneldoc for this function.

>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_guc *guc = &dev_priv->guc;
>   	struct drm_i915_gem_object *obj;
> +	int ret;
>
>   	client = kzalloc(sizeof(*client), GFP_KERNEL);
>   	if (!client)
> -		return NULL;
> +		return ERR_PTR(-ENOMEM);
>
>   	client->doorbell_id = GUC_INVALID_DOORBELL_ID;
>   	client->priority = priority;
> @@ -752,13 +755,16 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
>   			GUC_MAX_GPU_CONTEXTS, GFP_KERNEL);
>   	if (client->ctx_index >= GUC_MAX_GPU_CONTEXTS) {
>   		client->ctx_index = GUC_INVALID_CTX_ID;
> +		ret = -EINVAL;
>   		goto err;
>   	}
>
>   	/* The first page is doorbell/proc_desc. Two followed pages are wq. */
>   	obj = gem_allocate_guc_obj(dev, GUC_DB_SIZE + GUC_WQ_SIZE);
> -	if (!obj)
> +	if (IS_ERR(obj)) {
> +		ret = PTR_ERR(obj);
>   		goto err;
> +	}
>
>   	client->client_obj = obj;
>   	client->wq_offset = GUC_DB_SIZE;
> @@ -778,9 +784,11 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
>   		client->proc_desc_offset = (GUC_DB_SIZE / 2);
>
>   	client->doorbell_id = assign_doorbell(guc, client->priority);
> -	if (client->doorbell_id == GUC_INVALID_DOORBELL_ID)
> +	if (client->doorbell_id == GUC_INVALID_DOORBELL_ID) {
>   		/* XXX: evict a doorbell instead */
> +		ret = -EINVAL;
>   		goto err;
> +	}
>
>   	guc_init_proc_desc(guc, client);
>   	guc_init_ctx_desc(guc, client);
> @@ -788,7 +796,8 @@ static struct i915_guc_client *guc_client_alloc(struct drm_device *dev,
>
>   	/* XXX: Any cache flushes needed? General domain mgmt calls? */
>
> -	if (host2guc_allocate_doorbell(guc, client))
> +	ret = host2guc_allocate_doorbell(guc, client);
> +	if (ret)
>   		goto err;
>
>   	DRM_DEBUG_DRIVER("new priority %u client %p: ctx_index %u db_id %u\n",
> @@ -800,7 +809,7 @@ err:
>   	DRM_ERROR("FAILED to create priority %u GuC client!\n", priority);
>
>   	guc_client_free(dev, client);
> -	return NULL;
> +	return ERR_PTR(ret);
>   }
>
>   static void guc_create_log(struct intel_guc *guc)
> @@ -825,7 +834,7 @@ static void guc_create_log(struct intel_guc *guc)
>   	obj = guc->log_obj;
>   	if (!obj) {
>   		obj = gem_allocate_guc_obj(dev_priv->dev, size);
> -		if (!obj) {
> +		if (IS_ERR(obj)) {
>   			/* logging will be off */
>   			i915.guc_log_level = -1;
>   			return;
> @@ -855,6 +864,7 @@ int i915_guc_submission_init(struct drm_device *dev)
>   	const size_t poolsize = GUC_MAX_GPU_CONTEXTS * ctxsize;
>   	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
>   	struct intel_guc *guc = &dev_priv->guc;
> +	int ret = 0;

No need to initialise ret here; it is assigned before use on every path that reads it.

>
>   	if (!i915.enable_guc_submission)
>   		return 0; /* not enabled  */
> @@ -863,8 +873,11 @@ int i915_guc_submission_init(struct drm_device *dev)
>   		return 0; /* already allocated */
>
>   	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
> -	if (!guc->ctx_pool_obj)
> -		return -ENOMEM;
> +	if (IS_ERR(guc->ctx_pool_obj)) {
> +		ret = PTR_ERR(guc->ctx_pool_obj);
> +		guc->ctx_pool_obj = NULL;
> +		return ret;
> +	}
>
>   	spin_lock_init(&dev_priv->guc.host2guc_lock);
>
> @@ -884,9 +897,9 @@ int i915_guc_submission_enable(struct drm_device *dev)
>
>   	/* client for execbuf submission */
>   	client = guc_client_alloc(dev, GUC_CTX_PRIORITY_KMD_NORMAL, ctx);
> -	if (!client) {
> +	if (IS_ERR(client)) {
>   		DRM_ERROR("Failed to create execbuf guc_client\n");
> -		return -ENOMEM;
> +		return PTR_ERR(client);
>   	}
>
>   	guc->execbuf_client = client;
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 77979ed..f281e0b 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2546,7 +2546,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
>   							     base_aligned,
>   							     base_aligned,
>   							     size_aligned);
> -	if (!obj)
> +	if (IS_ERR(obj))
>   		return false;
>
>   	obj->tiling_mode = plane_config->tiling;
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 840d6bf..f43681e 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -146,11 +146,11 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
>   	 * features. */
>   	if (size * 2 < dev_priv->gtt.stolen_usable_size)
>   		obj = i915_gem_object_create_stolen(dev, size);
> -	if (obj == NULL)
> +	if (IS_ERR_OR_NULL(obj))
>   		obj = i915_gem_alloc_object(dev, size);
> -	if (!obj) {
> +	if (IS_ERR(obj)) {
>   		DRM_ERROR("failed to allocate framebuffer\n");
> -		ret = -ENOMEM;
> +		ret = PTR_ERR(obj);
>   		goto out;
>   	}
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 06180dc..4539cc6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1364,9 +1364,11 @@ static int lrc_setup_wa_ctx_obj(struct intel_engine_cs *ring, u32 size)
>   	int ret;
>
>   	ring->wa_ctx.obj = i915_gem_alloc_object(ring->dev, PAGE_ALIGN(size));
> -	if (!ring->wa_ctx.obj) {
> +	if (IS_ERR(ring->wa_ctx.obj)) {
>   		DRM_DEBUG_DRIVER("alloc LRC WA ctx backing obj failed.\n");
> -		return -ENOMEM;
> +		ret = PTR_ERR(ring->wa_ctx.obj);
> +		ring->wa_ctx.obj = NULL;
> +		return ret;
>   	}
>
>   	ret = i915_gem_obj_ggtt_pin(ring->wa_ctx.obj, PAGE_SIZE, 0);
> @@ -2471,9 +2473,9 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>   	context_size += PAGE_SIZE * LRC_PPHWSP_PN;
>
>   	ctx_obj = i915_gem_alloc_object(dev, context_size);
> -	if (!ctx_obj) {
> +	if (IS_ERR(ctx_obj)) {
>   		DRM_DEBUG_DRIVER("Alloc LRC backing obj failed.\n");
> -		return -ENOMEM;
> +		return PTR_ERR(ctx_obj);
>   	}
>
>   	ringbuf = intel_engine_create_ringbuffer(ring, 4 * PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index 76f1980..3a65858 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -1392,9 +1392,9 @@ void intel_setup_overlay(struct drm_device *dev)
>   	reg_bo = NULL;
>   	if (!OVERLAY_NEEDS_PHYSICAL(dev))
>   		reg_bo = i915_gem_object_create_stolen(dev, PAGE_SIZE);
> -	if (reg_bo == NULL)
> +	if (IS_ERR_OR_NULL(reg_bo))
>   		reg_bo = i915_gem_alloc_object(dev, PAGE_SIZE);
> -	if (reg_bo == NULL)
> +	if (IS_ERR(reg_bo))
>   		goto out_free;
>   	overlay->reg_bo = reg_bo;
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 647c0ff..6dee908 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5172,7 +5172,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>   	 * memory, or any other relevant ranges.
>   	 */
>   	pctx = i915_gem_object_create_stolen(dev, pctx_size);
> -	if (!pctx) {
> +	if (IS_ERR(pctx)) {
>   		DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
>   		return;
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index c9b081f..5eabaf6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -678,9 +678,10 @@ intel_init_pipe_control(struct intel_engine_cs *ring)
>   	WARN_ON(ring->scratch.obj);
>
>   	ring->scratch.obj = i915_gem_alloc_object(ring->dev, 4096);
> -	if (ring->scratch.obj == NULL) {
> +	if (IS_ERR(ring->scratch.obj)) {
>   		DRM_ERROR("Failed to allocate seqno page\n");
> -		ret = -ENOMEM;
> +		ret = PTR_ERR(ring->scratch.obj);
> +		ring->scratch.obj = NULL;
>   		goto err;
>   	}
>
> @@ -1935,9 +1936,9 @@ static int init_status_page(struct intel_engine_cs *ring)
>   		int ret;
>
>   		obj = i915_gem_alloc_object(ring->dev, 4096);
> -		if (obj == NULL) {
> +		if (IS_ERR(obj)) {
>   			DRM_ERROR("Failed to allocate status page\n");
> -			return -ENOMEM;
> +			return PTR_ERR(obj);
>   		}
>
>   		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> @@ -2084,10 +2085,10 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>   	obj = NULL;
>   	if (!HAS_LLC(dev))
>   		obj = i915_gem_object_create_stolen(dev, ringbuf->size);
> -	if (obj == NULL)
> +	if (IS_ERR_OR_NULL(obj))
>   		obj = i915_gem_alloc_object(dev, ringbuf->size);
> -	if (obj == NULL)
> -		return -ENOMEM;
> +	if (IS_ERR(obj))
> +		return PTR_ERR(obj);
>
>   	/* mark ring buffers as read-only from GPU side by default */
>   	obj->gt_ro = 1;
> @@ -2678,7 +2679,7 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	if (INTEL_INFO(dev)->gen >= 8) {
>   		if (i915_semaphore_is_enabled(dev)) {
>   			obj = i915_gem_alloc_object(dev, 4096);
> -			if (obj == NULL) {
> +			if (IS_ERR(obj)) {
>   				DRM_ERROR("Failed to allocate semaphore bo. Disabling semaphores\n");
>   				i915.semaphores = 0;
>   			} else {
> @@ -2785,9 +2786,9 @@ int intel_init_render_ring_buffer(struct drm_device *dev)
>   	/* Workaround batchbuffer to combat CS tlb bug. */
>   	if (HAS_BROKEN_CS_TLB(dev)) {
>   		obj = i915_gem_alloc_object(dev, I830_WA_SIZE);
> -		if (obj == NULL) {
> +		if (IS_ERR(obj)) {
>   			DRM_ERROR("Failed to allocate batch bo\n");
> -			return -ENOMEM;
> +			return PTR_ERR(obj);
>   		}
>
>   		ret = i915_gem_obj_ggtt_pin(obj, 0, 0);
>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages
  2015-12-09 12:46 ` [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages ankitprasad.r.sharma
@ 2015-12-09 15:40   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 15:40 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> If we run out of stolen memory when trying to allocate an object, see if
> we can reap enough purgeable objects to free up enough contiguous free
> space for the allocation. This is in principle very much like evicting
> objects to free up enough contiguous space in the vma when binding
> a new object - and you will be forgiven for thinking that the code looks
> very similar.
>
> At the moment, we do not allow userspace to allocate objects in stolen,
> so there is neither the memory pressure to trigger stolen eviction nor
> any purgeable objects inside the stolen arena. However, this will change
> in the near future, and so better management and defragmentation of
> stolen memory will become a real issue.
>
> v2: Remember to remove the drm_mm_node.
>
> v3: Rebased to the latest drm-intel-nightly (Ankit)
>
> v4: corrected if-else braces format (Tvrtko/kerneldoc)
>
> v5: Rebased to the latest drm-intel-nightly (Ankit)
> Added a separate list to maintain purgeable objects from stolen memory
> region (Chris/Daniel)
>
> v6: Compiler optimization (merging 2 single loops into one for() loop),
> corrected code for object eviction, retire_requests before starting
> object eviction (Chris)
>
> v7: Added kernel doc for i915_gem_object_create_stolen()
>
> v8: Check for struct_mutex lock before creating object from stolen
> region (Tvrtko)
>
> v9: Renamed variables to make usage clear, added comment, removed onetime
> used macro (Tvrtko)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c    |   6 +-
>   drivers/gpu/drm/i915/i915_drv.h        |  17 +++-
>   drivers/gpu/drm/i915/i915_gem.c        |  16 ++++
>   drivers/gpu/drm/i915/i915_gem_stolen.c | 170 +++++++++++++++++++++++++++++----
>   drivers/gpu/drm/i915/intel_pm.c        |   4 +-
>   5 files changed, 188 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 5659d4c..89b0fec 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -174,7 +174,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
>   			seq_puts(m, ")");
>   	}
>   	if (obj->stolen)
> -		seq_printf(m, " (stolen: %08llx)", obj->stolen->start);
> +		seq_printf(m, " (stolen: %08llx)", obj->stolen->base.start);
>   	if (obj->pin_display || obj->fault_mappable) {
>   		char s[3], *t = s;
>   		if (obj->pin_display)
> @@ -253,9 +253,9 @@ static int obj_rank_by_stolen(void *priv,
>   	struct drm_i915_gem_object *b =
>   		container_of(B, struct drm_i915_gem_object, obj_exec_link);
>
> -	if (a->stolen->start < b->stolen->start)
> +	if (a->stolen->base.start < b->stolen->base.start)
>   		return -1;
> -	if (a->stolen->start > b->stolen->start)
> +	if (a->stolen->base.start > b->stolen->base.start)
>   		return 1;
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d45274e..e0b09b0 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -841,6 +841,12 @@ struct i915_ctx_hang_stats {
>   	bool banned;
>   };
>
> +struct i915_stolen_node {
> +	struct drm_mm_node base;
> +	struct list_head mm_link;
> +	struct drm_i915_gem_object *obj;
> +};
> +
>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>   #define DEFAULT_CONTEXT_HANDLE 0
>
> @@ -1252,6 +1258,13 @@ struct i915_gem_mm {
>   	 */
>   	struct list_head unbound_list;
>
> +	/**
> +	 * List of stolen objects that have been marked as purgeable and
> +	 * thus available for reaping if we need more space for a new
> +	 * allocation. Ordered by time of marking purgeable.
> +	 */
> +	struct list_head stolen_list;
> +
>   	/** Usable portion of the GTT for GEM */
>   	unsigned long stolen_base; /* limited to low memory (32-bit) */
>
> @@ -2032,7 +2045,7 @@ struct drm_i915_gem_object {
>   	struct list_head vma_list;
>
>   	/** Stolen memory for this object, instead of being backed by shmem. */
> -	struct drm_mm_node *stolen;
> +	struct i915_stolen_node *stolen;
>   	struct list_head global_list;
>
>   	struct list_head ring_list[I915_NUM_RINGS];
> @@ -2040,6 +2053,8 @@ struct drm_i915_gem_object {
>   	struct list_head obj_exec_link;
>
>   	struct list_head batch_pool_link;
> +	/** Used during stolen memory allocations to temporarily hold a ref */
> +	struct list_head stolen_link;
>
>   	/**
>   	 * This is set if the object is on the active lists (has pending
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5812748..ed97de6 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4359,6 +4359,20 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
>   	if (obj->madv == I915_MADV_DONTNEED && obj->pages == NULL)
>   		i915_gem_object_truncate(obj);
>
> +	if (obj->stolen) {
> +		switch (obj->madv) {
> +		case I915_MADV_WILLNEED:
> +			list_del_init(&obj->stolen->mm_link);
> +			break;
> +		case I915_MADV_DONTNEED:
> +			list_move(&obj->stolen->mm_link,
> +				  &dev_priv->mm.stolen_list);
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +
>   	args->retained = obj->madv != __I915_MADV_PURGED;
>
>   out:
> @@ -4379,6 +4393,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
>   	INIT_LIST_HEAD(&obj->obj_exec_link);
>   	INIT_LIST_HEAD(&obj->vma_list);
>   	INIT_LIST_HEAD(&obj->batch_pool_link);
> +	INIT_LIST_HEAD(&obj->stolen_link);
>
>   	obj->ops = ops;
>
> @@ -4997,6 +5012,7 @@ i915_gem_load(struct drm_device *dev)
>   	INIT_LIST_HEAD(&dev_priv->context_list);
>   	INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
>   	INIT_LIST_HEAD(&dev_priv->mm.bound_list);
> +	INIT_LIST_HEAD(&dev_priv->mm.stolen_list);
>   	INIT_LIST_HEAD(&dev_priv->mm.fence_list);
>   	for (i = 0; i < I915_NUM_RINGS; i++)
>   		init_ring_lists(&dev_priv->ring[i]);
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> index 0b0ce11..9d6ac67 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -542,7 +542,8 @@ i915_gem_object_release_stolen(struct drm_i915_gem_object *obj)
>   	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
>
>   	if (obj->stolen) {
> -		i915_gem_stolen_remove_node(dev_priv, obj->stolen);
> +		list_del(&obj->stolen->mm_link);
> +		i915_gem_stolen_remove_node(dev_priv, &obj->stolen->base);
>   		kfree(obj->stolen);
>   		obj->stolen = NULL;
>   	}
> @@ -555,7 +556,7 @@ static const struct drm_i915_gem_object_ops i915_gem_object_stolen_ops = {
>
>   static struct drm_i915_gem_object *
>   _i915_gem_object_create_stolen(struct drm_device *dev,
> -			       struct drm_mm_node *stolen)
> +			       struct i915_stolen_node *stolen)
>   {
>   	struct drm_i915_gem_object *obj;
>   	int ret = 0;
> @@ -564,11 +565,12 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
>   	if (obj == NULL)
>   		return ERR_PTR(-ENOMEM);
>
> -	drm_gem_private_object_init(dev, &obj->base, stolen->size);
> +	drm_gem_private_object_init(dev, &obj->base, stolen->base.size);
>   	i915_gem_object_init(obj, &i915_gem_object_stolen_ops);
>
>   	obj->pages = i915_pages_create_for_stolen(dev,
> -						  stolen->start, stolen->size);
> +						  stolen->base.start,
> +						  stolen->base.size);
>   	if (IS_ERR(obj->pages)) {
>   		ret = PTR_ERR(obj->pages);
>   		goto cleanup;
> @@ -577,6 +579,9 @@ _i915_gem_object_create_stolen(struct drm_device *dev,
>   	i915_gem_object_pin_pages(obj);
>   	obj->stolen = stolen;
>
> +	stolen->obj = obj;
> +	INIT_LIST_HEAD(&stolen->mm_link);
> +
>   	obj->base.read_domains = I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT;
>   	obj->cache_level = HAS_LLC(dev) ? I915_CACHE_LLC : I915_CACHE_NONE;
>
> @@ -587,18 +592,102 @@ cleanup:
>   	return ERR_PTR(ret);
>   }
>
> -struct drm_i915_gem_object *
> -i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
> +static bool
> +mark_free(struct drm_i915_gem_object *obj, struct list_head *unwind)
> +{
> +	BUG_ON(obj->stolen == NULL);
> +
> +	if (obj->madv != I915_MADV_DONTNEED)
> +		return false;
> +
> +	if (obj->pin_display)
> +		return false;
> +
> +	list_add(&obj->stolen_link, unwind);
> +	return drm_mm_scan_add_block(&obj->stolen->base);
> +}
> +
> +static int
> +stolen_evict(struct drm_i915_private *dev_priv, u64 size)
>   {
> -	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct drm_i915_gem_object *obj;
> -	struct drm_mm_node *stolen;
> -	int ret;
> +	struct list_head unwind, evict;
> +	struct i915_stolen_node *iter;
> +	int ret, active;
>
> -	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> -		return ERR_PTR(-ENODEV);
> +	drm_mm_init_scan(&dev_priv->mm.stolen, size, 0, 0);
> +	INIT_LIST_HEAD(&unwind);
> +
> +	/* Retire all requests before creating the evict list */
> +	i915_gem_retire_requests(dev_priv->dev);
> +
> +	for (active = 0; active <= 1; active++) {
> +		list_for_each_entry(iter, &dev_priv->mm.stolen_list, mm_link) {
> +			if (iter->obj->active != active)
> +				continue;
> +
> +			if (mark_free(iter->obj, &unwind))
> +				goto found;
> +		}
> +	}
> +
> +found:
> +	INIT_LIST_HEAD(&evict);
> +	while (!list_empty(&unwind)) {
> +		obj = list_first_entry(&unwind,
> +				       struct drm_i915_gem_object,
> +				       stolen_link);
> +		list_del(&obj->stolen_link);
> +
> +		if (drm_mm_scan_remove_block(&obj->stolen->base)) {
> +			list_add(&obj->stolen_link, &evict);
> +			drm_gem_object_reference(&obj->base);
> +		}
> +	}
> +
> +	ret = 0;
> +	while (!list_empty(&evict)) {
> +		obj = list_first_entry(&evict,
> +				       struct drm_i915_gem_object,
> +				       stolen_link);
> +		list_del(&obj->stolen_link);
> +
> +		if (ret == 0) {
> +			struct i915_vma *vma, *vma_next;
> +
> +			list_for_each_entry_safe(vma, vma_next,
> +						 &obj->vma_list,
> +						 vma_link)
> +				if (i915_vma_unbind(vma))
> +					break;
> +
> +			/* Stolen pins its pages to prevent the
> +			 * normal shrinker from processing stolen
> +			 * objects.
> +			 */
> +			i915_gem_object_unpin_pages(obj);
> +
> +			ret = i915_gem_object_put_pages(obj);
> +			if (ret == 0) {
> +				i915_gem_object_release_stolen(obj);
> +				obj->madv = __I915_MADV_PURGED;
> +			} else {
> +				i915_gem_object_pin_pages(obj);
> +			}
> +		}
> +
> +		drm_gem_object_unreference(&obj->base);
> +	}
> +
> +	return ret;
> +}
> +
> +static struct i915_stolen_node *
> +stolen_alloc(struct drm_i915_private *dev_priv, u64 size)
> +{
> +	struct i915_stolen_node *stolen;
> +	int ret;
>
> -	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
>   	if (size == 0)
>   		return ERR_PTR(-EINVAL);
>
> @@ -606,17 +695,60 @@ i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
>   	if (!stolen)
>   		return ERR_PTR(-ENOMEM);
>
> -	ret = i915_gem_stolen_insert_node(dev_priv, stolen, size, 4096);
> +	ret = i915_gem_stolen_insert_node(dev_priv, &stolen->base, size, 4096);
> +	if (ret == 0)
> +		goto out;
> +
> +	/* No more stolen memory available, or too fragmented.
> +	 * Try evicting purgeable objects and search again.
> +	 */
> +	ret = stolen_evict(dev_priv, size);
> +	if (ret == 0)
> +		ret = i915_gem_stolen_insert_node(dev_priv, &stolen->base,
> +						  size, 4096);
> +out:
>   	if (ret) {
>   		kfree(stolen);
>   		return ERR_PTR(ret);
>   	}
>
> +	return stolen;
> +}
> +
> +/**
> + * i915_gem_object_create_stolen() - creates object using the stolen memory
> + * @dev:	drm device
> + * @size:	size of the object requested
> + *
> + * i915_gem_object_create_stolen() tries to allocate memory for the object
> + * from the stolen memory region. If not enough memory is found, it tries
> + * evicting purgeable objects and searching again.
> + *
> + * Returns: Object pointer - success and error pointer - failure
> + */
> +struct drm_i915_gem_object *
> +i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_gem_object *obj;
> +	struct i915_stolen_node *stolen;
> +
> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> +	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> +		return ERR_PTR(-ENODEV);
> +
> +	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
> +
> +	stolen = stolen_alloc(dev_priv, size);
> +	if (IS_ERR(stolen))
> +		return ERR_PTR(-ENOMEM);

Why not "return stolen" to avoid masking different errors from stolen_alloc?
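
That is, keep the original errno by casting the error pointer across the two
types, e.g. with ERR_CAST() from <linux/err.h>:

        stolen = stolen_alloc(dev_priv, size);
        if (IS_ERR(stolen))
                return ERR_CAST(stolen);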

> +
>   	obj = _i915_gem_object_create_stolen(dev, stolen);
>   	if (!IS_ERR(obj))
>   		return obj;
>
> -	i915_gem_stolen_remove_node(dev_priv, stolen);
> +	i915_gem_stolen_remove_node(dev_priv, &stolen->base);
>   	kfree(stolen);
>   	return obj;
>   }
> @@ -630,7 +762,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct i915_address_space *ggtt = &dev_priv->gtt.base;
>   	struct drm_i915_gem_object *obj;
> -	struct drm_mm_node *stolen;
> +	struct i915_stolen_node *stolen;
>   	struct i915_vma *vma;
>   	int ret;
>
> @@ -649,10 +781,10 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	if (!stolen)
>   		return ERR_PTR(-ENOMEM);
>
> -	stolen->start = stolen_offset;
> -	stolen->size = size;
> +	stolen->base.start = stolen_offset;
> +	stolen->base.size = size;
>   	mutex_lock(&dev_priv->mm.stolen_lock);
> -	ret = drm_mm_reserve_node(&dev_priv->mm.stolen, stolen);
> +	ret = drm_mm_reserve_node(&dev_priv->mm.stolen, &stolen->base);
>   	mutex_unlock(&dev_priv->mm.stolen_lock);
>   	if (ret) {
>   		DRM_DEBUG_KMS("failed to allocate stolen space\n");
> @@ -663,7 +795,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>   	obj = _i915_gem_object_create_stolen(dev, stolen);
>   	if (IS_ERR(obj)) {
>   		DRM_DEBUG_KMS("failed to allocate stolen object\n");
> -		i915_gem_stolen_remove_node(dev_priv, stolen);
> +		i915_gem_stolen_remove_node(dev_priv, &stolen->base);
>   		kfree(stolen);
>   		return obj;
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 6dee908..03ad276 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5103,7 +5103,7 @@ static void valleyview_check_pctx(struct drm_i915_private *dev_priv)
>   	unsigned long pctx_addr = I915_READ(VLV_PCBR) & ~4095;
>
>   	WARN_ON(pctx_addr != dev_priv->mm.stolen_base +
> -			     dev_priv->vlv_pctx->stolen->start);
> +			     dev_priv->vlv_pctx->stolen->base.start);
>   }
>
>
> @@ -5177,7 +5177,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>   		return;
>   	}
>
> -	pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->start;
> +	pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->base.start;
>   	I915_WRITE(VLV_PCBR, pctx_paddr);
>
>   out:
>

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 12:46 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects ankitprasad.r.sharma
@ 2015-12-09 16:15   ` Tvrtko Ursulin
  2015-12-09 19:39     ` Dave Gordon
  2015-12-10 10:54     ` Ankitprasad Sharma
  0 siblings, 2 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 16:15 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for extending the pread/pwrite functionality
> for objects not backed by shmem. The access will be made through
> gtt interface. This will cover objects backed by stolen memory as well
> as other non-shmem backed objects.
>
> v2: Drop locks around slow_user_access, prefault the pages before
> access (Chris)
>
> v3: Rebased to the latest drm-intel-nightly (Ankit)
>
> v4: Moved page base & offset calculations outside the copy loop,
> corrected data types for size and offset variables, corrected if-else
> braces format (Tvrtko/kerneldocs)
>
> v5: Enabled pread/pwrite for all non-shmem backed objects including
> without tiling restrictions (Ankit)
>
> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>
> v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
>
> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> non-shmem backed objects (Tvrtko)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 151 +++++++++++++++++++++++++++++++++-------
>   1 file changed, 127 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ed97de6..68ed67a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
>   	return ret ? - EFAULT : 0;
>   }
>
> +static inline uint64_t
> +slow_user_access(struct io_mapping *mapping,
> +		 uint64_t page_base, int page_offset,
> +		 char __user *user_data,
> +		 int length, bool pwrite)
> +{
> +	void __iomem *vaddr_inatomic;
> +	void *vaddr;
> +	uint64_t unwritten;
> +
> +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> +	/* We can use the cpu mem copy function because this is X86. */
> +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> +	if (pwrite)
> +		unwritten = __copy_from_user(vaddr, user_data, length);
> +	else
> +		unwritten = __copy_to_user(user_data, vaddr, length);
> +
> +	io_mapping_unmap(vaddr_inatomic);
> +	return unwritten;
> +}
> +
> +static int
> +i915_gem_gtt_copy(struct drm_device *dev,
> +		   struct drm_i915_gem_object *obj, uint64_t size,
> +		   uint64_t data_offset, uint64_t data_ptr)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	char __user *user_data;
> +	uint64_t remain;
> +	uint64_t offset, page_base;
> +	int page_offset, page_length, ret = 0;
> +
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> +	if (ret)
> +		goto out;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> +	if (ret)
> +		goto out_unpin;
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto out_unpin;
> +
> +	user_data = to_user_ptr(data_ptr);
> +	remain = size;
> +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +	if (likely(!i915.prefault_disable))
> +		ret = fault_in_multipages_writeable(user_data, remain);
> +
> +	/*
> +	 * page_offset = offset within page
> +	 * page_base = page offset within aperture
> +	 */
> +	page_offset = offset_in_page(offset);
> +	page_base = offset & PAGE_MASK;
> +
> +	while (remain > 0) {
> +		/* page_length = bytes to copy for this page */
> +		page_length = remain;
> +		if ((page_offset + remain) > PAGE_SIZE)
> +			page_length = PAGE_SIZE - page_offset;
> +
> +		/* This is a slow read/write as it tries to read from
> +		 * and write to user memory which may result into page
> +		 * faults
> +		 */
> +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> +				       page_offset, user_data,
> +				       page_length, false);
> +
> +		if (ret) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		remain -= page_length;
> +		user_data += page_length;
> +		page_base += page_length;
> +		page_offset = 0;
> +	}
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +out_unpin:
> +	i915_gem_object_ggtt_unpin(obj);
> +out:
> +	return ret;
> +}
> +
>   static int
>   i915_gem_shmem_pread(struct drm_device *dev,
>   		     struct drm_i915_gem_object *obj,
> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pread(obj, args->offset, args->size);
>
> -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> +	/* pread for non shmem backed objects */
> +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> +		ret = i915_gem_gtt_copy(dev, obj, args->size,
> +					args->offset, args->data_ptr);
> +	else
> +		ret = i915_gem_shmem_pread(dev, obj, args, file);

Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed 
objects if tiling is set. Sounds wrong to me unless I am missing something?
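
Perhaps restructuring the check would make the intent explicit; just a sketch
of what I mean (rejecting tiled non-shmem objects with -EINVAL here is only
one option, a tiling-aware GTT path would be the other):

        if (!obj->base.filp) {
                /* No shmem backing store: read through the GTT. Tiled
                 * objects would need manual detiling, so reject them
                 * for now.
                 */
                if (obj->tiling_mode != I915_TILING_NONE) {
                        ret = -EINVAL;
                        goto out;
                }

                ret = i915_gem_gtt_copy(dev, obj, args->size,
                                        args->offset, args->data_ptr);
        } else {
                ret = i915_gem_shmem_pread(dev, obj, args, file);
        }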

>
>   out:
>   	drm_gem_object_unreference(&obj->base);
> @@ -789,10 +879,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
>   			 struct drm_i915_gem_pwrite *args,
>   			 struct drm_file *file)
>   {
> +	struct drm_device *dev = obj->base.dev;
>   	struct drm_mm_node node;
>   	uint64_t remain, offset;
>   	char __user *user_data;
>   	int ret;
> +	bool faulted = false;
>
>   	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
>   	if (ret) {
> @@ -851,11 +943,29 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
>   		/* If we get a fault while copying data, then (presumably) our
>   		 * source page isn't available.  Return the error and we'll
>   		 * retry in the slow path.
> +		 * If the object is non-shmem backed, we retry again with the
> +		 * path that handles page fault.
>   		 */
> -		if (fast_user_write(i915->gtt.mappable, page_base,
> -				    page_offset, user_data, page_length)) {
> -			ret = -EFAULT;
> -			goto out_flush;
> +		if (faulted || fast_user_write(i915->gtt.mappable,
> +						page_base, page_offset,
> +						user_data, page_length)) {
> +			if (!obj->base.filp) {
> +				faulted = true;
> +				mutex_unlock(&dev->struct_mutex);
> +				if (slow_user_access(i915->gtt.mappable,
> +						     page_base,
> +						     page_offset, user_data,
> +						     page_length, true)) {
> +					ret = -EFAULT;
> +					mutex_lock(&dev->struct_mutex);
> +					goto out_flush;
> +				}
> +
> +				mutex_lock(&dev->struct_mutex);
> +			} else {
> +				ret = -EFAULT;
> +				goto out_flush;
> +			}
>   		}

Some questions:

1. What is the advantage of doing the slow access for non-shmem backed 
objects inside a single loop, as opposed to extracting it in a separate 
function?

For example i915_gem_gtt_pwrite_slow ? Then it could have been called 
from i915_gem_pwrite_ioctl depending on the master if statement there, 
fallback etc.

I think that would be clearer, unless there is a special reason it makes 
sense to go with the fast path first and then switch to the slow path at 
the point the first fault is hit; see the rough sketch after these 
questions for what I mean.

2. I have noticed the shmem pwrite slowpath makes explicit mention of 
potential changes to the object domain while the lock was dropped and 
takes care of flushing the cache in that case.

Is this something this path should do as well, or if not why not?
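
To illustrate question 1, the extracted helper could look roughly like this
(a sketch only: the name, the elided pin/fence/domain setup and the locking
rules are mine, and it reuses slow_user_access() as defined in this patch):

static int
i915_gem_gtt_pwrite_slow(struct drm_i915_private *i915,
                         struct drm_i915_gem_object *obj,
                         const struct drm_i915_gem_pwrite *args)
{
        char __user *user_data = to_user_ptr(args->data_ptr);
        uint64_t remain = args->size;
        uint64_t offset = i915_gem_obj_ggtt_offset(obj) + args->offset;
        int ret = 0;

        /* Caller has pinned the object into the mappable aperture, flushed
         * it to the GTT domain, dropped the fence and released struct_mutex,
         * since the copies below may fault and sleep.
         */
        while (remain > 0) {
                int page_offset = offset_in_page(offset);
                int page_length = min_t(uint64_t, remain,
                                        PAGE_SIZE - page_offset);

                if (slow_user_access(i915->gtt.mappable,
                                     offset & PAGE_MASK, page_offset,
                                     user_data, page_length, true)) {
                        ret = -EFAULT;
                        break;
                }

                remain -= page_length;
                user_data += page_length;
                offset += page_length;
        }

        return ret;
}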


>   		remain -= page_length;
> @@ -1121,14 +1231,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
>
>   	ret = -EFAULT;
> @@ -1139,8 +1241,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   	 * perspective, requiring manual detiling by the client.
>   	 */
>   	if (obj->tiling_mode == I915_TILING_NONE &&
> -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> -	    cpu_write_needs_clflush(obj)) {
> +	    (!obj->base.filp ||
> +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> +	    cpu_write_needs_clflush(obj)))) {
>   		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
>   		/* Note that the gtt paths might fail with non-page-backed user
>   		 * pointers (e.g. gtt mappings when moving data between
> @@ -1150,7 +1253,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   	if (ret == -EFAULT || ret == -ENOSPC) {
>   		if (obj->phys_handle)
>   			ret = i915_gem_phys_pwrite(obj, args, file);
> -		else
> +		else if (obj->base.filp)
>   			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
>   	}
>
>

Regards,

Tvrtko


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
@ 2015-12-09 17:25   ` Tvrtko Ursulin
  2015-12-09 19:24     ` Ville Syrjälä
  2015-12-10 13:17     ` Ankitprasad Sharma
  2015-12-09 19:35   ` Dave Gordon
  2015-12-10  9:43   ` Tvrtko Ursulin
  2 siblings, 2 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-09 17:25 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> Ville reminded us that stolen memory is not preserved across
> hibernation, and a result of this was that context objects now being
> allocated from stolen were being corrupted on S4 and promptly hanging
> the GPU on resume.
>
> We want to utilise stolen for as much as possible (nothing else will use
> that wasted memory otherwise), so we need a strategy for handling
> general objects allocated from stolen and hibernation. A simple solution
> is to do a CPU copy through the GTT of the stolen object into a fresh
> shmemfs backing store and thenceforth treat it as a normal objects. This
> can be refined in future to either use a GPU copy to avoid the slow
> uncached reads (though it's hibernation!) and recreate stolen objects
> upon resume/first-use. For now, a simple approach should suffice for
> testing the object migration.
>
> v2:
> Swap PTE for pinned bindings over to the shmemfs. This adds a
> complicated dance, but is required as many stolen objects are likely to
> be pinned for use by the hardware. Swapping the PTEs should not result
> in externally visible behaviour, as each PTE update should be atomic and
> the two pages identical. (danvet)
>
> safe-by-default, or the principle of least surprise. We need a new flag
> to mark objects that we can wilfully discard and recreate across
> hibernation. (danvet)
>
> Just use the global_list rather than invent a new stolen_list. This is
> the slowpath hibernate and so adding a new list and the associated
> complexity isn't worth it.
>
> v3: Rebased on drm-intel-nightly (Ankit)
>
> v4: Use insert_page to map stolen memory backed pages for migration to
> shmem (Chris)
>
> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>   drivers/gpu/drm/i915/i915_drv.h         |   7 +
>   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_display.c    |   3 +
>   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>   drivers/gpu/drm/i915/intel_pm.c         |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>   7 files changed, 261 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9f55209..2bb9e9e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>   	return i915_drm_suspend(drm_dev);
>   }
>
> +static int i915_pm_freeze(struct device *dev)
> +{
> +	int ret;
> +
> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> +	if (ret)
> +		return ret;

Can we distinguish between S3 and S4 if the stolen corruption only 
happens in S4? Not to spend all the extra effort for nothing in S3? Or 
maybe this is not even called for S3?

> +
> +	ret = i915_pm_suspend(dev);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
>   static int i915_pm_suspend_late(struct device *dev)
>   {
>   	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> @@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
>   	 * @restore, @restore_early : called after rebooting and restoring the
>   	 *                            hibernation image [PMSG_RESTORE]
>   	 */
> -	.freeze = i915_pm_suspend,
> +	.freeze = i915_pm_freeze,
>   	.freeze_late = i915_pm_suspend_late,
>   	.thaw_early = i915_pm_resume_early,
>   	.thaw = i915_pm_resume,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e0b09b0..0d18b07 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
>   	 * Advice: are the backing pages purgeable?
>   	 */
>   	unsigned int madv:2;
> +	/**
> +	 * Whereas madv is for userspace, there are certain situations
> +	 * where we want I915_MADV_DONTNEED behaviour on internal objects
> +	 * without conflating the userspace setting.
> +	 */
> +	unsigned int internal_volatile:1;
>
>   	/**
>   	 * Current tiling mode for the object.
> @@ -3006,6 +3012,7 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
>   void i915_gem_init_swizzling(struct drm_device *dev);
>   void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
>   int __must_check i915_gpu_idle(struct drm_device *dev);
> +int __must_check i915_gem_freeze(struct drm_device *dev);
>   int __must_check i915_gem_suspend(struct drm_device *dev);
>   void __i915_add_request(struct drm_i915_gem_request *req,
>   			struct drm_i915_gem_object *batch_obj,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 68ed67a..1f134b0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4511,12 +4511,27 @@ static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
>   	.put_pages = i915_gem_object_put_pages_gtt,
>   };
>
> +static struct address_space *
> +i915_gem_set_inode_gfp(struct drm_device *dev, struct file *file)
> +{
> +	struct address_space *mapping = file_inode(file)->i_mapping;
> +	gfp_t mask;
> +
> +	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> +	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
> +		/* 965gm cannot relocate objects above 4GiB. */
> +		mask &= ~__GFP_HIGHMEM;
> +		mask |= __GFP_DMA32;
> +	}
> +	mapping_set_gfp_mask(mapping, mask);
> +
> +	return mapping;
> +}
> +
>   struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>   						  size_t size)
>   {
>   	struct drm_i915_gem_object *obj;
> -	struct address_space *mapping;
> -	gfp_t mask;
>   	int ret;
>
>   	obj = i915_gem_object_alloc(dev);
> @@ -4529,15 +4544,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
>   		return ERR_PTR(ret);
>   	}
>
> -	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> -	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
> -		/* 965gm cannot relocate objects above 4GiB. */
> -		mask &= ~__GFP_HIGHMEM;
> -		mask |= __GFP_DMA32;
> -	}
> -
> -	mapping = file_inode(obj->base.filp)->i_mapping;
> -	mapping_set_gfp_mask(mapping, mask);
> +	i915_gem_set_inode_gfp(dev, obj->base.filp);
>
>   	i915_gem_object_init(obj, &i915_gem_object_ops);
>
> @@ -4714,6 +4721,209 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
>   		dev_priv->gt.stop_ring(ring);
>   }
>
> +static int
> +i915_gem_object_migrate_stolen_to_shmemfs(struct drm_i915_gem_object *obj)
> +{

Some documentation for this function would be good.
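
For example, something along these lines, going by what the function appears
to do (wording is mine, not from the patch):

/**
 * i915_gem_object_migrate_stolen_to_shmemfs() - migrate a stolen object
 * @obj: stolen-backed object to migrate
 *
 * Copies the contents of @obj into a freshly created shmemfs backing store
 * (via a temporary GTT mapping), swaps the PTEs of any pinned bindings over
 * to the new pages and releases the stolen reservation, so that the object
 * survives hibernation. Purgeable objects are simply given an unpopulated
 * shmemfs file instead.
 *
 * Must be called with struct_mutex held.
 *
 * Returns: 0 on success, negative error code on failure.
 */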

> +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> +	struct i915_vma *vma, *vn;
> +	struct drm_mm_node node;
> +	struct file *file;
> +	struct address_space *mapping;
> +	struct sg_table *stolen_pages, *shmemfs_pages;
> +	int ret, i;
> +
> +	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
> +		return -EINVAL;

I am no expert in hibernation or swizzling but this looks really bad to me.

It is both platform and user controlled and it will cause hibernation to 
fail in a very noisy way, correct?

At least it needs to be WARN_ON_ONCE, but if my thinking is correct it 
should really be that either:

a) hibernation is prevented in a quieter way (DRM_ERROR, once) 
altogether when dev_priv->mm.bit_6_swizzle_x == 
I915_BIT_6_SWIZZLE_9_10_17, or

b) set_tiling fails on the same platforms which also support hibernation.

Comments?
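
For (a), something along these lines at the top of i915_gem_freeze(), before
taking struct_mutex, is what I have in mind (sketch only, the error message
and errno are illustrative):

        /* Stolen contents cannot be faithfully copied out through the CPU
         * window when bit17 swizzling is in effect, so refuse to hibernate
         * rather than WARN once per object.
         */
        if (i915->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_9_10_17) {
                DRM_ERROR("bit17 swizzling: cannot migrate stolen objects, refusing to hibernate\n");
                return -EINVAL;
        }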

> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> +	if (ret)
> +		return ret;
> +
> +	file = shmem_file_setup("drm mm object", obj->base.size, VM_NORESERVE);
> +	if (IS_ERR(file))
> +		return PTR_ERR(file);
> +	mapping = i915_gem_set_inode_gfp(obj->base.dev, file);
> +
> +	list_for_each_entry_safe(vma, vn, &obj->vma_list, vma_link)
> +		if (i915_vma_unbind(vma))
> +			continue;
> +
> +	if (obj->madv != I915_MADV_WILLNEED && list_empty(&obj->vma_list)) {
> +		/* Discard the stolen reservation, and replace with
> +		 * an unpopulated shmemfs object.
> +		 */
> +		obj->madv = __I915_MADV_PURGED;
> +		goto swap_pages;
> +	}

Maybe put a comment before this block saying "no need to copy 
content/something for objects...", if I got it right.
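
For example, if I read the condition right (wording is mine):

        /* A purgeable object with no bindings left has no contents worth
         * preserving - just swap the stolen reservation for an empty
         * shmemfs backing store.
         */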

> +
> +	/* stolen objects are already pinned to prevent shrinkage */
> +	memset(&node, 0, sizeof(node));
> +	ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> +						  &node,
> +						  4096, 0, I915_CACHE_NONE,
> +						  0, i915->gtt.mappable_end,
> +						  DRM_MM_SEARCH_DEFAULT,
> +						  DRM_MM_CREATE_DEFAULT);
> +	if (ret)
> +		return ret;

If there is a likelihood the global GTT can be full, would it be worth 
trying to evict something before attempting hibernation?

Also leaks file.
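
That is, the failure path here probably wants to jump to the existing
cleanup label so the shmemfs file is released, something like:

        if (ret)
                goto err_file;	/* fput()s the file created above */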

> +
> +	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
> +		struct page *page;
> +		void *__iomem src;
> +		void *dst;
> +
> +		wmb();

What is this one for? If it is for the memcpy_fromio would it be more 
obvious to put it after that call?

> +		i915->gtt.base.insert_page(&i915->gtt.base,
> +					   i915_gem_object_get_dma_address(obj, i),
> +					   node.start,
> +					   I915_CACHE_NONE,
> +					   0);
> +		wmb();
> +
> +		page = shmem_read_mapping_page(mapping, i);
> +		if (IS_ERR(page)) {
> +			ret = PTR_ERR(page);
> +			goto err_node;
> +		}
> +
> +		src = io_mapping_map_atomic_wc(i915->gtt.mappable, node.start + PAGE_SIZE * i);
> +		dst = kmap_atomic(page);
> +		memcpy_fromio(dst, src, PAGE_SIZE);
> +		kunmap_atomic(dst);
> +		io_mapping_unmap_atomic(src);
> +
> +		page_cache_release(page);

I assume shmem_file_setup takes one reference to each page, 
shmem_read_mapping_page another and then here we release that extra one? Or?

> +	}
> +
> +	wmb();
> +	i915->gtt.base.clear_range(&i915->gtt.base,
> +				   node.start, node.size,
> +				   true);
> +	drm_mm_remove_node(&node);

Maybe move the whole copy content loop into a helper for readability?
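
A rough sketch of such a helper (the name is mine, the scratch GGTT node
setup/teardown and error unwinding stay in the caller, and the wmb()
placement question above still applies):

static int
i915_gem_object_copy_stolen_to_shmemfs(struct drm_i915_gem_object *obj,
                                       struct address_space *mapping,
                                       struct drm_mm_node *node)
{
        struct drm_i915_private *i915 = to_i915(obj->base.dev);
        int i;

        for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
                struct page *page;
                void __iomem *src;
                void *dst;

                /* Rebind the scratch PTE at node->start to the next stolen
                 * page, then copy it out through the mappable aperture.
                 */
                i915->gtt.base.insert_page(&i915->gtt.base,
                                           i915_gem_object_get_dma_address(obj, i),
                                           node->start, I915_CACHE_NONE, 0);
                wmb();

                page = shmem_read_mapping_page(mapping, i);
                if (IS_ERR(page))
                        return PTR_ERR(page);

                src = io_mapping_map_atomic_wc(i915->gtt.mappable, node->start);
                dst = kmap_atomic(page);
                memcpy_fromio(dst, src, PAGE_SIZE);
                kunmap_atomic(dst);
                io_mapping_unmap_atomic(src);

                page_cache_release(page);
        }

        return 0;
}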

> +
> +swap_pages:
> +	stolen_pages = obj->pages;
> +	obj->pages = NULL;
> +
> +	obj->base.filp = file;
> +	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> +	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> +
> +	/* Recreate any pinned binding with pointers to the new storage */
> +	if (!list_empty(&obj->vma_list)) {
> +		ret = i915_gem_object_get_pages_gtt(obj);
> +		if (ret) {
> +			obj->pages = stolen_pages;
> +			goto err_file;
> +		}
> +
> +		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> +		if (ret) {
> +			i915_gem_object_put_pages_gtt(obj);
> +			obj->pages = stolen_pages;
> +			goto err_file;
> +		}
> +
> +		obj->get_page.sg = obj->pages->sgl;
> +		obj->get_page.last = 0;
> +
> +		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> +			if (!drm_mm_node_allocated(&vma->node))
> +				continue;
> +
> +			WARN_ON(i915_vma_bind(vma,
> +					      obj->cache_level,
> +					      PIN_UPDATE));
> +		}
> +	} else
> +		list_del(&obj->global_list);

Hm, can it be bound if there were no VMAs?

> +
> +	/* drop the stolen pin and backing */
> +	shmemfs_pages = obj->pages;
> +	obj->pages = stolen_pages;
> +
> +	i915_gem_object_unpin_pages(obj);
> +	obj->ops->put_pages(obj);
> +	if (obj->ops->release)
> +		obj->ops->release(obj);
> +
> +	obj->ops = &i915_gem_object_ops;
> +	obj->pages = shmemfs_pages;
> +
> +	return 0;
> +
> +err_node:
> +	wmb();
> +	i915->gtt.base.clear_range(&i915->gtt.base,
> +				   node.start, node.size,
> +				   true);
> +	drm_mm_remove_node(&node);
> +err_file:
> +	fput(file);
> +	obj->base.filp = NULL;
> +	return ret;
> +}
> +
> +int
> +i915_gem_freeze(struct drm_device *dev)
> +{
> +	/* Called before i915_gem_suspend() when hibernating */
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_object *obj, *tmp;
> +	struct list_head *phase[] = {
> +		&i915->mm.unbound_list, &i915->mm.bound_list, NULL
> +	}, **p;
> +	int ret;
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		return ret;
> +	/* Across hibernation, the stolen area is not preserved.
> +	 * Anything inside stolen must copied back to normal
> +	 * memory if we wish to preserve it.
> +	 */
> +	for (p = phase; *p; p++) {
> +		struct list_head migrate;
> +		int ret;
> +
> +		INIT_LIST_HEAD(&migrate);
> +		list_for_each_entry_safe(obj, tmp, *p, global_list) {
> +			if (obj->stolen == NULL)
> +				continue;
> +
> +			if (obj->internal_volatile)
> +				continue;
> +
> +			/* In the general case, this object may only be alive
> +			 * due to an active reference, and that may disappear
> +			 * when we unbind any of the objects (and so wait upon
> +			 * the GPU and retire requests). To prevent one of the
> +			 * objects from disappearing beneath us, we need to
> +			 * take a reference to each as we build the migration
> +			 * list.
> +			 *
> +			 * This is similar to the strategy required whilst
> +			 * shrinking or evicting objects (for the same reason).
> +			 */
> +			drm_gem_object_reference(&obj->base);
> +			list_move(&obj->global_list, &migrate);
> +		}
> +
> +		ret = 0;
> +		list_for_each_entry_safe(obj, tmp, &migrate, global_list) {
> +			if (ret == 0)
> +				ret = i915_gem_object_migrate_stolen_to_shmemfs(obj);
> +			drm_gem_object_unreference(&obj->base);
> +		}
> +		list_splice(&migrate, *p);

Hmmm, are these some clever games with obj->global_list?

> +		if (ret)
> +			break;
> +	}
> +
> +	mutex_unlock(&dev->struct_mutex);
> +	return ret;
> +}
> +
>   int
>   i915_gem_suspend(struct drm_device *dev)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index f281e0b..0803922 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2549,6 +2549,9 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
>   	if (IS_ERR(obj))
>   		return false;
>
> +	/* Not to be preserved across hibernation */
> +	obj->internal_volatile = true;
> +
>   	obj->tiling_mode = plane_config->tiling;
>   	if (obj->tiling_mode == I915_TILING_X)
>   		obj->stride = fb->pitches[0];
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index f43681e..1d89253 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -154,6 +154,12 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
>   		goto out;
>   	}
>
> +	/* Discard the contents of the BIOS fb across hibernation.
> +	 * We really want to completely throwaway the earlier fbdev
> +	 * and reconfigure it anyway.
> +	 */
> +	obj->internal_volatile = true;
> +
>   	fb = __intel_framebuffer_create(dev, &mode_cmd, obj);
>   	if (IS_ERR(fb)) {
>   		ret = PTR_ERR(fb);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 03ad276..6ddc20a 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5181,6 +5181,8 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>   	I915_WRITE(VLV_PCBR, pctx_paddr);
>
>   out:
> +	/* The power context need not be preserved across hibernation */
> +	pctx->internal_volatile = true;
>   	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
>   	dev_priv->vlv_pctx = pctx;
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 5eabaf6..370d96a 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2090,6 +2090,12 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>   	if (IS_ERR(obj))
>   		return PTR_ERR(obj);
>
> +	/* Ringbuffer objects are by definition volatile - only the commands
> +	 * between HEAD and TAIL need to be preserved and whilst there are
> +	 * any commands there, the ringbuffer is pinned by activity.
> +	 */
> +	obj->internal_volatile = true;
> +

What does this mean? Does it get correctly re-initialized by existing code 
on resume? I don't see anything specific about HEAD and TAIL in this patch.

>   	/* mark ring buffers as read-only from GPU side by default */
>   	obj->gt_ro = 1;
>
>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 17:25   ` Tvrtko Ursulin
@ 2015-12-09 19:24     ` Ville Syrjälä
  2015-12-10 13:17     ` Ankitprasad Sharma
  1 sibling, 0 replies; 47+ messages in thread
From: Ville Syrjälä @ 2015-12-09 19:24 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: ankitprasad.r.sharma, intel-gfx, akash.goel, shashidhar.hiremath

On Wed, Dec 09, 2015 at 05:25:19PM +0000, Tvrtko Ursulin wrote:
> 
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > Ville reminded us that stolen memory is not preserved across
> > hibernation, and a result of this was that context objects now being
> > allocated from stolen were being corrupted on S4 and promptly hanging
> > the GPU on resume.
> >
> > We want to utilise stolen for as much as possible (nothing else will use
> > that wasted memory otherwise), so we need a strategy for handling
> > general objects allocated from stolen and hibernation. A simple solution
> > is to do a CPU copy through the GTT of the stolen object into a fresh
> > shmemfs backing store and thenceforth treat it as a normal objects. This
> > can be refined in future to either use a GPU copy to avoid the slow
> > uncached reads (though it's hibernation!) and recreate stolen objects
> > upon resume/first-use. For now, a simple approach should suffice for
> > testing the object migration.
> >
> > v2:
> > Swap PTE for pinned bindings over to the shmemfs. This adds a
> > complicated dance, but is required as many stolen objects are likely to
> > be pinned for use by the hardware. Swapping the PTEs should not result
> > in externally visible behaviour, as each PTE update should be atomic and
> > the two pages identical. (danvet)
> >
> > safe-by-default, or the principle of least surprise. We need a new flag
> > to mark objects that we can wilfully discard and recreate across
> > hibernation. (danvet)
> >
> > Just use the global_list rather than invent a new stolen_list. This is
> > the slowpath hibernate and so adding a new list and the associated
> > complexity isn't worth it.
> >
> > v3: Rebased on drm-intel-nightly (Ankit)
> >
> > v4: Use insert_page to map stolen memory backed pages for migration to
> > shmem (Chris)
> >
> > v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
> >   drivers/gpu/drm/i915/i915_drv.h         |   7 +
> >   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
> >   drivers/gpu/drm/i915/intel_display.c    |   3 +
> >   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
> >   drivers/gpu/drm/i915/intel_pm.c         |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >   7 files changed, 261 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 9f55209..2bb9e9e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> >   	return i915_drm_suspend(drm_dev);
> >   }
> >
> > +static int i915_pm_freeze(struct device *dev)
> > +{
> > +	int ret;
> > +
> > +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> > +	if (ret)
> > +		return ret;
> 
> Can we distinguish between S3 and S4 if the stolen corruption only 
> happens in S4? Not to spend all the extra effort for nothing in S3? Or 
> maybe this is not even called for S3?

The hook is only for hibernation as explained in the nice comment
Imre added next to the function pointer assignments.

It actually gets called for both the freeze and quiesce transitions.
We should only need it for freeze. I'm not sure if the PMSG_ thing
gets stored anywhere that we could look it up and skip this for 
quiesce. And not sure if ayone really cares that much. I don't,
since I don't even load i915 for the loader kernel.

https://bugs.freedesktop.org/show_bug.cgi?id=91295 actually says
we might need this for S3 too if Rapid Start is enabled. I have a
laptop that supports it, but I don't have a clue what kind of
partition it would need. Not that I would be willing to repartition
the disk anyway. Judging by drivers/platform/x86/intel-rst.c,
maybe we could just look for the INT3392 ACPI device, or something?

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
  2015-12-09 17:25   ` Tvrtko Ursulin
@ 2015-12-09 19:35   ` Dave Gordon
  2015-12-10  9:43   ` Tvrtko Ursulin
  2 siblings, 0 replies; 47+ messages in thread
From: Dave Gordon @ 2015-12-09 19:35 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> Ville reminded us that stolen memory is not preserved across
> hibernation, and a result of this was that context objects now being
> allocated from stolen were being corrupted on S4 and promptly hanging
> the GPU on resume.
>
> We want to utilise stolen for as much as possible (nothing else will use
> that wasted memory otherwise), so we need a strategy for handling
> general objects allocated from stolen and hibernation. A simple solution
> is to do a CPU copy through the GTT of the stolen object into a fresh
> shmemfs backing store and thenceforth treat it as a normal objects. This
> can be refined in future to either use a GPU copy to avoid the slow
> uncached reads (though it's hibernation!) and recreate stolen objects
> upon resume/first-use. For now, a simple approach should suffice for
> testing the object migration.
>
> v2:
> Swap PTE for pinned bindings over to the shmemfs. This adds a
> complicated dance, but is required as many stolen objects are likely to
> be pinned for use by the hardware. Swapping the PTEs should not result
> in externally visible behaviour, as each PTE update should be atomic and
> the two pages identical. (danvet)
>
> safe-by-default, or the principle of least surprise. We need a new flag
> to mark objects that we can wilfully discard and recreate across
> hibernation. (danvet)
>
> Just use the global_list rather than invent a new stolen_list. This is
> the slowpath hibernate and so adding a new list and the associated
> complexity isn't worth it.
>
> v3: Rebased on drm-intel-nightly (Ankit)
>
> v4: Use insert_page to map stolen memory backed pages for migration to
> shmem (Chris)
>
> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>   drivers/gpu/drm/i915/i915_drv.h         |   7 +
>   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_display.c    |   3 +
>   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>   drivers/gpu/drm/i915/intel_pm.c         |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>   7 files changed, 261 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9f55209..2bb9e9e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>   	return i915_drm_suspend(drm_dev);
>   }
>
> +static int i915_pm_freeze(struct device *dev)
> +{
> +	int ret;
> +
> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> +	if (ret)
> +		return ret;
> +
> +	ret = i915_pm_suspend(dev);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
>   static int i915_pm_suspend_late(struct device *dev)
>   {
>   	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> @@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
>   	 * @restore, @restore_early : called after rebooting and restoring the
>   	 *                            hibernation image [PMSG_RESTORE]
>   	 */
> -	.freeze = i915_pm_suspend,
> +	.freeze = i915_pm_freeze,
>   	.freeze_late = i915_pm_suspend_late,
>   	.thaw_early = i915_pm_resume_early,
>   	.thaw = i915_pm_resume,
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e0b09b0..0d18b07 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
>   	 * Advice: are the backing pages purgeable?
>   	 */
>   	unsigned int madv:2;
> +	/**
> +	 * Whereas madv is for userspace, there are certain situations
> +	 * where we want I915_MADV_DONTNEED behaviour on internal objects
> +	 * without conflating the userspace setting.
> +	 */
> +	unsigned int internal_volatile:1;

Does this new flag need to be examined by other code that currently 
checks 'madv', e.g. put_pages() ? Or does this indicate 
not-really-volatile-in-normal-use-only-across-hibernation ?
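
For instance, is the intent that the internal paths would consult a
helper along these lines (name purely hypothetical), rather than looking
at madv alone?

/* Hypothetical helper: treat internal_volatile like I915_MADV_DONTNEED
 * for the driver's own decisions, without touching the userspace-visible
 * madv value.
 */
static inline bool
i915_gem_object_is_volatile(struct drm_i915_gem_object *obj)
{
	return obj->madv == I915_MADV_DONTNEED || obj->internal_volatile;
}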

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 16:15   ` Tvrtko Ursulin
@ 2015-12-09 19:39     ` Dave Gordon
  2015-12-10 11:12       ` Ankitprasad Sharma
  2015-12-11 18:15       ` Daniel Vetter
  2015-12-10 10:54     ` Ankitprasad Sharma
  1 sibling, 2 replies; 47+ messages in thread
From: Dave Gordon @ 2015-12-09 19:39 UTC (permalink / raw)
  To: Tvrtko Ursulin, ankitprasad.r.sharma, intel-gfx
  Cc: akash.goel, shashidhar.hiremath

On 09/12/15 16:15, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>
>> This patch adds support for extending the pread/pwrite functionality
>> for objects not backed by shmem. The access will be made through
>> gtt interface. This will cover objects backed by stolen memory as well
>> as other non-shmem backed objects.
>>
>> v2: Drop locks around slow_user_access, prefault the pages before
>> access (Chris)
>>
>> v3: Rebased to the latest drm-intel-nightly (Ankit)
>>
>> v4: Moved page base & offset calculations outside the copy loop,
>> corrected data types for size and offset variables, corrected if-else
>> braces format (Tvrtko/kerneldocs)
>>
>> v5: Enabled pread/pwrite for all non-shmem backed objects including
>> without tiling restrictions (Ankit)
>>
>> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>>
>> v7: Updated commit message, Renamed i915_gem_gtt_read to
>> i915_gem_gtt_copy,
>> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
>>
>> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
>> non-shmem backed objects (Tvrtko)
>>
>> Testcase: igt/gem_stolen
>>
>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 151
>> +++++++++++++++++++++++++++++++++-------
>>   1 file changed, 127 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index ed97de6..68ed67a 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
>> shmem_page_offset, int page_length,
>>       return ret ? - EFAULT : 0;
>>   }
>>
>> +static inline uint64_t
>> +slow_user_access(struct io_mapping *mapping,
>> +         uint64_t page_base, int page_offset,
>> +         char __user *user_data,
>> +         int length, bool pwrite)
>> +{
>> +    void __iomem *vaddr_inatomic;
>> +    void *vaddr;
>> +    uint64_t unwritten;
>> +
>> +    vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
>> +    /* We can use the cpu mem copy function because this is X86. */
>> +    vaddr = (void __force *)vaddr_inatomic + page_offset;
>> +    if (pwrite)
>> +        unwritten = __copy_from_user(vaddr, user_data, length);
>> +    else
>> +        unwritten = __copy_to_user(user_data, vaddr, length);
>> +
>> +    io_mapping_unmap(vaddr_inatomic);
>> +    return unwritten;
>> +}
>> +
>> +static int
>> +i915_gem_gtt_copy(struct drm_device *dev,
>> +           struct drm_i915_gem_object *obj, uint64_t size,
>> +           uint64_t data_offset, uint64_t data_ptr)
>> +{
>> +    struct drm_i915_private *dev_priv = dev->dev_private;
>> +    char __user *user_data;
>> +    uint64_t remain;
>> +    uint64_t offset, page_base;
>> +    int page_offset, page_length, ret = 0;
>> +
>> +    ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
>> +    if (ret)
>> +        goto out;
>> +
>> +    ret = i915_gem_object_set_to_gtt_domain(obj, false);
>> +    if (ret)
>> +        goto out_unpin;
>> +
>> +    ret = i915_gem_object_put_fence(obj);
>> +    if (ret)
>> +        goto out_unpin;
>> +
>> +    user_data = to_user_ptr(data_ptr);
>> +    remain = size;
>> +    offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
>> +
>> +    mutex_unlock(&dev->struct_mutex);
>> +    if (likely(!i915.prefault_disable))
>> +        ret = fault_in_multipages_writeable(user_data, remain);
>> +
>> +    /*
>> +     * page_offset = offset within page
>> +     * page_base = page offset within aperture
>> +     */
>> +    page_offset = offset_in_page(offset);
>> +    page_base = offset & PAGE_MASK;
>> +
>> +    while (remain > 0) {
>> +        /* page_length = bytes to copy for this page */
>> +        page_length = remain;
>> +        if ((page_offset + remain) > PAGE_SIZE)
>> +            page_length = PAGE_SIZE - page_offset;
>> +
>> +        /* This is a slow read/write as it tries to read from
>> +         * and write to user memory which may result into page
>> +         * faults
>> +         */
>> +        ret = slow_user_access(dev_priv->gtt.mappable, page_base,
>> +                       page_offset, user_data,
>> +                       page_length, false);
>> +
>> +        if (ret) {
>> +            ret = -EFAULT;
>> +            break;
>> +        }
>> +
>> +        remain -= page_length;
>> +        user_data += page_length;
>> +        page_base += page_length;
>> +        page_offset = 0;
>> +    }
>> +
>> +    mutex_lock(&dev->struct_mutex);
>> +
>> +out_unpin:
>> +    i915_gem_object_ggtt_unpin(obj);
>> +out:
>> +    return ret;
>> +}
>> +
>>   static int
>>   i915_gem_shmem_pread(struct drm_device *dev,
>>                struct drm_i915_gem_object *obj,
>> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
>> void *data,
>>           goto out;
>>       }
>>
>> -    /* prime objects have no backing filp to GEM pread/pwrite
>> -     * pages from.
>> -     */
>> -    if (!obj->base.filp) {
>> -        ret = -EINVAL;
>> -        goto out;
>> -    }
>> -
>>       trace_i915_gem_object_pread(obj, args->offset, args->size);
>>
>> -    ret = i915_gem_shmem_pread(dev, obj, args, file);
>> +    /* pread for non shmem backed objects */
>> +    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
>> +        ret = i915_gem_gtt_copy(dev, obj, args->size,
>> +                    args->offset, args->data_ptr);
>> +    else
>> +        ret = i915_gem_shmem_pread(dev, obj, args, file);
>
> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
> objects if tiling is set. Sounds wrong to me unless I am missing something?

Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type 
objects? What about (phys, stolen, userptr, dmabuf, ...?) Which of these 
is the alternate path going to work with?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
  2015-12-09 17:25   ` Tvrtko Ursulin
  2015-12-09 19:35   ` Dave Gordon
@ 2015-12-10  9:43   ` Tvrtko Ursulin
  2015-12-10 13:17     ` Ankitprasad Sharma
  2 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-10  9:43 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath

Hi,

Two more comments below:

On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
>
> Ville reminded us that stolen memory is not preserved across
> hibernation, and a result of this was that context objects now being
> allocated from stolen were being corrupted on S4 and promptly hanging
> the GPU on resume.
>
> We want to utilise stolen for as much as possible (nothing else will use
> that wasted memory otherwise), so we need a strategy for handling
> general objects allocated from stolen and hibernation. A simple solution
> is to do a CPU copy through the GTT of the stolen object into a fresh
> shmemfs backing store and thenceforth treat it as a normal objects. This
> can be refined in future to either use a GPU copy to avoid the slow
> uncached reads (though it's hibernation!) and recreate stolen objects
> upon resume/first-use. For now, a simple approach should suffice for
> testing the object migration.

Mention of "testing" in the commit message and absence of a path to 
migrate the objects back to stolen memory on resume makes me think this 
is kind of half finished and not really ready for review / merge?

Because I don't see how it is useful to migrate it one way and never 
move back?

>
> v2:
> Swap PTE for pinned bindings over to the shmemfs. This adds a
> complicated dance, but is required as many stolen objects are likely to
> be pinned for use by the hardware. Swapping the PTEs should not result
> in externally visible behaviour, as each PTE update should be atomic and
> the two pages identical. (danvet)
>
> safe-by-default, or the principle of least surprise. We need a new flag
> to mark objects that we can wilfully discard and recreate across
> hibernation. (danvet)
>
> Just use the global_list rather than invent a new stolen_list. This is
> the slowpath hibernate and so adding a new list and the associated
> complexity isn't worth it.
>
> v3: Rebased on drm-intel-nightly (Ankit)
>
> v4: Use insert_page to map stolen memory backed pages for migration to
> shmem (Chris)
>
> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>   drivers/gpu/drm/i915/i915_drv.h         |   7 +
>   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_display.c    |   3 +
>   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>   drivers/gpu/drm/i915/intel_pm.c         |   2 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>   7 files changed, 261 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 9f55209..2bb9e9e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>   	return i915_drm_suspend(drm_dev);
>   }
>
> +static int i915_pm_freeze(struct device *dev)
> +{
> +	int ret;
> +
> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> +	if (ret)
> +		return ret;

One of the first steps in idling GEM seems to be idling the GPU and 
retiring requests.

Would it also make sense to do those steps before attempting to migrate 
the stolen objects?
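
Something along these lines is what I had in mind (only a sketch;
assuming i915_gpu_idle() and i915_gem_retire_requests() are the right
helpers to call here, and that the migration loop this patch adds lives
in i915_gem_freeze()):

static int i915_gem_freeze(struct drm_device *dev)
{
	int ret;

	/* Quiesce the GPU and retire outstanding requests first... */
	ret = i915_gpu_idle(dev);
	if (ret)
		return ret;
	i915_gem_retire_requests(dev);

	/* ...then do the stolen -> shmemfs migration this patch adds. */
	...
}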

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 13:26   ` Dave Gordon
@ 2015-12-10 10:02     ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 10:02 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx

On Wed, 2015-12-09 at 13:26 +0000, Dave Gordon wrote:
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> >
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> >
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> >   				    int *needs_clflush);
> >
> >   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >
> >   static inline int __sg_page_count(struct scatterlist *sg)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> >   	drm_gem_object_unreference(&obj->base);
> >   	return ERR_PTR(ret);
> >   }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +	int ret, i;
> > +	char __iomem *base;
> > +	size_t size = obj->base.size;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_mm_node node;
> > +
> > +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > +	if (ret) {
> > +		memset(&node, 0, sizeof(node));
> > +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > +							  &node, 4096, 0,
> > +							  I915_CACHE_NONE, 0,
> > +							  i915->gtt.mappable_end,
> > +							  DRM_MM_SEARCH_DEFAULT,
> > +							  DRM_MM_CREATE_DEFAULT);
> > +		if (ret)
> > +			goto out;
> > +
> > +		i915_gem_object_pin_pages(obj);
> > +	} else {
> > +		node.start = i915_gem_obj_ggtt_offset(obj);
> > +		node.allocated = false;
> > +	}
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto unpin;
> > +
> > +	if (node.allocated) {
> > +		for (i = 0; i < size/PAGE_SIZE; i++) {
> > +			wmb();
> > +			i915->gtt.base.insert_page(&i915->gtt.base,
> > +					i915_gem_object_get_dma_address(obj, i),
> > +					node.start,
> > +					I915_CACHE_NONE,
> > +					0);
> > +			wmb();
> > +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> > +			memset_io(base, 0, 4096);
> > +			iounmap(base);
> > +		}
> > +	} else {
> > +		/* Get the CPU virtual address of the buffer */
> > +		base = ioremap_wc(i915->gtt.mappable_base +
> > +				  node.start, size);
> > +		if (base == NULL) {
> > +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> > +			ret = -ENOSPC;
> > +			goto unpin;
> > +		}
> > +
> > +		memset_io(base, 0, size);
> > +		iounmap(base);
> > +	}
> > +unpin:
> > +	if (node.allocated) {
> > +		wmb();
> > +		i915->gtt.base.clear_range(&i915->gtt.base,
> > +				node.start, node.size,
> > +				true);
> > +		drm_mm_remove_node(&node);
> > +		i915_gem_object_unpin_pages(obj);
> > +	}
> > +	else {
> > +		i915_gem_object_ggtt_unpin(obj);
> > +	}
> > +out:
> > +	return ret;
> > +}
> 
> This is effectively two functions interleaved, as shown by the repeated 
> if (node.allocated) tests. Would it not be clearer to have the mainline 
> function deal only with the GTT-pinned case, and a separate function for 
> the page-by-page version, called as a fallback if pinning fails?
> 
> int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> {
> 	int ret, i;
> 	char __iomem *base;
> 	size_t size = obj->base.size;
> 	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> 
> 	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> 	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE|PIN_NONBLOCK);
> 	if (ret)
> 		return __i915_obj_clear_by_pages(...);
> 
> 	... mainline (fast) code here ...
> 
> 	return ret;
> }
> 
> static int __i915_obj_clear_by_pages(...);
> {
> 	... complicated page-by-page fallback code here ...
> }
> 
Separating out the page-by-page path is a good way to keep the code from
getting messy. I also kind of liked Chris' suggestion to not use
ioremap_wc(), as it could easily exhaust kernel virtual address space.

To make it less messy and more robust, I would prefer to use only the
page-by-page path (no need to even try mapping the full object), with
io_mapping_map_wc()
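
Roughly what I have in mind is just the loop below (a sketch only,
reusing the insert_page() vfunc, i915_gem_object_get_dma_address() and
the barriers from this patch; node setup, pinning, fences and error
handling omitted):

	for (i = 0; i < size / PAGE_SIZE; i++) {
		wmb();	/* previous page's clear done before PTE rewrite */
		i915->gtt.base.insert_page(&i915->gtt.base,
				i915_gem_object_get_dma_address(obj, i),
				node.start, I915_CACHE_NONE, 0);
		wmb();	/* PTE update visible before the CPU writes */
		/* map just this one GTT page WC and clear it */
		base = io_mapping_map_wc(i915->gtt.mappable, node.start);
		memset_io(base, 0, PAGE_SIZE);
		io_mapping_unmap(base);
	}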

Thanks,
Ankit
> 


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 13:57   ` Tvrtko Ursulin
@ 2015-12-10 10:23     ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 10:23 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 13:57 +0000, Tvrtko Ursulin wrote:
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> >
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> >
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.h |  1 +
> >   drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 80 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> >   				    int *needs_clflush);
> >
> >   int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >
> >   static inline int __sg_page_count(struct scatterlist *sg)
> >   {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> >   	drm_gem_object_unreference(&obj->base);
> >   	return ERR_PTR(ret);
> >   }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +	int ret, i;
> > +	char __iomem *base;
> > +	size_t size = obj->base.size;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_mm_node node;
> > +
> > +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Hm, I thought Chris's suggestion was not to even try mapping all of it 
> into GTT but just go page by page?
> 
Yes, I will modify this to use only the page-by-page approach.
> If I misunderstood that then I agree with Dave's comment that it should 
> be split in two helper functions.
> 
> > +	if (ret) {
> > +		memset(&node, 0, sizeof(node));
> > +		ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > +							  &node, 4096, 0,
> > +							  I915_CACHE_NONE, 0,
> > +							  i915->gtt.mappable_end,
> > +							  DRM_MM_SEARCH_DEFAULT,
> > +							  DRM_MM_CREATE_DEFAULT);
> > +		if (ret)
> > +			goto out;
> > +
> > +		i915_gem_object_pin_pages(obj);
> > +	} else {
> > +		node.start = i915_gem_obj_ggtt_offset(obj);
> > +		node.allocated = false;
> 
> This looks very hacky anyway and I would not recommend it.
> 
> > +	}
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto unpin;
> > +
> > +	if (node.allocated) {
> > +		for (i = 0; i < size/PAGE_SIZE; i++) {
> > +			wmb();
> 
> What is this barrier for? Shouldn't the one after writing out the PTEs 
> and before remapping be enough?
This is to be on the safer side, to prevent the compiler (or CPU) from
overlapping writes across iterations and to ensure the loop body is
executed strictly in program order.

Having just one barrier would ensure that insert_page and the subsequent
ioremap are done in that order, but the end of one iteration could still
overlap the start of the next iteration.
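
To illustrate the ordering I mean within one iteration (sketch only):

	wmb();		/* previous iteration's memset_io done before PTE rewrite */
	i915->gtt.base.insert_page(...);
	wmb();		/* PTE write visible before the CPU writes through it */
	base = ioremap_wc(...);
	memset_io(base, 0, 4096);
	iounmap(base);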
> 
> > +			i915->gtt.base.insert_page(&i915->gtt.base,
> > +					i915_gem_object_get_dma_address(obj, i),
> > +					node.start,
> > +					I915_CACHE_NONE,
> > +					0);
> > +			wmb();
> > +			base = ioremap_wc(i915->gtt.mappable_base + node.start, 4096);
> > +			memset_io(base, 0, 4096);
> > +			iounmap(base);
> > +		}
> > +	} else {
> > +		/* Get the CPU virtual address of the buffer */
> > +		base = ioremap_wc(i915->gtt.mappable_base +
> > +				  node.start, size);
> > +		if (base == NULL) {
> > +			DRM_ERROR("Mapping of gem object to CPU failed!\n");
> > +			ret = -ENOSPC;
> > +			goto unpin;
> > +		}
> > +
> > +		memset_io(base, 0, size);
> > +		iounmap(base);
> > +	}
> > +unpin:
> > +	if (node.allocated) {
> > +		wmb();
> 
> I don't understand this one either?
This is to make sure the last memset is over before we move to
clear_range.
> 
> > +		i915->gtt.base.clear_range(&i915->gtt.base,
> > +				node.start, node.size,
> > +				true);
> > +		drm_mm_remove_node(&node);
> > +		i915_gem_object_unpin_pages(obj);
> > +	}
> > +	else {
> > +		i915_gem_object_ggtt_unpin(obj);
> > +	}
> > +out:
> > +	return ret;
> > +}
> >
> 
> Regards,
> 
> Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT
  2015-12-09 13:57   ` Chris Wilson
@ 2015-12-10 10:27     ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 10:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 13:57 +0000, Chris Wilson wrote:
> On Wed, Dec 09, 2015 at 06:16:17PM +0530, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > 
> > This patch adds support for clearing buffer objects via CPU/GTT. This
> > is particularly useful for clearing out the non shmem backed objects.
> > Currently intend to use this only for buffers allocated from stolen
> > region.
> > 
> > v2: Added kernel doc for i915_gem_clear_object(), corrected/removed
> > variable assignments (Tvrtko)
> > 
> > v3: Map object page by page to the gtt if the pinning of the whole object
> > to the ggtt fails, Corrected function name (Chris)
> > 
> > Testcase: igt/gem_stolen
> > 
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |  1 +
> >  drivers/gpu/drm/i915/i915_gem.c | 79 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 80 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 548a0eb..8e554d3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2856,6 +2856,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
> >  				    int *needs_clflush);
> >  
> >  int __must_check i915_gem_object_get_pages(struct drm_i915_gem_object *obj);
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj);
> >  
> >  static inline int __sg_page_count(struct scatterlist *sg)
> >  {
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9d2e6e3..d57e850 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -5244,3 +5244,82 @@ fail:
> >  	drm_gem_object_unreference(&obj->base);
> >  	return ERR_PTR(ret);
> >  }
> > +
> > +/**
> > + * i915_gem_clear_object() - Clear buffer object via CPU/GTT
> > + * @obj: Buffer object to be cleared
> > + *
> > + * Return: 0 - success, non-zero - failure
> > + */
> > +int i915_gem_object_clear(struct drm_i915_gem_object *obj)
> > +{
> > +	int ret, i;
> > +	char __iomem *base;
> > +	size_t size = obj->base.size;
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct drm_mm_node node;
> > +
> > +	WARN_ON(!mutex_is_locked(&obj->base.dev->struct_mutex));
> 
> Just lockdep_assert_held.
> 
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> 
> Would be nice to get the PIN_NOFAULT patches in to give preference to
> userspace mappings....
> 
Wouldn't it be better not to use two approaches, and just do the clearing
using the insert_page function (not even trying to map the whole object)?

Thanks,
Ankit


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 16:15   ` Tvrtko Ursulin
  2015-12-09 19:39     ` Dave Gordon
@ 2015-12-10 10:54     ` Ankitprasad Sharma
  2015-12-10 11:00       ` Ankitprasad Sharma
  1 sibling, 1 reply; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 10:54 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 16:15 +0000, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> > added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >
> > v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> > non-shmem backed objects (Tvrtko)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 151 +++++++++++++++++++++++++++++++++-------
> >   1 file changed, 127 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ed97de6..68ed67a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
> >   	return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +		 uint64_t page_base, int page_offset,
> > +		 char __user *user_data,
> > +		 int length, bool pwrite)
> > +{
> > +	void __iomem *vaddr_inatomic;
> > +	void *vaddr;
> > +	uint64_t unwritten;
> > +
> > +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +	/* We can use the cpu mem copy function because this is X86. */
> > +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +	if (pwrite)
> > +		unwritten = __copy_from_user(vaddr, user_data, length);
> > +	else
> > +		unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +	io_mapping_unmap(vaddr_inatomic);
> > +	return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +		   struct drm_i915_gem_object *obj, uint64_t size,
> > +		   uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	char __user *user_data;
> > +	uint64_t remain;
> > +	uint64_t offset, page_base;
> > +	int page_offset, page_length, ret = 0;
> > +
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	user_data = to_user_ptr(data_ptr);
> > +	remain = size;
> > +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	if (likely(!i915.prefault_disable))
> > +		ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +	/*
> > +	 * page_offset = offset within page
> > +	 * page_base = page offset within aperture
> > +	 */
> > +	page_offset = offset_in_page(offset);
> > +	page_base = offset & PAGE_MASK;
> > +
> > +	while (remain > 0) {
> > +		/* page_length = bytes to copy for this page */
> > +		page_length = remain;
> > +		if ((page_offset + remain) > PAGE_SIZE)
> > +			page_length = PAGE_SIZE - page_offset;
> > +
> > +		/* This is a slow read/write as it tries to read from
> > +		 * and write to user memory which may result into page
> > +		 * faults
> > +		 */
> > +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> > +				       page_offset, user_data,
> > +				       page_length, false);
> > +
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		remain -= page_length;
> > +		user_data += page_length;
> > +		page_base += page_length;
> > +		page_offset = 0;
> > +	}
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +
> > +out_unpin:
> > +	i915_gem_object_ggtt_unpin(obj);
> > +out:
> > +	return ret;
> > +}
> > +
> >   static int
> >   i915_gem_shmem_pread(struct drm_device *dev,
> >   		     struct drm_i915_gem_object *obj,
> > @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pread(obj, args->offset, args->size);
> >
> > -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> > +	/* pread for non shmem backed objects */
> > +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> > +		ret = i915_gem_gtt_copy(dev, obj, args->size,
> > +					args->offset, args->data_ptr);
> > +	else
> > +		ret = i915_gem_shmem_pread(dev, obj, args, file);
> 
> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed 
> objects if tiling is set. Sounds wrong to me unless I am missing something?
> 
Thanks for pointing it out, need to add a check there.
> >
> >   out:
> >   	drm_gem_object_unreference(&obj->base);
> > @@ -789,10 +879,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   			 struct drm_i915_gem_pwrite *args,
> >   			 struct drm_file *file)
> >   {
> > +	struct drm_device *dev = obj->base.dev;
> >   	struct drm_mm_node node;
> >   	uint64_t remain, offset;
> >   	char __user *user_data;
> >   	int ret;
> > +	bool faulted = false;
> >
> >   	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> >   	if (ret) {
> > @@ -851,11 +943,29 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   		/* If we get a fault while copying data, then (presumably) our
> >   		 * source page isn't available.  Return the error and we'll
> >   		 * retry in the slow path.
> > +		 * If the object is non-shmem backed, we retry again with the
> > +		 * path that handles page fault.
> >   		 */
> > -		if (fast_user_write(i915->gtt.mappable, page_base,
> > -				    page_offset, user_data, page_length)) {
> > -			ret = -EFAULT;
> > -			goto out_flush;
> > +		if (faulted || fast_user_write(i915->gtt.mappable,
> > +						page_base, page_offset,
> > +						user_data, page_length)) {
> > +			if (!obj->base.filp) {
> > +				faulted = true;
> > +				mutex_unlock(&dev->struct_mutex);
> > +				if (slow_user_access(i915->gtt.mappable,
> > +						     page_base,
> > +						     page_offset, user_data,
> > +						     page_length, true)) {
> > +					ret = -EFAULT;
> > +					mutex_lock(&dev->struct_mutex);
> > +					goto out_flush;
> > +				}
> > +
> > +				mutex_lock(&dev->struct_mutex);
> > +			} else {
> > +				ret = -EFAULT;
> > +				goto out_flush;
> > +			}
> >   		}
> 
> Some questions:
> 
> 1. What is the advantage of doing the slow access for non-shmem backed 
> objects inside a single loop, as opposed to extracting it in a separate 
> function?
> 
> For example i915_gem_gtt_pwrite_slow ? Then it could have been called 
> from i915_gem_pwrite_ioctl depending on the master if statement there, 
> fallback etc.
> 
> I think it would be clearer unless there is a special reason it makes 
> sense to go with the fast path first and then switch to slow path at the 
> point first fault is hit.
I am fine with either approach, but Chris suggested extending
pwrite_fast as it is already being used for fast pwrites.

Chris,
Would it be better to do a pwrite to non-shmem backed objects via a
separate function?
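
For reference, the kind of split I understand Tvrtko is suggesting would
look roughly like this in i915_gem_pwrite_ioctl() (sketch only;
i915_gem_gtt_pwrite_slow() is a hypothetical name for the extracted slow
path, and the phys_handle case is left out):

	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE) {
		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
		if (ret == -EFAULT)
			ret = i915_gem_gtt_pwrite_slow(dev_priv, obj,
						       args, file);
	} else {
		ret = i915_gem_shmem_pwrite(dev, obj, args, file);
	}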

> 
> 2. I have noticed the shmem pwrite slowpath makes explicit mention of 
> potential changes to the object domain while the lock was dropped and 
> takes care of flushing the cache in that case.
> 
> Is this something this path should do as well, or if not why not?
I do not think that this path needs to take care of flushing the cache:
stolen-backed objects are not accessible to the CPU, so there is no
possibility of them being in the CPU cache, at least for the stolen-backed
objects.
For other non-shmem backed objects (dmabuf, userptr, phys), I may need some
inputs from Chris on how to handle it.

> 
> 
> >   		remain -= page_length;
> > @@ -1121,14 +1231,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >
> >   	ret = -EFAULT;
> > @@ -1139,8 +1241,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   	 * perspective, requiring manual detiling by the client.
> >   	 */
> >   	if (obj->tiling_mode == I915_TILING_NONE &&
> > -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > -	    cpu_write_needs_clflush(obj)) {
> > +	    (!obj->base.filp ||
> > +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > +	    cpu_write_needs_clflush(obj)))) {
> >   		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
> >   		/* Note that the gtt paths might fail with non-page-backed user
> >   		 * pointers (e.g. gtt mappings when moving data between
> > @@ -1150,7 +1253,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   	if (ret == -EFAULT || ret == -ENOSPC) {
> >   		if (obj->phys_handle)
> >   			ret = i915_gem_phys_pwrite(obj, args, file);
> > -		else
> > +		else if (obj->base.filp)
> >   			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
> >   	}
> >
> >
> 
> Regards,
> 
> Tvrtko
> 


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-10 10:54     ` Ankitprasad Sharma
@ 2015-12-10 11:00       ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 11:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-12-10 at 16:24 +0530, Ankitprasad Sharma wrote:
Missed Chris in last mail, adding him
On Wed, 2015-12-09 at 16:15 +0000, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
> > added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >
> > v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> > non-shmem backed objects (Tvrtko)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 151 +++++++++++++++++++++++++++++++++-------
> >   1 file changed, 127 insertions(+), 24 deletions(-)> >

> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ed97de6..68ed67a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
> >   	return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +		 uint64_t page_base, int page_offset,
> > +		 char __user *user_data,
> > +		 int length, bool pwrite)
> > +{
> > +	void __iomem *vaddr_inatomic;
> > +	void *vaddr;
> > +	uint64_t unwritten;
> > +
> > +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +	/* We can use the cpu mem copy function because this is X86. */
> > +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +	if (pwrite)
> > +		unwritten = __copy_from_user(vaddr, user_data, length);
> > +	else
> > +		unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +	io_mapping_unmap(vaddr_inatomic);
> > +	return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +		   struct drm_i915_gem_object *obj, uint64_t size,
> > +		   uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	char __user *user_data;
> > +	uint64_t remain;
> > +	uint64_t offset, page_base;
> > +	int page_offset, page_length, ret = 0;
> > +
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	user_data = to_user_ptr(data_ptr);
> > +	remain = size;
> > +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	if (likely(!i915.prefault_disable))
> > +		ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +	/*
> > +	 * page_offset = offset within page
> > +	 * page_base = page offset within aperture
> > +	 */
> > +	page_offset = offset_in_page(offset);
> > +	page_base = offset & PAGE_MASK;
> > +
> > +	while (remain > 0) {
> > +		/* page_length = bytes to copy for this page */
> > +		page_length = remain;
> > +		if ((page_offset + remain) > PAGE_SIZE)
> > +			page_length = PAGE_SIZE - page_offset;
> > +
> > +		/* This is a slow read/write as it tries to read from
> > +		 * and write to user memory which may result into page
> > +		 * faults
> > +		 */
> > +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> > +				       page_offset, user_data,
> > +				       page_length, false);
> > +
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		remain -= page_length;
> > +		user_data += page_length;
> > +		page_base += page_length;
> > +		page_offset = 0;
> > +	}
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +
> > +out_unpin:
> > +	i915_gem_object_ggtt_unpin(obj);
> > +out:
> > +	return ret;
> > +}
> > +
> >   static int
> >   i915_gem_shmem_pread(struct drm_device *dev,
> >   		     struct drm_i915_gem_object *obj,
> > @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pread(obj, args->offset, args->size);
> >
> > -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> > +	/* pread for non shmem backed objects */
> > +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> > +		ret = i915_gem_gtt_copy(dev, obj, args->size,
> > +					args->offset, args->data_ptr);
> > +	else
> > +		ret = i915_gem_shmem_pread(dev, obj, args, file);
> 
> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed 
> objects if tiling is set. Sounds wrong to me unless I am missing something?
> 
Thanks for pointing it out, need to add a check there.
> >
> >   out:
> >   	drm_gem_object_unreference(&obj->base);
> > @@ -789,10 +879,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   			 struct drm_i915_gem_pwrite *args,
> >   			 struct drm_file *file)
> >   {
> > +	struct drm_device *dev = obj->base.dev;
> >   	struct drm_mm_node node;
> >   	uint64_t remain, offset;
> >   	char __user *user_data;
> >   	int ret;
> > +	bool faulted = false;
> >
> >   	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> >   	if (ret) {
> > @@ -851,11 +943,29 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   		/* If we get a fault while copying data, then (presumably) our
> >   		 * source page isn't available.  Return the error and we'll
> >   		 * retry in the slow path.
> > +		 * If the object is non-shmem backed, we retry again with the
> > +		 * path that handles page fault.
> >   		 */
> > -		if (fast_user_write(i915->gtt.mappable, page_base,
> > -				    page_offset, user_data, page_length)) {
> > -			ret = -EFAULT;
> > -			goto out_flush;
> > +		if (faulted || fast_user_write(i915->gtt.mappable,
> > +						page_base, page_offset,
> > +						user_data, page_length)) {
> > +			if (!obj->base.filp) {
> > +				faulted = true;
> > +				mutex_unlock(&dev->struct_mutex);
> > +				if (slow_user_access(i915->gtt.mappable,
> > +						     page_base,
> > +						     page_offset, user_data,
> > +						     page_length, true)) {
> > +					ret = -EFAULT;
> > +					mutex_lock(&dev->struct_mutex);
> > +					goto out_flush;
> > +				}
> > +
> > +				mutex_lock(&dev->struct_mutex);
> > +			} else {
> > +				ret = -EFAULT;
> > +				goto out_flush;
> > +			}
> >   		}
> 
> Some questions:
> 
> 1. What is the advantage of doing the slow access for non-shmem backed 
> objects inside a single loop, as opposed to extracting it in a separate 
> function?
> 
> For example i915_gem_gtt_pwrite_slow ? Then it could have been called 
> from i915_gem_pwrite_ioctl depending on the master if statement there, 
> fallback etc.
> 
> I think it would be clearer unless there is a special reason it makes 
> sense to go with the fast path first and then switch to slow path at the 
> point first fault is hit.
I am fine with either approach, but Chris suggested extending
pwrite_fast as it is already being used for fast pwrites.

Chris,
Would it be better to do a pwrite to non-shmem backed objects via a
separate function?

 
> 2. I have noticed the shmem pwrite slowpath makes explicit mention of 
> potential changes to the object domain while the lock was dropped and 
> takes care of flushing the cache in that case.
> 
> Is this something this path should do as well, or if not why not?
I do not think that this path needs to take care of flushing the cache:
stolen-backed objects are not accessible to the CPU, so there is no
possibility of them being in the CPU cache, at least for the stolen-backed
objects.
For other non-shmem backed objects (dmabuf, userptr, phys), I may need some
inputs from Chris on how to handle it.

Thanks, Ankit

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 19:39     ` Dave Gordon
@ 2015-12-10 11:12       ` Ankitprasad Sharma
  2015-12-10 18:18         ` Dave Gordon
  2015-12-11 18:15       ` Daniel Vetter
  1 sibling, 1 reply; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 11:12 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 19:39 +0000, Dave Gordon wrote:
> On 09/12/15 16:15, Tvrtko Ursulin wrote:
> >
> > Hi,
> >
> > On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>
> >> This patch adds support for extending the pread/pwrite functionality
> >> for objects not backed by shmem. The access will be made through
> >> gtt interface. This will cover objects backed by stolen memory as well
> >> as other non-shmem backed objects.
> >>
> >> v2: Drop locks around slow_user_access, prefault the pages before
> >> access (Chris)
> >>
> >> v3: Rebased to the latest drm-intel-nightly (Ankit)
> >>
> >> v4: Moved page base & offset calculations outside the copy loop,
> >> corrected data types for size and offset variables, corrected if-else
> >> braces format (Tvrtko/kerneldocs)
> >>
> >> v5: Enabled pread/pwrite for all non-shmem backed objects including
> >> without tiling restrictions (Ankit)
> >>
> >> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >>
> >> v7: Updated commit message, Renamed i915_gem_gtt_read to
> >> i915_gem_gtt_copy,
> >> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >>
> >> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> >> non-shmem backed objects (Tvrtko)
> >>
> >> Testcase: igt/gem_stolen
> >>
> >> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem.c | 151
> >> +++++++++++++++++++++++++++++++++-------
> >>   1 file changed, 127 insertions(+), 24 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >> b/drivers/gpu/drm/i915/i915_gem.c
> >> index ed97de6..68ed67a 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
> >> shmem_page_offset, int page_length,
> >>       return ret ? - EFAULT : 0;
> >>   }
> >>
> >> +static inline uint64_t
> >> +slow_user_access(struct io_mapping *mapping,
> >> +         uint64_t page_base, int page_offset,
> >> +         char __user *user_data,
> >> +         int length, bool pwrite)
> >> +{
> >> +    void __iomem *vaddr_inatomic;
> >> +    void *vaddr;
> >> +    uint64_t unwritten;
> >> +
> >> +    vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> >> +    /* We can use the cpu mem copy function because this is X86. */
> >> +    vaddr = (void __force *)vaddr_inatomic + page_offset;
> >> +    if (pwrite)
> >> +        unwritten = __copy_from_user(vaddr, user_data, length);
> >> +    else
> >> +        unwritten = __copy_to_user(user_data, vaddr, length);
> >> +
> >> +    io_mapping_unmap(vaddr_inatomic);
> >> +    return unwritten;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_gtt_copy(struct drm_device *dev,
> >> +           struct drm_i915_gem_object *obj, uint64_t size,
> >> +           uint64_t data_offset, uint64_t data_ptr)
> >> +{
> >> +    struct drm_i915_private *dev_priv = dev->dev_private;
> >> +    char __user *user_data;
> >> +    uint64_t remain;
> >> +    uint64_t offset, page_base;
> >> +    int page_offset, page_length, ret = 0;
> >> +
> >> +    ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >> +    if (ret)
> >> +        goto out;
> >> +
> >> +    ret = i915_gem_object_set_to_gtt_domain(obj, false);
> >> +    if (ret)
> >> +        goto out_unpin;
> >> +
> >> +    ret = i915_gem_object_put_fence(obj);
> >> +    if (ret)
> >> +        goto out_unpin;
> >> +
> >> +    user_data = to_user_ptr(data_ptr);
> >> +    remain = size;
> >> +    offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> >> +
> >> +    mutex_unlock(&dev->struct_mutex);
> >> +    if (likely(!i915.prefault_disable))
> >> +        ret = fault_in_multipages_writeable(user_data, remain);
> >> +
> >> +    /*
> >> +     * page_offset = offset within page
> >> +     * page_base = page offset within aperture
> >> +     */
> >> +    page_offset = offset_in_page(offset);
> >> +    page_base = offset & PAGE_MASK;
> >> +
> >> +    while (remain > 0) {
> >> +        /* page_length = bytes to copy for this page */
> >> +        page_length = remain;
> >> +        if ((page_offset + remain) > PAGE_SIZE)
> >> +            page_length = PAGE_SIZE - page_offset;
> >> +
> >> +        /* This is a slow read/write as it tries to read from
> >> +         * and write to user memory which may result into page
> >> +         * faults
> >> +         */
> >> +        ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> >> +                       page_offset, user_data,
> >> +                       page_length, false);
> >> +
> >> +        if (ret) {
> >> +            ret = -EFAULT;
> >> +            break;
> >> +        }
> >> +
> >> +        remain -= page_length;
> >> +        user_data += page_length;
> >> +        page_base += page_length;
> >> +        page_offset = 0;
> >> +    }
> >> +
> >> +    mutex_lock(&dev->struct_mutex);
> >> +
> >> +out_unpin:
> >> +    i915_gem_object_ggtt_unpin(obj);
> >> +out:
> >> +    return ret;
> >> +}
> >> +
> >>   static int
> >>   i915_gem_shmem_pread(struct drm_device *dev,
> >>                struct drm_i915_gem_object *obj,
> >> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
> >> void *data,
> >>           goto out;
> >>       }
> >>
> >> -    /* prime objects have no backing filp to GEM pread/pwrite
> >> -     * pages from.
> >> -     */
> >> -    if (!obj->base.filp) {
> >> -        ret = -EINVAL;
> >> -        goto out;
> >> -    }
> >> -
> >>       trace_i915_gem_object_pread(obj, args->offset, args->size);
> >>
> >> -    ret = i915_gem_shmem_pread(dev, obj, args, file);
> >> +    /* pread for non shmem backed objects */
> >> +    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> >> +        ret = i915_gem_gtt_copy(dev, obj, args->size,
> >> +                    args->offset, args->data_ptr);
> >> +    else
> >> +        ret = i915_gem_shmem_pread(dev, obj, args, file);
> >
> > Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
> > objects if tiling is set. Sounds wrong to me unless I am missing something?
> 
> Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type 
> objects? What about (phys, stolen, userptr, dmabuf, ...?) Which of these 
> is the alternate path going to work with?
Only shmem backed objects have obj->base.filp set, filp pointing to the
shmem file. For all other non-shmem backed objects (stolen, userptr,
dmabuf) we use the alternate path.

-Ankit

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-09 17:25   ` Tvrtko Ursulin
  2015-12-09 19:24     ` Ville Syrjälä
@ 2015-12-10 13:17     ` Ankitprasad Sharma
  1 sibling, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 13:17 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 17:25 +0000, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > Ville reminded us that stolen memory is not preserved across
> > hibernation, and a result of this was that context objects now being
> > allocated from stolen were being corrupted on S4 and promptly hanging
> > the GPU on resume.
> >
> > We want to utilise stolen for as much as possible (nothing else will use
> > that wasted memory otherwise), so we need a strategy for handling
> > general objects allocated from stolen and hibernation. A simple solution
> > is to do a CPU copy through the GTT of the stolen object into a fresh
> > shmemfs backing store and thenceforth treat it as a normal objects. This
> > can be refined in future to either use a GPU copy to avoid the slow
> > uncached reads (though it's hibernation!) and recreate stolen objects
> > upon resume/first-use. For now, a simple approach should suffice for
> > testing the object migration.
> >
> > v2:
> > Swap PTE for pinned bindings over to the shmemfs. This adds a
> > complicated dance, but is required as many stolen objects are likely to
> > be pinned for use by the hardware. Swapping the PTEs should not result
> > in externally visible behaviour, as each PTE update should be atomic and
> > the two pages identical. (danvet)
> >
> > safe-by-default, or the principle of least surprise. We need a new flag
> > to mark objects that we can wilfully discard and recreate across
> > hibernation. (danvet)
> >
> > Just use the global_list rather than invent a new stolen_list. This is
> > the slowpath hibernate and so adding a new list and the associated
> > complexity isn't worth it.
> >
> > v3: Rebased on drm-intel-nightly (Ankit)
> >
> > v4: Use insert_page to map stolen memory backed pages for migration to
> > shmem (Chris)
> >
> > v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
> >   drivers/gpu/drm/i915/i915_drv.h         |   7 +
> >   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
> >   drivers/gpu/drm/i915/intel_display.c    |   3 +
> >   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
> >   drivers/gpu/drm/i915/intel_pm.c         |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >   7 files changed, 261 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 9f55209..2bb9e9e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> >   	return i915_drm_suspend(drm_dev);
> >   }
> >
> > +static int i915_pm_freeze(struct device *dev)
> > +{
> > +	int ret;
> > +
> > +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> > +	if (ret)
> > +		return ret;
> 
> Can we distinguish between S3 and S4 if the stolen corruption only 
> happens in S4? Not to spend all the extra effort for nothing in S3? Or 
> maybe this is not even called for S3?
For S3, i915_pm_suspend will be called; i915_pm_freeze will only be
called for hibernation (which corresponds to S4?).

> 
> > +
> > +	ret = i915_pm_suspend(dev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	return 0;
> > +}
> > +
> >   static int i915_pm_suspend_late(struct device *dev)
> >   {
> >   	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> > @@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
> >   	 * @restore, @restore_early : called after rebooting and restoring the
> >   	 *                            hibernation image [PMSG_RESTORE]
> >   	 */
> > -	.freeze = i915_pm_suspend,
> > +	.freeze = i915_pm_freeze,
> >   	.freeze_late = i915_pm_suspend_late,
> >   	.thaw_early = i915_pm_resume_early,
> >   	.thaw = i915_pm_resume,
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index e0b09b0..0d18b07 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
> >   	 * Advice: are the backing pages purgeable?
> >   	 */
> >   	unsigned int madv:2;
> > +	/**
> > +	 * Whereas madv is for userspace, there are certain situations
> > +	 * where we want I915_MADV_DONTNEED behaviour on internal objects
> > +	 * without conflating the userspace setting.
> > +	 */
> > +	unsigned int internal_volatile:1;
> >
> >   	/**
> >   	 * Current tiling mode for the object.
> > @@ -3006,6 +3012,7 @@ int i915_gem_l3_remap(struct drm_i915_gem_request *req, int slice);
> >   void i915_gem_init_swizzling(struct drm_device *dev);
> >   void i915_gem_cleanup_ringbuffer(struct drm_device *dev);
> >   int __must_check i915_gpu_idle(struct drm_device *dev);
> > +int __must_check i915_gem_freeze(struct drm_device *dev);
> >   int __must_check i915_gem_suspend(struct drm_device *dev);
> >   void __i915_add_request(struct drm_i915_gem_request *req,
> >   			struct drm_i915_gem_object *batch_obj,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 68ed67a..1f134b0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4511,12 +4511,27 @@ static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
> >   	.put_pages = i915_gem_object_put_pages_gtt,
> >   };
> >
> > +static struct address_space *
> > +i915_gem_set_inode_gfp(struct drm_device *dev, struct file *file)
> > +{
> > +	struct address_space *mapping = file_inode(file)->i_mapping;
> > +	gfp_t mask;
> > +
> > +	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> > +	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
> > +		/* 965gm cannot relocate objects above 4GiB. */
> > +		mask &= ~__GFP_HIGHMEM;
> > +		mask |= __GFP_DMA32;
> > +	}
> > +	mapping_set_gfp_mask(mapping, mask);
> > +
> > +	return mapping;
> > +}
> > +
> >   struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
> >   						  size_t size)
> >   {
> >   	struct drm_i915_gem_object *obj;
> > -	struct address_space *mapping;
> > -	gfp_t mask;
> >   	int ret;
> >
> >   	obj = i915_gem_object_alloc(dev);
> > @@ -4529,15 +4544,7 @@ struct drm_i915_gem_object *i915_gem_alloc_object(struct drm_device *dev,
> >   		return ERR_PTR(ret);
> >   	}
> >
> > -	mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
> > -	if (IS_CRESTLINE(dev) || IS_BROADWATER(dev)) {
> > -		/* 965gm cannot relocate objects above 4GiB. */
> > -		mask &= ~__GFP_HIGHMEM;
> > -		mask |= __GFP_DMA32;
> > -	}
> > -
> > -	mapping = file_inode(obj->base.filp)->i_mapping;
> > -	mapping_set_gfp_mask(mapping, mask);
> > +	i915_gem_set_inode_gfp(dev, obj->base.filp);
> >
> >   	i915_gem_object_init(obj, &i915_gem_object_ops);
> >
> > @@ -4714,6 +4721,209 @@ i915_gem_stop_ringbuffers(struct drm_device *dev)
> >   		dev_priv->gt.stop_ring(ring);
> >   }
> >
> > +static int
> > +i915_gem_object_migrate_stolen_to_shmemfs(struct drm_i915_gem_object *obj)
> > +{
> 
> Some documentation for this function would be good.
Yes, will do it.
> 
> > +	struct drm_i915_private *i915 = to_i915(obj->base.dev);
> > +	struct i915_vma *vma, *vn;
> > +	struct drm_mm_node node;
> > +	struct file *file;
> > +	struct address_space *mapping;
> > +	struct sg_table *stolen_pages, *shmemfs_pages;
> > +	int ret, i;
> > +
> > +	if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
> > +		return -EINVAL;
> 
> I am no expert in hibernation or swizzling but this looks really bad to me.
> 
> It is both platform and user controlled and it will cause hibernation to 
> fail in a very noisy way, correct?
> 
> At least it needs to be WARN_ON_ONCE, but if my thinking is correct it 
> should really be that either:
> 
> a) hibernation is prevented in a quieter way (DRM_ERROR, once) 
> altogether when dev_priv->mm.bit_6_swizzle_x == 
> I915_BIT_6_SWIZZLE_9_10_17, or
> 
> b) set_tiling fails on the same platforms which also support hibernation.
> 
Either we can disallow the set_tiling call if both swizzling and
hibernation are allowed on the platform, or we can exit quietly.
Chris can suggest further on this. A rough sketch of the quieter exit
is below.
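
Something along these lines (untested; the helper name is made up, the
swizzle field is the existing one you mention):

static bool i915_gem_freeze_unsupported(struct drm_i915_private *dev_priv)
{
	static bool warned;

	/* With bit-17 swizzling the copied pages would need a swizzle
	 * fixup, so refuse the stolen migration with a single error
	 * instead of a per-object WARN.
	 */
	if (dev_priv->mm.bit_6_swizzle_x != I915_BIT_6_SWIZZLE_9_10_17)
		return false;

	if (!warned) {
		DRM_ERROR("bit-17 swizzling in use, not migrating stolen objects\n");
		warned = true;
	}

	return true;
}

i915_gem_freeze() could then simply skip the migration when this
returns true.
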
> Comments?
> 
> > +
> > +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +	if (ret)
> > +		return ret;
> > +
> > +	file = shmem_file_setup("drm mm object", obj->base.size, VM_NORESERVE);
> > +	if (IS_ERR(file))
> > +		return PTR_ERR(file);
> > +	mapping = i915_gem_set_inode_gfp(obj->base.dev, file);
> > +
> > +	list_for_each_entry_safe(vma, vn, &obj->vma_list, vma_link)
> > +		if (i915_vma_unbind(vma))
> > +			continue;
> > +
> > +	if (obj->madv != I915_MADV_WILLNEED && list_empty(&obj->vma_list)) {
> > +		/* Discard the stolen reservation, and replace with
> > +		 * an unpopulated shmemfs object.
> > +		 */
> > +		obj->madv = __I915_MADV_PURGED;
> > +		goto swap_pages;
> > +	}
> 
> Maybe put a comment before this block saying "no need to copy 
> content/something for objects...", if I got it right.
> 
> > +
> > +	/* stolen objects are already pinned to prevent shrinkage */
> > +	memset(&node, 0, sizeof(node));
> > +	ret = drm_mm_insert_node_in_range_generic(&i915->gtt.base.mm,
> > +						  &node,
> > +						  4096, 0, I915_CACHE_NONE,
> > +						  0, i915->gtt.mappable_end,
> > +						  DRM_MM_SEARCH_DEFAULT,
> > +						  DRM_MM_CREATE_DEFAULT);
> > +	if (ret)
> > +		return ret;
> 
> If there is a likelihood the global gtt can be full, would it be worth it 
> trying to evict something before attempting hibernation?
Yes, but it is very unlikely to happen.
> 
> Also leaks file.
Yes, will fix this.
> 
> > +
> > +	for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
> > +		struct page *page;
> > +		void *__iomem src;
> > +		void *dst;
> > +
> > +		wmb();
> 
> What is this one for? If it is for the memcpy_fromio would it be more 
> obvious to put it after that call?
As replied in a separate mail, this is to ensure program order is
followed strictly, i.e. to minimise reordering during this loop.
> 
> > +		i915->gtt.base.insert_page(&i915->gtt.base,
> > +					   i915_gem_object_get_dma_address(obj, i),
> > +					   node.start,
> > +					   I915_CACHE_NONE,
> > +					   0);
> > +		wmb();
> > +
> > +		page =  shmem_read_mapping_page(mapping, i);
> > +		if (IS_ERR(page)) {
> > +			ret = PTR_ERR(page);
> > +			goto err_node;
> > +		}
> > +
> > +		src = io_mapping_map_atomic_wc(i915->gtt.mappable, node.start + PAGE_SIZE * i);
> > +		dst = kmap_atomic(page);
> > +		memcpy_fromio(dst, src, PAGE_SIZE);
> > +		kunmap_atomic(dst);
> > +		io_mapping_unmap_atomic(src);
> > +
> > +		page_cache_release(page);
> 
> I assume shmem_file_setup takes one reference to each page, 
> shmem_read_mapping_page another and then here we release that extra one? Or?
> 
1. Only file instantiation happens during shmem_file_setup; no pages are
allocated at that point.
2. shmem_read_mapping_page allocates the page and returns with a
refcount of 2 (one held internally by shmem and one for the
driver/caller).
3. page_cache_release drops the driver-side reference, as we don't
want the new page pinned in RAM once the copy is done.
4. Later on, if we need to pin the object, get_pages_gtt raises the
refcount back to 2 (see the short sketch below).
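
To put numbers on it, the per-page sequence is roughly as follows
(helpers as used in the patch above, error handling dropped):

	page = shmem_read_mapping_page(mapping, i);  /* page_count == 2: shmem + driver */
	/* ... memcpy_fromio() of the stolen contents into the new page ... */
	page_cache_release(page);                    /* page_count == 1: shmem only */
	/* later, only if the object gets pinned again: */
	i915_gem_object_get_pages_gtt(obj);          /* page_count == 2: shmem + object */
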
> > +	}
> > +
> > +	wmb();
> > +	i915->gtt.base.clear_range(&i915->gtt.base,
> > +				   node.start, node.size,
> > +				   true);
> > +	drm_mm_remove_node(&node);
> 
> Maybe move the whole copy content loop into a helper for readability?
This can be done.
> 
> > +
> > +swap_pages:
> > +	stolen_pages = obj->pages;
> > +	obj->pages = NULL;
> > +
> > +	obj->base.filp = file;
> > +	obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> > +	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> > +
> > +	/* Recreate any pinned binding with pointers to the new storage */
> > +	if (!list_empty(&obj->vma_list)) {
> > +		ret = i915_gem_object_get_pages_gtt(obj);
> > +		if (ret) {
> > +			obj->pages = stolen_pages;
> > +			goto err_file;
> > +		}
> > +
> > +		ret = i915_gem_object_set_to_gtt_domain(obj, true);
> > +		if (ret) {
> > +			i915_gem_object_put_pages_gtt(obj);
> > +			obj->pages = stolen_pages;
> > +			goto err_file;
> > +		}
> > +
> > +		obj->get_page.sg = obj->pages->sgl;
> > +		obj->get_page.last = 0;
> > +
> > +		list_for_each_entry(vma, &obj->vma_list, vma_link) {
> > +			if (!drm_mm_node_allocated(&vma->node))
> > +				continue;
> > +
> > +			WARN_ON(i915_vma_bind(vma,
> > +					      obj->cache_level,
> > +					      PIN_UPDATE));
> > +		}
> > +	} else
> > +		list_del(&obj->global_list);
> 
> Hm, can it be bound if there were no VMAs?
The object will not be bound as there are no VMAs, but 'obj->global_list'
will still be on the unbound list and we need to unlink it from there.
> 
> > +
> > +	/* drop the stolen pin and backing */
> > +	shmemfs_pages = obj->pages;
> > +	obj->pages = stolen_pages;
> > +
> > +	i915_gem_object_unpin_pages(obj);
> > +	obj->ops->put_pages(obj);
> > +	if (obj->ops->release)
> > +		obj->ops->release(obj);
> > +
> > +	obj->ops = &i915_gem_object_ops;
> > +	obj->pages = shmemfs_pages;
> > +
> > +	return 0;
> > +
> > +err_node:
> > +	wmb();
> > +	i915->gtt.base.clear_range(&i915->gtt.base,
> > +				   node.start, node.size,
> > +				   true);
> > +	drm_mm_remove_node(&node);
> > +err_file:
> > +	fput(file);
> > +	obj->base.filp = NULL;
> > +	return ret;
> > +}
> > +
> > +int
> > +i915_gem_freeze(struct drm_device *dev)
> > +{
> > +	/* Called before i915_gem_suspend() when hibernating */
> > +	struct drm_i915_private *i915 = to_i915(dev);
> > +	struct drm_i915_gem_object *obj, *tmp;
> > +	struct list_head *phase[] = {
> > +		&i915->mm.unbound_list, &i915->mm.bound_list, NULL
> > +	}, **p;
> > +	int ret;
> > +
> > +	ret = i915_mutex_lock_interruptible(dev);
> > +	if (ret)
> > +		return ret;
> > +	/* Across hibernation, the stolen area is not preserved.
> > +	 * Anything inside stolen must copied back to normal
> > +	 * memory if we wish to preserve it.
> > +	 */
> > +	for (p = phase; *p; p++) {
> > +		struct list_head migrate;
> > +		int ret;
> > +
> > +		INIT_LIST_HEAD(&migrate);
> > +		list_for_each_entry_safe(obj, tmp, *p, global_list) {
> > +			if (obj->stolen == NULL)
> > +				continue;
> > +
> > +			if (obj->internal_volatile)
> > +				continue;
> > +
> > +			/* In the general case, this object may only be alive
> > +			 * due to an active reference, and that may disappear
> > +			 * when we unbind any of the objects (and so wait upon
> > +			 * the GPU and retire requests). To prevent one of the
> > +			 * objects from disappearing beneath us, we need to
> > +			 * take a reference to each as we build the migration
> > +			 * list.
> > +			 *
> > +			 * This is similar to the strategy required whilst
> > +			 * shrinking or evicting objects (for the same reason).
> > +			 */
> > +			drm_gem_object_reference(&obj->base);
> > +			list_move(&obj->global_list, &migrate);
> > +		}
> > +
> > +		ret = 0;
> > +		list_for_each_entry_safe(obj, tmp, &migrate, global_list) {
> > +			if (ret == 0)
> > +				ret = i915_gem_object_migrate_stolen_to_shmemfs(obj);
> > +			drm_gem_object_unreference(&obj->base);
> > +		}
> > +		list_splice(&migrate, *p);
> 
> Hmmm are these some clever games with obj->global_list ?
If the migration was unsuccessful, we are just moving the objects back
to their original list (bound or unbound).
> 
> > +		if (ret)
> > +			break;
> > +	}
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	return ret;
> > +}
> > +
> >   int
> >   i915_gem_suspend(struct drm_device *dev)
> >   {
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index f281e0b..0803922 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -2549,6 +2549,9 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
> >   	if (IS_ERR(obj))
> >   		return false;
> >
> > +	/* Not to be preserved across hibernation */
> > +	obj->internal_volatile = true;
> > +
> >   	obj->tiling_mode = plane_config->tiling;
> >   	if (obj->tiling_mode == I915_TILING_X)
> >   		obj->stride = fb->pitches[0];
> > diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> > index f43681e..1d89253 100644
> > --- a/drivers/gpu/drm/i915/intel_fbdev.c
> > +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> > @@ -154,6 +154,12 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
> >   		goto out;
> >   	}
> >
> > +	/* Discard the contents of the BIOS fb across hibernation.
> > +	 * We really want to completely throwaway the earlier fbdev
> > +	 * and reconfigure it anyway.
> > +	 */
> > +	obj->internal_volatile = true;
> > +
> >   	fb = __intel_framebuffer_create(dev, &mode_cmd, obj);
> >   	if (IS_ERR(fb)) {
> >   		ret = PTR_ERR(fb);
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 03ad276..6ddc20a 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5181,6 +5181,8 @@ static void valleyview_setup_pctx(struct drm_device *dev)
> >   	I915_WRITE(VLV_PCBR, pctx_paddr);
> >
> >   out:
> > +	/* The power context need not be preserved across hibernation */
> > +	pctx->internal_volatile = true;
> >   	DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
> >   	dev_priv->vlv_pctx = pctx;
> >   }
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 5eabaf6..370d96a 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -2090,6 +2090,12 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
> >   	if (IS_ERR(obj))
> >   		return PTR_ERR(obj);
> >
> > +	/* Ringbuffer objects are by definition volatile - only the commands
> > +	 * between HEAD and TAIL need to be preserved and whilst there are
> > +	 * any commands there, the ringbuffer is pinned by activity.
> > +	 */
> > +	obj->internal_volatile = true;
> > +
> 
> What does this mean? It gets correctly re-initialized by existing code 
> on resume? Don't see anything specific about HEAD and TAIL in this patch.
HEAD and TAIL will be equal for the ringbuffer before the system goes
into hibernation; that is taken care of by vma_unbind, which completes
all outstanding requests.
> 
> >   	/* mark ring buffers as read-only from GPU side by default */
> >   	obj->gt_ro = 1;
> >
> >

Thanks,
Ankit



* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-10  9:43   ` Tvrtko Ursulin
@ 2015-12-10 13:17     ` Ankitprasad Sharma
  2015-12-10 14:15       ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-10 13:17 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
> Hi,
> 
> Two more comments below:
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> >
> > Ville reminded us that stolen memory is not preserved across
> > hibernation, and a result of this was that context objects now being
> > allocated from stolen were being corrupted on S4 and promptly hanging
> > the GPU on resume.
> >
> > We want to utilise stolen for as much as possible (nothing else will use
> > that wasted memory otherwise), so we need a strategy for handling
> > general objects allocated from stolen and hibernation. A simple solution
> > is to do a CPU copy through the GTT of the stolen object into a fresh
> > shmemfs backing store and thenceforth treat it as a normal objects. This
> > can be refined in future to either use a GPU copy to avoid the slow
> > uncached reads (though it's hibernation!) and recreate stolen objects
> > upon resume/first-use. For now, a simple approach should suffice for
> > testing the object migration.
> 
> Mention of "testing" in the commit message and absence of a path to 
> migrate the objects back to stolen memory on resume makes me think this 
> is kind of half finished and not really ready for review / merge ?
> 
> Because I don't see how it is useful to migrate it one way and never 
> move back?
I think that this is not much of a problem, as the purpose here is to
keep the object intact, to avoid breaking anything.
So as far as objects are concerned they will be in shmem and can be used
without any issue, and the stolen memory will be free again for other
usage from the user.
> 
> >
> > v2:
> > Swap PTE for pinned bindings over to the shmemfs. This adds a
> > complicated dance, but is required as many stolen objects are likely to
> > be pinned for use by the hardware. Swapping the PTEs should not result
> > in externally visible behaviour, as each PTE update should be atomic and
> > the two pages identical. (danvet)
> >
> > safe-by-default, or the principle of least surprise. We need a new flag
> > to mark objects that we can wilfully discard and recreate across
> > hibernation. (danvet)
> >
> > Just use the global_list rather than invent a new stolen_list. This is
> > the slowpath hibernate and so adding a new list and the associated
> > complexity isn't worth it.
> >
> > v3: Rebased on drm-intel-nightly (Ankit)
> >
> > v4: Use insert_page to map stolen memory backed pages for migration to
> > shmem (Chris)
> >
> > v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
> >   drivers/gpu/drm/i915/i915_drv.h         |   7 +
> >   drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
> >   drivers/gpu/drm/i915/intel_display.c    |   3 +
> >   drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
> >   drivers/gpu/drm/i915/intel_pm.c         |   2 +
> >   drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >   7 files changed, 261 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 9f55209..2bb9e9e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> >   	return i915_drm_suspend(drm_dev);
> >   }
> >
> > +static int i915_pm_freeze(struct device *dev)
> > +{
> > +	int ret;
> > +
> > +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> > +	if (ret)
> > +		return ret;
> 
> One of the first steps in idling GEM seems to be idling the GPU and 
> retiring requests.
> 
> Would it also make sense to do those steps before attempting to migrate 
> the stolen objects?
Here, we do that implicitly when trying to do a vma_unbind for the
object.

Thanks,
Ankit



* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-10 13:17     ` Ankitprasad Sharma
@ 2015-12-10 14:15       ` Tvrtko Ursulin
  2015-12-10 18:00         ` Dave Gordon
  2015-12-11  5:16         ` Ankitprasad Sharma
  0 siblings, 2 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-10 14:15 UTC (permalink / raw)
  To: Ankitprasad Sharma; +Cc: intel-gfx, akash.goel, shashidhar.hiremath


On 10/12/15 13:17, Ankitprasad Sharma wrote:
> On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
>> Hi,
>>
>> Two more comments below:
>>
>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>
>>> Ville reminded us that stolen memory is not preserved across
>>> hibernation, and a result of this was that context objects now being
>>> allocated from stolen were being corrupted on S4 and promptly hanging
>>> the GPU on resume.
>>>
>>> We want to utilise stolen for as much as possible (nothing else will use
>>> that wasted memory otherwise), so we need a strategy for handling
>>> general objects allocated from stolen and hibernation. A simple solution
>>> is to do a CPU copy through the GTT of the stolen object into a fresh
>>> shmemfs backing store and thenceforth treat it as a normal objects. This
>>> can be refined in future to either use a GPU copy to avoid the slow
>>> uncached reads (though it's hibernation!) and recreate stolen objects
>>> upon resume/first-use. For now, a simple approach should suffice for
>>> testing the object migration.
>>
>> Mention of "testing" in the commit message and absence of a path to
>> migrate the objects back to stolen memory on resume makes me think this
>> is kind of half finished and not really ready for review / merge ?
>>
>> Because I don't see how it is useful to migrate it one way and never
>> move back?
> I think that this is not much of a problem, as the purpose here is to
> keep the object intact, to avoid breaking anything.
> So as far as objects are concerned they will be in shmem and can be used
> without any issue, and the stolen memory will be free again for other
> usage from the user.

I am not sure that is a good state of things.

One of the things it means is that when user wanted to create an object 
in stolen memory, after resume it will not be any more. So what is the 
point in failing stolen object creation when area is full in the first 
place? We could just return a normal object instead.

Then the question of objects which are allocated in stolen by the 
driver. Are they being re-allocated on resume or will also be stuck in 
shmemfs from then onward?

And finally, one corner case might be that shmemfs plus stolen is a 
larger sum which will be attempted to be restored in shmemfs only on 
resume. Will that always work if everything is fully populated and what 
will happen if we run out of space?

At minimum all this should be discussed and explicitly documented in the 
commit message.

Would it be difficult to implement the reverse path?

>>> v2:
>>> Swap PTE for pinned bindings over to the shmemfs. This adds a
>>> complicated dance, but is required as many stolen objects are likely to
>>> be pinned for use by the hardware. Swapping the PTEs should not result
>>> in externally visible behaviour, as each PTE update should be atomic and
>>> the two pages identical. (danvet)
>>>
>>> safe-by-default, or the principle of least surprise. We need a new flag
>>> to mark objects that we can wilfully discard and recreate across
>>> hibernation. (danvet)
>>>
>>> Just use the global_list rather than invent a new stolen_list. This is
>>> the slowpath hibernate and so adding a new list and the associated
>>> complexity isn't worth it.
>>>
>>> v3: Rebased on drm-intel-nightly (Ankit)
>>>
>>> v4: Use insert_page to map stolen memory backed pages for migration to
>>> shmem (Chris)
>>>
>>> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>>>    drivers/gpu/drm/i915/i915_drv.h         |   7 +
>>>    drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>>>    drivers/gpu/drm/i915/intel_display.c    |   3 +
>>>    drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>>>    drivers/gpu/drm/i915/intel_pm.c         |   2 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>>>    7 files changed, 261 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>>> index 9f55209..2bb9e9e 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>>>    	return i915_drm_suspend(drm_dev);
>>>    }
>>>
>>> +static int i915_pm_freeze(struct device *dev)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
>>> +	if (ret)
>>> +		return ret;
>>
>> One of the first steps in idling GEM seems to be idling the GPU and
>> retiring requests.
>>
>> Would it also make sense to do those steps before attempting to migrate
>> the stolen objects?
> Here, we do that implicitly when trying to do a vma_unbind for the
> object.

Code paths are not the same so it makes me uncomfortable.  It looks more 
logical to do the migration after the existing i915_gem_suspend. It 
would mean some code duplication, true (maybe split i915_drm_suspend in 
two and call i915_gem_freeze in between), but to me it looks more like a 
proper place to do it.
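
Roughly something like this (hand-wavy and untested; the exact split of
i915_drm_suspend() is hypothetical):

static int i915_pm_freeze(struct device *dev)
{
	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
	int ret;

	/* first half of i915_drm_suspend(): idle the GPU, retire requests */
	ret = i915_gem_suspend(drm_dev);
	if (ret)
		return ret;

	/* stolen objects are now idle, so copy them out to shmemfs */
	ret = i915_gem_freeze(drm_dev);
	if (ret)
		return ret;

	/* ...then the remainder of i915_drm_suspend() (display, irqs, etc.) */
	return 0;
}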

Do Chris or Ville have any opinions here?

Regards,

Tvrtko

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-10 14:15       ` Tvrtko Ursulin
@ 2015-12-10 18:00         ` Dave Gordon
  2015-12-11  5:19           ` Ankitprasad Sharma
  2015-12-11  5:16         ` Ankitprasad Sharma
  1 sibling, 1 reply; 47+ messages in thread
From: Dave Gordon @ 2015-12-10 18:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Ankitprasad Sharma
  Cc: intel-gfx, akash.goel, shashidhar.hiremath

On 10/12/15 14:15, Tvrtko Ursulin wrote:
>
> On 10/12/15 13:17, Ankitprasad Sharma wrote:
>> On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
>>> Hi,
>>>
>>> Two more comments below:
>>>
>>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>
>>>> Ville reminded us that stolen memory is not preserved across
>>>> hibernation, and a result of this was that context objects now being
>>>> allocated from stolen were being corrupted on S4 and promptly hanging
>>>> the GPU on resume.
>>>>
>>>> We want to utilise stolen for as much as possible (nothing else will
>>>> use
>>>> that wasted memory otherwise), so we need a strategy for handling
>>>> general objects allocated from stolen and hibernation. A simple
>>>> solution
>>>> is to do a CPU copy through the GTT of the stolen object into a fresh
>>>> shmemfs backing store and thenceforth treat it as a normal objects.
>>>> This
>>>> can be refined in future to either use a GPU copy to avoid the slow
>>>> uncached reads (though it's hibernation!) and recreate stolen objects
>>>> upon resume/first-use. For now, a simple approach should suffice for
>>>> testing the object migration.
>>>
>>> Mention of "testing" in the commit message and absence of a path to
>>> migrate the objects back to stolen memory on resume makes me think this
>>> is kind of half finished and not really ready for review / merge ?
>>>
>>> Because I don't see how it is useful to migrate it one way and never
>>> move back?
>> I think that this is not much of a problem, as the purpose here is to
>> keep the object intact, to avoid breaking anything.
>> So as far as objects are concerned they will be in shmem and can be used
>> without any issue, and the stolen memory will be free again for other
>> usage from the user.
>
> I am not sure that is a good state of things.
>
> One of the things it means is that when user wanted to create an object
> in stolen memory, after resume it will not be any more. So what is the
> point in failing stolen object creation when area is full in the first
> place? We could just return a normal object instead.
>
> Then the question of objects which are allocated in stolen by the
> driver. Are they being re-allocated on resume or will also be stuck in
> shmemfs from then onward?
>
> And finally, one corner case might be that shmemfs plus stolen is a
> larger sum which will be attempted to be restored in shmemfs only on
> resume. Will that always work if everything is fully populated and what
> will happen if we run out of space?
>
> At minimum all this should be discussed and explicitly documented in the
> commit message.
>
> Would it be difficult to implement the reverse path?

Please don't migrate random objects to stolen! It has all sorts of 
limitations that make it unsuitable for some types of object (e.g. 
contexts).

Only objects that were originally placed in stolen should ever be 
candidates for the reverse migration ...

.Dave.


* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-10 11:12       ` Ankitprasad Sharma
@ 2015-12-10 18:18         ` Dave Gordon
  2015-12-11  5:22           ` Ankitprasad Sharma
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Gordon @ 2015-12-10 18:18 UTC (permalink / raw)
  To: Ankitprasad Sharma; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On 10/12/15 11:12, Ankitprasad Sharma wrote:
> On Wed, 2015-12-09 at 19:39 +0000, Dave Gordon wrote:
>> On 09/12/15 16:15, Tvrtko Ursulin wrote:
>>>
>>> Hi,
>>>
>>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>>
>>>> This patch adds support for extending the pread/pwrite functionality
>>>> for objects not backed by shmem. The access will be made through
>>>> gtt interface. This will cover objects backed by stolen memory as well
>>>> as other non-shmem backed objects.
>>>>
>>>> v2: Drop locks around slow_user_access, prefault the pages before
>>>> access (Chris)
>>>>
>>>> v3: Rebased to the latest drm-intel-nightly (Ankit)
>>>>
>>>> v4: Moved page base & offset calculations outside the copy loop,
>>>> corrected data types for size and offset variables, corrected if-else
>>>> braces format (Tvrtko/kerneldocs)
>>>>
>>>> v5: Enabled pread/pwrite for all non-shmem backed objects including
>>>> without tiling restrictions (Ankit)
>>>>
>>>> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>>>>
>>>> v7: Updated commit message, Renamed i915_gem_gtt_read to
>>>> i915_gem_gtt_copy,
>>>> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
>>>>
>>>> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
>>>> non-shmem backed objects (Tvrtko)
>>>>
>>>> Testcase: igt/gem_stolen
>>>>
>>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem.c | 151
>>>> +++++++++++++++++++++++++++++++++-------
>>>>    1 file changed, 127 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>>> b/drivers/gpu/drm/i915/i915_gem.c
>>>> index ed97de6..68ed67a 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
>>>> shmem_page_offset, int page_length,
>>>>        return ret ? - EFAULT : 0;
>>>>    }
>>>>
>>>> +static inline uint64_t
>>>> +slow_user_access(struct io_mapping *mapping,
>>>> +         uint64_t page_base, int page_offset,
>>>> +         char __user *user_data,
>>>> +         int length, bool pwrite)
>>>> +{
>>>> +    void __iomem *vaddr_inatomic;
>>>> +    void *vaddr;
>>>> +    uint64_t unwritten;
>>>> +
>>>> +    vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
>>>> +    /* We can use the cpu mem copy function because this is X86. */
>>>> +    vaddr = (void __force *)vaddr_inatomic + page_offset;
>>>> +    if (pwrite)
>>>> +        unwritten = __copy_from_user(vaddr, user_data, length);
>>>> +    else
>>>> +        unwritten = __copy_to_user(user_data, vaddr, length);
>>>> +
>>>> +    io_mapping_unmap(vaddr_inatomic);
>>>> +    return unwritten;
>>>> +}
>>>> +
>>>> +static int
>>>> +i915_gem_gtt_copy(struct drm_device *dev,
>>>> +           struct drm_i915_gem_object *obj, uint64_t size,
>>>> +           uint64_t data_offset, uint64_t data_ptr)
>>>> +{
>>>> +    struct drm_i915_private *dev_priv = dev->dev_private;
>>>> +    char __user *user_data;
>>>> +    uint64_t remain;
>>>> +    uint64_t offset, page_base;
>>>> +    int page_offset, page_length, ret = 0;
>>>> +
>>>> +    ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
>>>> +    if (ret)
>>>> +        goto out;
>>>> +
>>>> +    ret = i915_gem_object_set_to_gtt_domain(obj, false);
>>>> +    if (ret)
>>>> +        goto out_unpin;
>>>> +
>>>> +    ret = i915_gem_object_put_fence(obj);
>>>> +    if (ret)
>>>> +        goto out_unpin;
>>>> +
>>>> +    user_data = to_user_ptr(data_ptr);
>>>> +    remain = size;
>>>> +    offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
>>>> +
>>>> +    mutex_unlock(&dev->struct_mutex);
>>>> +    if (likely(!i915.prefault_disable))
>>>> +        ret = fault_in_multipages_writeable(user_data, remain);
>>>> +
>>>> +    /*
>>>> +     * page_offset = offset within page
>>>> +     * page_base = page offset within aperture
>>>> +     */
>>>> +    page_offset = offset_in_page(offset);
>>>> +    page_base = offset & PAGE_MASK;
>>>> +
>>>> +    while (remain > 0) {
>>>> +        /* page_length = bytes to copy for this page */
>>>> +        page_length = remain;
>>>> +        if ((page_offset + remain) > PAGE_SIZE)
>>>> +            page_length = PAGE_SIZE - page_offset;
>>>> +
>>>> +        /* This is a slow read/write as it tries to read from
>>>> +         * and write to user memory which may result into page
>>>> +         * faults
>>>> +         */
>>>> +        ret = slow_user_access(dev_priv->gtt.mappable, page_base,
>>>> +                       page_offset, user_data,
>>>> +                       page_length, false);
>>>> +
>>>> +        if (ret) {
>>>> +            ret = -EFAULT;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        remain -= page_length;
>>>> +        user_data += page_length;
>>>> +        page_base += page_length;
>>>> +        page_offset = 0;
>>>> +    }
>>>> +
>>>> +    mutex_lock(&dev->struct_mutex);
>>>> +
>>>> +out_unpin:
>>>> +    i915_gem_object_ggtt_unpin(obj);
>>>> +out:
>>>> +    return ret;
>>>> +}
>>>> +
>>>>    static int
>>>>    i915_gem_shmem_pread(struct drm_device *dev,
>>>>                 struct drm_i915_gem_object *obj,
>>>> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>            goto out;
>>>>        }
>>>>
>>>> -    /* prime objects have no backing filp to GEM pread/pwrite
>>>> -     * pages from.
>>>> -     */
>>>> -    if (!obj->base.filp) {
>>>> -        ret = -EINVAL;
>>>> -        goto out;
>>>> -    }
>>>> -
>>>>        trace_i915_gem_object_pread(obj, args->offset, args->size);
>>>>
>>>> -    ret = i915_gem_shmem_pread(dev, obj, args, file);
>>>> +    /* pread for non shmem backed objects */
>>>> +    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
>>>> +        ret = i915_gem_gtt_copy(dev, obj, args->size,
>>>> +                    args->offset, args->data_ptr);
>>>> +    else
>>>> +        ret = i915_gem_shmem_pread(dev, obj, args, file);
>>>
>>> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
>>> objects if tiling is set. Sounds wrong to me unless I am missing something?
>>
>> Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type
>> objects? What about (phys, stolen, userptr, dmabuf, ...?) Which of these
>> is the alternate path going to work with?
> Only shmem backed objects have obj->base.filp set, filp pointing to the
> shmem file. For all other non-shmem backed objects (stolen, userptr,
> dmabuf) we use the alternate path.
>
> -Ankit

But 'phys' objects DO have 'filp' set. Which path is expected to work 
for them?

.Dave.

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-10 14:15       ` Tvrtko Ursulin
  2015-12-10 18:00         ` Dave Gordon
@ 2015-12-11  5:16         ` Ankitprasad Sharma
  2015-12-11 12:33           ` Tvrtko Ursulin
  1 sibling, 1 reply; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-11  5:16 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-12-10 at 14:15 +0000, Tvrtko Ursulin wrote:
> On 10/12/15 13:17, Ankitprasad Sharma wrote:
> > On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
> >> Hi,
> >>
> >> Two more comments below:
> >>
> >> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >>> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>>
> >>> Ville reminded us that stolen memory is not preserved across
> >>> hibernation, and a result of this was that context objects now being
> >>> allocated from stolen were being corrupted on S4 and promptly hanging
> >>> the GPU on resume.
> >>>
> >>> We want to utilise stolen for as much as possible (nothing else will use
> >>> that wasted memory otherwise), so we need a strategy for handling
> >>> general objects allocated from stolen and hibernation. A simple solution
> >>> is to do a CPU copy through the GTT of the stolen object into a fresh
> >>> shmemfs backing store and thenceforth treat it as a normal objects. This
> >>> can be refined in future to either use a GPU copy to avoid the slow
> >>> uncached reads (though it's hibernation!) and recreate stolen objects
> >>> upon resume/first-use. For now, a simple approach should suffice for
> >>> testing the object migration.
> >>
> >> Mention of "testing" in the commit message and absence of a path to
> >> migrate the objects back to stolen memory on resume makes me think this
> >> is kind of half finished and not really ready for review / merge ?
> >>
> >> Because I don't see how it is useful to migrate it one way and never
> >> move back?
> > I think that this is not much of a problem, as the purpose here is to
> > keep the object intact, to avoid breaking anything.
> > So as far as objects are concerned they will be in shmem and can be used
> > without any issue, and the stolen memory will be free again for other
> > usage from the user.
> 
> I am not sure that is a good state of things.
> 
> One of the things it means is that when user wanted to create an object 
> in stolen memory, after resume it will not be any more. So what is the 
> point in failing stolen object creation when area is full in the first 
> place? We could just return a normal object instead.
I agree with you, but the absence of a reverse path will not affect the
user in any way, though the user may be under the wrong impression that
the buffer is residing inside the stolen area.

> 
> Then the question of objects which are allocated in stolen by the 
> driver. Are they being re-allocated on resume or will also be stuck in 
> shmemfs from then onward?
Objects allocated by the driver need not be preserved (we use an
internal_volatile flag for those). These are not migrated to shmemfs
and are re-populated by the driver when used again after resume.
> 
> And finally, one corner case might be that shmemfs plus stolen is a 
> larger sum which will be attempted to be restored in shmemfs only on 
> resume. Will that always work if everything is fully populated and what 
> will happen if we run out of space?
As per my understanding, the shmemfs footprint will grow, due to the
migration, before the hibernation itself. And if not everything in
shmemfs can be held in RAM, swap-out will take care of it.
Whatever was in RAM will be restored on resume; the rest will remain
in swap.
> 
> At minimum all this should be discussed and explicitly documented in the 
> commit message.
> 
> Would it be difficult to implement the reverse path?
I will try to explore the reverse path as well. But that can be
submitted separately as a follow-up patch.
> 
> >>> v2:
> >>> Swap PTE for pinned bindings over to the shmemfs. This adds a
> >>> complicated dance, but is required as many stolen objects are likely to
> >>> be pinned for use by the hardware. Swapping the PTEs should not result
> >>> in externally visible behaviour, as each PTE update should be atomic and
> >>> the two pages identical. (danvet)
> >>>
> >>> safe-by-default, or the principle of least surprise. We need a new flag
> >>> to mark objects that we can wilfully discard and recreate across
> >>> hibernation. (danvet)
> >>>
> >>> Just use the global_list rather than invent a new stolen_list. This is
> >>> the slowpath hibernate and so adding a new list and the associated
> >>> complexity isn't worth it.
> >>>
> >>> v3: Rebased on drm-intel-nightly (Ankit)
> >>>
> >>> v4: Use insert_page to map stolen memory backed pages for migration to
> >>> shmem (Chris)
> >>>
> >>> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
> >>>
> >>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
> >>>    drivers/gpu/drm/i915/i915_drv.h         |   7 +
> >>>    drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
> >>>    drivers/gpu/drm/i915/intel_display.c    |   3 +
> >>>    drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
> >>>    drivers/gpu/drm/i915/intel_pm.c         |   2 +
> >>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
> >>>    7 files changed, 261 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> >>> index 9f55209..2bb9e9e 100644
> >>> --- a/drivers/gpu/drm/i915/i915_drv.c
> >>> +++ b/drivers/gpu/drm/i915/i915_drv.c
> >>> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
> >>>    	return i915_drm_suspend(drm_dev);
> >>>    }
> >>>
> >>> +static int i915_pm_freeze(struct device *dev)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
> >>> +	if (ret)
> >>> +		return ret;
> >>
> >> One of the first steps in idling GEM seems to be idling the GPU and
> >> retiring requests.
> >>
> >> Would it also make sense to do those steps before attempting to migrate
> >> the stolen objects?
> > Here, we do that implicitly when trying to do a vma_unbind for the
> > object.
> 
> Code paths are not the same so it makes me uncomfortable.  It looks more 
> logical to do the migration after the existing i915_gem_suspend. It 
> would mean some code duplication, true (maybe split i915_drm_suspend in 
> two and call i915_gem_freeze in between), but to me it looks more like a 
> proper place to do it.
> 
All inactive stolen objects will be migrated immediately and the active
ones will be implicitly synchronized in vma_unbind.
Would it be more appropriate to rename i915_gem_freeze to
i915_gem_migrate_stolen?

But anyway, we will wait for Chris or Ville's inputs.
> Do Chris or Ville have any opinions here?
> 
> Regards,
> 
> Tvrtko

Thanks,
Ankit


* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-10 18:00         ` Dave Gordon
@ 2015-12-11  5:19           ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-11  5:19 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-12-10 at 18:00 +0000, Dave Gordon wrote:
> On 10/12/15 14:15, Tvrtko Ursulin wrote:
> >
> > On 10/12/15 13:17, Ankitprasad Sharma wrote:
> >> On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
> >>> Hi,
> >>>
> >>> Two more comments below:
> >>>
> >>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >>>> From: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>
> >>>> Ville reminded us that stolen memory is not preserved across
> >>>> hibernation, and a result of this was that context objects now being
> >>>> allocated from stolen were being corrupted on S4 and promptly hanging
> >>>> the GPU on resume.
> >>>>
> >>>> We want to utilise stolen for as much as possible (nothing else will
> >>>> use
> >>>> that wasted memory otherwise), so we need a strategy for handling
> >>>> general objects allocated from stolen and hibernation. A simple
> >>>> solution
> >>>> is to do a CPU copy through the GTT of the stolen object into a fresh
> >>>> shmemfs backing store and thenceforth treat it as a normal objects.
> >>>> This
> >>>> can be refined in future to either use a GPU copy to avoid the slow
> >>>> uncached reads (though it's hibernation!) and recreate stolen objects
> >>>> upon resume/first-use. For now, a simple approach should suffice for
> >>>> testing the object migration.
> >>>
> >>> Mention of "testing" in the commit message and absence of a path to
> >>> migrate the objects back to stolen memory on resume makes me think this
> >>> is kind of half finished and not really ready for review / merge ?
> >>>
> >>> Because I don't see how it is useful to migrate it one way and never
> >>> move back?
> >> I think that this is not much of a problem, as the purpose here is to
> >> keep the object intact, to avoid breaking anything.
> >> So as far as objects are concerned they will be in shmem and can be used
> >> without any issue, and the stolen memory will be free again for other
> >> usage from the user.
> >
> > I am not sure that is a good state of things.
> >
> > One of the things it means is that when user wanted to create an object
> > in stolen memory, after resume it will not be any more. So what is the
> > point in failing stolen object creation when area is full in the first
> > place? We could just return a normal object instead.
> >
> > Then the question of objects which are allocated in stolen by the
> > driver. Are they being re-allocated on resume or will also be stuck in
> > shmemfs from then onward?
> >
> > And finally, one corner case might be that shmemfs plus stolen is a
> > larger sum which will be attempted to be restored in shmemfs only on
> > resume. Will that always work if everything is fully populated and what
> > will happen if we run out of space?
> >
> > At minimum all this should be discussed and explicitly documented in the
> > commit message.
> >
> > Would it be difficult to implement the reverse path?
> 
> Please don't migrate random objects to stolen! It has all sorts of 
> limitations that make it unsuitable for some types of object (e.g. 
> contexts).
> 
> Only objects that were originally placed in stolen should ever be 
> candidates for the reverse migration ...
Yes, obviously. We will consider only those objects which were
originally placed in the stolen area.
> 
> .Dave.
> 
Thanks,
Ankit




* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-10 18:18         ` Dave Gordon
@ 2015-12-11  5:22           ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-11  5:22 UTC (permalink / raw)
  To: Dave Gordon; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-12-10 at 18:18 +0000, Dave Gordon wrote:
> On 10/12/15 11:12, Ankitprasad Sharma wrote:
> > On Wed, 2015-12-09 at 19:39 +0000, Dave Gordon wrote:
> >> On 09/12/15 16:15, Tvrtko Ursulin wrote:
> >>>
> >>> Hi,
> >>>
> >>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >>>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>>>
> >>>> This patch adds support for extending the pread/pwrite functionality
> >>>> for objects not backed by shmem. The access will be made through
> >>>> gtt interface. This will cover objects backed by stolen memory as well
> >>>> as other non-shmem backed objects.
> >>>>
> >>>> v2: Drop locks around slow_user_access, prefault the pages before
> >>>> access (Chris)
> >>>>
> >>>> v3: Rebased to the latest drm-intel-nightly (Ankit)
> >>>>
> >>>> v4: Moved page base & offset calculations outside the copy loop,
> >>>> corrected data types for size and offset variables, corrected if-else
> >>>> braces format (Tvrtko/kerneldocs)
> >>>>
> >>>> v5: Enabled pread/pwrite for all non-shmem backed objects including
> >>>> without tiling restrictions (Ankit)
> >>>>
> >>>> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >>>>
> >>>> v7: Updated commit message, Renamed i915_gem_gtt_read to
> >>>> i915_gem_gtt_copy,
> >>>> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >>>>
> >>>> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> >>>> non-shmem backed objects (Tvrtko)
> >>>>
> >>>> Testcase: igt/gem_stolen
> >>>>
> >>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>>> ---
> >>>>    drivers/gpu/drm/i915/i915_gem.c | 151
> >>>> +++++++++++++++++++++++++++++++++-------
> >>>>    1 file changed, 127 insertions(+), 24 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>>> b/drivers/gpu/drm/i915/i915_gem.c
> >>>> index ed97de6..68ed67a 100644
> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
> >>>> shmem_page_offset, int page_length,
> >>>>        return ret ? - EFAULT : 0;
> >>>>    }
> >>>>
> >>>> +static inline uint64_t
> >>>> +slow_user_access(struct io_mapping *mapping,
> >>>> +         uint64_t page_base, int page_offset,
> >>>> +         char __user *user_data,
> >>>> +         int length, bool pwrite)
> >>>> +{
> >>>> +    void __iomem *vaddr_inatomic;
> >>>> +    void *vaddr;
> >>>> +    uint64_t unwritten;
> >>>> +
> >>>> +    vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> >>>> +    /* We can use the cpu mem copy function because this is X86. */
> >>>> +    vaddr = (void __force *)vaddr_inatomic + page_offset;
> >>>> +    if (pwrite)
> >>>> +        unwritten = __copy_from_user(vaddr, user_data, length);
> >>>> +    else
> >>>> +        unwritten = __copy_to_user(user_data, vaddr, length);
> >>>> +
> >>>> +    io_mapping_unmap(vaddr_inatomic);
> >>>> +    return unwritten;
> >>>> +}
> >>>> +
> >>>> +static int
> >>>> +i915_gem_gtt_copy(struct drm_device *dev,
> >>>> +           struct drm_i915_gem_object *obj, uint64_t size,
> >>>> +           uint64_t data_offset, uint64_t data_ptr)
> >>>> +{
> >>>> +    struct drm_i915_private *dev_priv = dev->dev_private;
> >>>> +    char __user *user_data;
> >>>> +    uint64_t remain;
> >>>> +    uint64_t offset, page_base;
> >>>> +    int page_offset, page_length, ret = 0;
> >>>> +
> >>>> +    ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >>>> +    if (ret)
> >>>> +        goto out;
> >>>> +
> >>>> +    ret = i915_gem_object_set_to_gtt_domain(obj, false);
> >>>> +    if (ret)
> >>>> +        goto out_unpin;
> >>>> +
> >>>> +    ret = i915_gem_object_put_fence(obj);
> >>>> +    if (ret)
> >>>> +        goto out_unpin;
> >>>> +
> >>>> +    user_data = to_user_ptr(data_ptr);
> >>>> +    remain = size;
> >>>> +    offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> >>>> +
> >>>> +    mutex_unlock(&dev->struct_mutex);
> >>>> +    if (likely(!i915.prefault_disable))
> >>>> +        ret = fault_in_multipages_writeable(user_data, remain);
> >>>> +
> >>>> +    /*
> >>>> +     * page_offset = offset within page
> >>>> +     * page_base = page offset within aperture
> >>>> +     */
> >>>> +    page_offset = offset_in_page(offset);
> >>>> +    page_base = offset & PAGE_MASK;
> >>>> +
> >>>> +    while (remain > 0) {
> >>>> +        /* page_length = bytes to copy for this page */
> >>>> +        page_length = remain;
> >>>> +        if ((page_offset + remain) > PAGE_SIZE)
> >>>> +            page_length = PAGE_SIZE - page_offset;
> >>>> +
> >>>> +        /* This is a slow read/write as it tries to read from
> >>>> +         * and write to user memory which may result into page
> >>>> +         * faults
> >>>> +         */
> >>>> +        ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> >>>> +                       page_offset, user_data,
> >>>> +                       page_length, false);
> >>>> +
> >>>> +        if (ret) {
> >>>> +            ret = -EFAULT;
> >>>> +            break;
> >>>> +        }
> >>>> +
> >>>> +        remain -= page_length;
> >>>> +        user_data += page_length;
> >>>> +        page_base += page_length;
> >>>> +        page_offset = 0;
> >>>> +    }
> >>>> +
> >>>> +    mutex_lock(&dev->struct_mutex);
> >>>> +
> >>>> +out_unpin:
> >>>> +    i915_gem_object_ggtt_unpin(obj);
> >>>> +out:
> >>>> +    return ret;
> >>>> +}
> >>>> +
> >>>>    static int
> >>>>    i915_gem_shmem_pread(struct drm_device *dev,
> >>>>                 struct drm_i915_gem_object *obj,
> >>>> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
> >>>> void *data,
> >>>>            goto out;
> >>>>        }
> >>>>
> >>>> -    /* prime objects have no backing filp to GEM pread/pwrite
> >>>> -     * pages from.
> >>>> -     */
> >>>> -    if (!obj->base.filp) {
> >>>> -        ret = -EINVAL;
> >>>> -        goto out;
> >>>> -    }
> >>>> -
> >>>>        trace_i915_gem_object_pread(obj, args->offset, args->size);
> >>>>
> >>>> -    ret = i915_gem_shmem_pread(dev, obj, args, file);
> >>>> +    /* pread for non shmem backed objects */
> >>>> +    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> >>>> +        ret = i915_gem_gtt_copy(dev, obj, args->size,
> >>>> +                    args->offset, args->data_ptr);
> >>>> +    else
> >>>> +        ret = i915_gem_shmem_pread(dev, obj, args, file);
> >>>
> >>> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
> >>> objects if tiling is set. Sounds wrong to me unless I am missing something?
> >>
> >> Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type
> >> objects? What about (phys, stolen, userptr, dmabuf, ...?) Which of these
> >> is the alternate path going to work with?
> > Only shmem backed objects have obj->base.filp set, filp pointing to the
> > shmem file. For all other non-shmem backed objects (stolen, userptr,
> > dmabuf) we use the alternate path.
> >
> > -Ankit
> 
> But 'phys' objects DO have 'filp' set. Which path is expected to work 
> for them?
Sorry. Yes, phys objects also have filp set. So they won't follow the
alternate path.

> .Dave.
Thanks,
Ankit




_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-09 14:06   ` Tvrtko Ursulin
@ 2015-12-11 11:22     ` Ankitprasad Sharma
  2015-12-11 12:19       ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-12-11 11:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Wed, 2015-12-09 at 14:06 +0000, Tvrtko Ursulin wrote:
> Hi,
> 
> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > Extend the drm_i915_gem_create structure to add support for
> > creating Stolen memory backed objects. Added a new flag through
> > which user can specify the preference to allocate the object from
> > stolen memory, which if set, an attempt will be made to allocate
> > the object from stolen memory subject to the availability of
> > free space in the stolen region.
> >
> > v2: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)
> >
> > v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
> > Corrected function arguments ordering (Chris)
> >
> > v5: Corrected function name (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_dma.c        |  3 +++
> >   drivers/gpu/drm/i915/i915_drv.h        |  2 +-
> >   drivers/gpu/drm/i915/i915_gem.c        | 30 +++++++++++++++++++++++++++---
> >   drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
> >   include/uapi/drm/i915_drm.h            | 16 ++++++++++++++++
> >   5 files changed, 49 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> > index ffcb9c6..6927c7e 100644
> > --- a/drivers/gpu/drm/i915/i915_dma.c
> > +++ b/drivers/gpu/drm/i915/i915_dma.c
> > @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
> >   	case I915_PARAM_HAS_RESOURCE_STREAMER:
> >   		value = HAS_RESOURCE_STREAMER(dev);
> >   		break;
> > +	case I915_PARAM_CREATE_VERSION:
> > +		value = 2;
> > +		break;
> >   	default:
> >   		DRM_DEBUG("Unknown parameter %d\n", param->param);
> >   		return -EINVAL;
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 8e554d3..d45274e 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
> >   int i915_gem_init_stolen(struct drm_device *dev);
> >   void i915_gem_cleanup_stolen(struct drm_device *dev);
> >   struct drm_i915_gem_object *
> > -i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
> > +i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
> >   struct drm_i915_gem_object *
> >   i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
> >   					       u32 stolen_offset,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d57e850..296e63f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -375,6 +375,7 @@ static int
> >   i915_gem_create(struct drm_file *file,
> >   		struct drm_device *dev,
> >   		uint64_t size,
> > +		uint32_t flags,
> >   		uint32_t *handle_p)
> >   {
> >   	struct drm_i915_gem_object *obj;
> > @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
> >   	if (size == 0)
> >   		return -EINVAL;
> >
> > +	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
> > +		return -EINVAL;
> > +
> >   	/* Allocate the new object */
> > -	obj = i915_gem_alloc_object(dev, size);
> > +	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
> > +		mutex_lock(&dev->struct_mutex);
> > +		obj = i915_gem_object_create_stolen(dev, size);
> > +		if (!obj) {
> > +			mutex_unlock(&dev->struct_mutex);
> > +			return -ENOMEM;
> > +		}
> > +
> > +		/* Always clear fresh buffers before handing to userspace */
> > +		ret = i915_gem_object_clear(obj);
> > +		if (ret) {
> > +			drm_gem_object_unreference(&obj->base);
> > +			mutex_unlock(&dev->struct_mutex);
> > +			return ret;
> > +		}
> > +
> > +		mutex_unlock(&dev->struct_mutex);
> > +	} else {
> > +		obj = i915_gem_alloc_object(dev, size);
> > +	}
> > +
> >   	if (obj == NULL)
> >   		return -ENOMEM;
> >
> > @@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
> >   	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
> >   	args->size = args->pitch * args->height;
> >   	return i915_gem_create(file, dev,
> > -			       args->size, &args->handle);
> > +			       args->size, 0, &args->handle);
> >   }
> >
> >   /**
> > @@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
> >   	struct drm_i915_gem_create *args = data;
> >
> >   	return i915_gem_create(file, dev,
> > -			       args->size, &args->handle);
> > +			       args->size, args->flags, &args->handle);
> >   }
> >
> >   static inline int
> > diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
> > index 598ed2f..b98a3bf 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> > @@ -583,7 +583,7 @@ cleanup:
> >   }
> >
> >   struct drm_i915_gem_object *
> > -i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
> > +i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
> >   {
> >   	struct drm_i915_private *dev_priv = dev->dev_private;
> >   	struct drm_i915_gem_object *obj;
> > @@ -593,7 +593,7 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
> >   	if (!drm_mm_initialized(&dev_priv->mm.stolen))
> >   		return NULL;
> >
> > -	DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
> > +	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
> >   	if (size == 0)
> >   		return NULL;
> >
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 67cebe6..8e7e3a4 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait {
> >   #define I915_PARAM_EU_TOTAL		 34
> >   #define I915_PARAM_HAS_GPU_RESET	 35
> >   #define I915_PARAM_HAS_RESOURCE_STREAMER 36
> > +#define I915_PARAM_CREATE_VERSION	 37
> >
> >   typedef struct drm_i915_getparam {
> >   	__s32 param;
> > @@ -455,6 +456,21 @@ struct drm_i915_gem_create {
> >   	 */
> >   	__u32 handle;
> >   	__u32 pad;
> > +	/**
> > +	 * Requested flags (currently used for placement
> > +	 * (which memory domain))
> > +	 *
> > +	 * You can request that the object be created from special memory
> > +	 * rather than regular system pages using this parameter. Such
> > +	 * irregular objects may have certain restrictions (such as CPU
> > +	 * access to a stolen object is verboten).
> > +	 *
> > +	 * This can be used in the future for other purposes too
> > +	 * e.g. specifying tiling/caching/madvise
> > +	 */
> > +	__u32 flags;
> > +#define I915_CREATE_PLACEMENT_STOLEN 	(1<<0) /* Cannot use CPU mmaps */
> > +#define __I915_CREATE_UNKNOWN_FLAGS	-(I915_CREATE_PLACEMENT_STOLEN << 1)
> 
> I've asked in another reply, now that userspace can create a stolen 
> object, what happens if it tries to use it for a batch buffer?
> 
> Can it end up in the relocate_entry_cpu with a batch buffer allocated 
> from stolen, which would then call i915_gem_object_get_page and crash?
Thanks for pointing it out.
Yes, this is definitely a possibility, if we allocate batchbuffers from
the stolen region. I have started working on that, to do
relocate_entry_stolen() if the object is allocated from stolen.
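
For reference, the userspace side of the new flag would look roughly
like the sketch below (it assumes the updated i915_drm.h uapi from this
series and libdrm's drmIoctl; error handling trimmed):

	#include <errno.h>
	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	static int create_stolen_bo(int fd, unsigned long size, unsigned *handle)
	{
		/* .flags / I915_CREATE_PLACEMENT_STOLEN come from this series */
		struct drm_i915_gem_create create = {
			.size  = size,
			.flags = I915_CREATE_PLACEMENT_STOLEN,
		};
		drm_i915_getparam_t gp = { .param = I915_PARAM_CREATE_VERSION };
		int version = 0;

		gp.value = &version;
		/* older kernels reject the new param, so treat that as "no stolen" */
		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) || version < 2)
			return -ENODEV;

		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
			return -errno;	/* e.g. -ENOMEM when the stolen region is full */

		*handle = create.handle;
		return 0;
	}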
> 
> >   };
> >
> >   struct drm_i915_gem_pread {
> >
> 
> Regards,
> 
> Tvrtko
Thanks,
Ankit

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-11 11:22     ` Ankitprasad Sharma
@ 2015-12-11 12:19       ` Tvrtko Ursulin
  2015-12-11 12:49         ` Dave Gordon
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 12:19 UTC (permalink / raw)
  To: Ankitprasad Sharma; +Cc: intel-gfx, akash.goel, shashidhar.hiremath


On 11/12/15 11:22, Ankitprasad Sharma wrote:
> On Wed, 2015-12-09 at 14:06 +0000, Tvrtko Ursulin wrote:
>> Hi,
>>
>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>
>>> Extend the drm_i915_gem_create structure to add support for
>>> creating Stolen memory backed objects. Added a new flag through
>>> which user can specify the preference to allocate the object from
>>> stolen memory, which if set, an attempt will be made to allocate
>>> the object from stolen memory subject to the availability of
>>> free space in the stolen region.
>>>
>>> v2: Rebased to the latest drm-intel-nightly (Ankit)
>>>
>>> v3: Changed versioning of GEM_CREATE param, added new comments (Tvrtko)
>>>
>>> v4: Changed size from 32b to 64b to prevent userspace overflow (Tvrtko)
>>> Corrected function arguments ordering (Chris)
>>>
>>> v5: Corrected function name (Chris)
>>>
>>> Testcase: igt/gem_stolen
>>>
>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_dma.c        |  3 +++
>>>    drivers/gpu/drm/i915/i915_drv.h        |  2 +-
>>>    drivers/gpu/drm/i915/i915_gem.c        | 30 +++++++++++++++++++++++++++---
>>>    drivers/gpu/drm/i915/i915_gem_stolen.c |  4 ++--
>>>    include/uapi/drm/i915_drm.h            | 16 ++++++++++++++++
>>>    5 files changed, 49 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>>> index ffcb9c6..6927c7e 100644
>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>> @@ -170,6 +170,9 @@ static int i915_getparam(struct drm_device *dev, void *data,
>>>    	case I915_PARAM_HAS_RESOURCE_STREAMER:
>>>    		value = HAS_RESOURCE_STREAMER(dev);
>>>    		break;
>>> +	case I915_PARAM_CREATE_VERSION:
>>> +		value = 2;
>>> +		break;
>>>    	default:
>>>    		DRM_DEBUG("Unknown parameter %d\n", param->param);
>>>    		return -EINVAL;
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 8e554d3..d45274e 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -3213,7 +3213,7 @@ void i915_gem_stolen_remove_node(struct drm_i915_private *dev_priv,
>>>    int i915_gem_init_stolen(struct drm_device *dev);
>>>    void i915_gem_cleanup_stolen(struct drm_device *dev);
>>>    struct drm_i915_gem_object *
>>> -i915_gem_object_create_stolen(struct drm_device *dev, u32 size);
>>> +i915_gem_object_create_stolen(struct drm_device *dev, u64 size);
>>>    struct drm_i915_gem_object *
>>>    i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
>>>    					       u32 stolen_offset,
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index d57e850..296e63f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -375,6 +375,7 @@ static int
>>>    i915_gem_create(struct drm_file *file,
>>>    		struct drm_device *dev,
>>>    		uint64_t size,
>>> +		uint32_t flags,
>>>    		uint32_t *handle_p)
>>>    {
>>>    	struct drm_i915_gem_object *obj;
>>> @@ -385,8 +386,31 @@ i915_gem_create(struct drm_file *file,
>>>    	if (size == 0)
>>>    		return -EINVAL;
>>>
>>> +	if (flags & __I915_CREATE_UNKNOWN_FLAGS)
>>> +		return -EINVAL;
>>> +
>>>    	/* Allocate the new object */
>>> -	obj = i915_gem_alloc_object(dev, size);
>>> +	if (flags & I915_CREATE_PLACEMENT_STOLEN) {
>>> +		mutex_lock(&dev->struct_mutex);
>>> +		obj = i915_gem_object_create_stolen(dev, size);
>>> +		if (!obj) {
>>> +			mutex_unlock(&dev->struct_mutex);
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		/* Always clear fresh buffers before handing to userspace */
>>> +		ret = i915_gem_object_clear(obj);
>>> +		if (ret) {
>>> +			drm_gem_object_unreference(&obj->base);
>>> +			mutex_unlock(&dev->struct_mutex);
>>> +			return ret;
>>> +		}
>>> +
>>> +		mutex_unlock(&dev->struct_mutex);
>>> +	} else {
>>> +		obj = i915_gem_alloc_object(dev, size);
>>> +	}
>>> +
>>>    	if (obj == NULL)
>>>    		return -ENOMEM;
>>>
>>> @@ -409,7 +433,7 @@ i915_gem_dumb_create(struct drm_file *file,
>>>    	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
>>>    	args->size = args->pitch * args->height;
>>>    	return i915_gem_create(file, dev,
>>> -			       args->size, &args->handle);
>>> +			       args->size, 0, &args->handle);
>>>    }
>>>
>>>    /**
>>> @@ -422,7 +446,7 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
>>>    	struct drm_i915_gem_create *args = data;
>>>
>>>    	return i915_gem_create(file, dev,
>>> -			       args->size, &args->handle);
>>> +			       args->size, args->flags, &args->handle);
>>>    }
>>>
>>>    static inline int
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> index 598ed2f..b98a3bf 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
>>> @@ -583,7 +583,7 @@ cleanup:
>>>    }
>>>
>>>    struct drm_i915_gem_object *
>>> -i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>>> +i915_gem_object_create_stolen(struct drm_device *dev, u64 size)
>>>    {
>>>    	struct drm_i915_private *dev_priv = dev->dev_private;
>>>    	struct drm_i915_gem_object *obj;
>>> @@ -593,7 +593,7 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 size)
>>>    	if (!drm_mm_initialized(&dev_priv->mm.stolen))
>>>    		return NULL;
>>>
>>> -	DRM_DEBUG_KMS("creating stolen object: size=%x\n", size);
>>> +	DRM_DEBUG_KMS("creating stolen object: size=%llx\n", size);
>>>    	if (size == 0)
>>>    		return NULL;
>>>
>>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>>> index 67cebe6..8e7e3a4 100644
>>> --- a/include/uapi/drm/i915_drm.h
>>> +++ b/include/uapi/drm/i915_drm.h
>>> @@ -356,6 +356,7 @@ typedef struct drm_i915_irq_wait {
>>>    #define I915_PARAM_EU_TOTAL		 34
>>>    #define I915_PARAM_HAS_GPU_RESET	 35
>>>    #define I915_PARAM_HAS_RESOURCE_STREAMER 36
>>> +#define I915_PARAM_CREATE_VERSION	 37
>>>
>>>    typedef struct drm_i915_getparam {
>>>    	__s32 param;
>>> @@ -455,6 +456,21 @@ struct drm_i915_gem_create {
>>>    	 */
>>>    	__u32 handle;
>>>    	__u32 pad;
>>> +	/**
>>> +	 * Requested flags (currently used for placement
>>> +	 * (which memory domain))
>>> +	 *
>>> +	 * You can request that the object be created from special memory
>>> +	 * rather than regular system pages using this parameter. Such
>>> +	 * irregular objects may have certain restrictions (such as CPU
>>> +	 * access to a stolen object is verboten).
>>> +	 *
>>> +	 * This can be used in the future for other purposes too
>>> +	 * e.g. specifying tiling/caching/madvise
>>> +	 */
>>> +	__u32 flags;
>>> +#define I915_CREATE_PLACEMENT_STOLEN 	(1<<0) /* Cannot use CPU mmaps */
>>> +#define __I915_CREATE_UNKNOWN_FLAGS	-(I915_CREATE_PLACEMENT_STOLEN << 1)
>>
>> I've asked in another reply, now that userspace can create a stolen
>> object, what happens if it tries to use it for a batch buffer?
>>
>> Can it end up in the relocate_entry_cpu with a batch buffer allocated
>> from stolen, which would then call i915_gem_object_get_page and crash?
> Thanks for pointing it out.
> Yes, this is definitely a possibility, if we allocate batchbuffers from
> the stolen region. I have started working on that, to do
> relocate_entry_stolen() if the object is allocated from stolen.

Or perhaps it would be OK to just fail the execbuf?

Just thinking to simplify things. Is it required (or expected) that 
users will need or want to create batch buffers from stolen?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation
  2015-12-11  5:16         ` Ankitprasad Sharma
@ 2015-12-11 12:33           ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-12-11 12:33 UTC (permalink / raw)
  To: Ankitprasad Sharma; +Cc: intel-gfx, akash.goel, shashidhar.hiremath


On 11/12/15 05:16, Ankitprasad Sharma wrote:
> On Thu, 2015-12-10 at 14:15 +0000, Tvrtko Ursulin wrote:
>> On 10/12/15 13:17, Ankitprasad Sharma wrote:
>>> On Thu, 2015-12-10 at 09:43 +0000, Tvrtko Ursulin wrote:
>>>> Hi,
>>>>
>>>> Two more comments below:
>>>>
>>>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>>>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>
>>>>> Ville reminded us that stolen memory is not preserved across
>>>>> hibernation, and a result of this was that context objects now being
>>>>> allocated from stolen were being corrupted on S4 and promptly hanging
>>>>> the GPU on resume.
>>>>>
>>>>> We want to utilise stolen for as much as possible (nothing else will use
>>>>> that wasted memory otherwise), so we need a strategy for handling
>>>>> general objects allocated from stolen and hibernation. A simple solution
>>>>> is to do a CPU copy through the GTT of the stolen object into a fresh
>>>>> shmemfs backing store and thenceforth treat it as a normal objects. This
>>>>> can be refined in future to either use a GPU copy to avoid the slow
>>>>> uncached reads (though it's hibernation!) and recreate stolen objects
>>>>> upon resume/first-use. For now, a simple approach should suffice for
>>>>> testing the object migration.
>>>>
>>>> Mention of "testing" in the commit message and absence of a path to
>>>> migrate the objects back to stolen memory on resume makes me think this
>>>> is kind of half finished and note really ready for review / merge ?
>>>>
>>>> Because I don't see how it is useful to migrate it one way and never
>>>> move back?
>>> I think that this is not much of a problem, as the purpose here is to
>>> keep the object intact, to avoid breaking anything.
>>> So as far as objects are concerned they will be in shmem and can be used
>>> without any issue, and the stolen memory will be free again for other
>>> usage from the user.
>>
>> I am not sure that is a good state of things.
>>
>> One of the things it means is that when user wanted to create an object
>> in stolen memory, after resume it will not be any more. So what is the
>> point in failing stolen object creation when area is full in the first
>> place? We could just return a normal object instead.
> I agree with you, but the absence of a reverse path will not affect the
> user in any way, though the user may be under the wrong impression that
> the buffer is residing inside the stolen area.

Yes, and since suspend-resume is rather a frequent use case it brings 
the whole point of stolen into question for me unless implemented fully.

>
>>
>> Then the question of objects which are allocated in stolen by the
>> driver. Are they being re-allocated on resume or will also be stuck in
>> shmemfs from then onward?
> Objects allocated by the driver need not be preserved (we use a
> internal_volatile flag for those). These are not migrated to the shmemfs
> and are later re-populated by the driver, when used again after resume.

Good then, it wasn't clear to me from that comment about HEAD and TAIL etc.

>>
>> And finally, one corner case might be that shmemfs plus stolen is a
>> larger sum which will be attempted to restored in shmemfs only on
>> resume. Will that always work if everything is fully populated and what
>> will happen if we run out of space?
> As per my understanding,
> shmemfs size will get increased, due to migration, before the
> hibernation itself. And if not everything from shmemfs can be stored in
> RAM, swap-out will take care of it.
> Whatever was stored in the RAM will be restored on resume, rest all will
> remain in the swap.

Yes, but there is still a chance it won't fit if someone is running at 
the limit, maybe with no swap, no?

>>
>> At minimum all this should be discussed and explicitly documented in the
>> commit message.
>>
>> Would it be difficult to implement the reverse path?
> I will try to explore the reverse path as well. But that can be
> submitted separately as a follow-up patch.

I don't think so, because of what I wrote above. Especially since Ville 
said corruption can even happen in S3; but even if we only think of S4, 
for me that is such a common and frequent use-case that having the 
ability to request an object be in stolen, only for it to be silently 
changed on resume, is not worth it.

So until the migration works both ways I don't see the point of 
allowing stolen objects, or of failing the creation when there is not 
enough space there.

>>
>>>>> v2:
>>>>> Swap PTE for pinned bindings over to the shmemfs. This adds a
>>>>> complicated dance, but is required as many stolen objects are likely to
>>>>> be pinned for use by the hardware. Swapping the PTEs should not result
>>>>> in externally visible behaviour, as each PTE update should be atomic and
>>>>> the two pages identical. (danvet)
>>>>>
>>>>> safe-by-default, or the principle of least surprise. We need a new flag
>>>>> to mark objects that we can wilfully discard and recreate across
>>>>> hibernation. (danvet)
>>>>>
>>>>> Just use the global_list rather than invent a new stolen_list. This is
>>>>> the slowpath hibernate and so adding a new list and the associated
>>>>> complexity isn't worth it.
>>>>>
>>>>> v3: Rebased on drm-intel-nightly (Ankit)
>>>>>
>>>>> v4: Use insert_page to map stolen memory backed pages for migration to
>>>>> shmem (Chris)
>>>>>
>>>>> v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
>>>>>     drivers/gpu/drm/i915/i915_drv.h         |   7 +
>>>>>     drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
>>>>>     drivers/gpu/drm/i915/intel_display.c    |   3 +
>>>>>     drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
>>>>>     drivers/gpu/drm/i915/intel_pm.c         |   2 +
>>>>>     drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
>>>>>     7 files changed, 261 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>>>>> index 9f55209..2bb9e9e 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>>>> @@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
>>>>>     	return i915_drm_suspend(drm_dev);
>>>>>     }
>>>>>
>>>>> +static int i915_pm_freeze(struct device *dev)
>>>>> +{
>>>>> +	int ret;
>>>>> +
>>>>> +	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>
>>>> One of the first steps in idling GEM seems to be idling the GPU and
>>>> retiring requests.
>>>>
>>>> Would it also make sense to do those steps before attempting to migrate
>>>> the stolen objects?
>>> Here, we do that implicitly when trying to do a vma_unbind for the
>>> object.
>>
>> Code paths are not the same so it makes me uncomfortable.  It looks more
>> logical to do the migration after the existing i915_gem_suspend. It
>> would mean some code duplication, true (maybe split i915_drm_suspend in
>> two and call i915_gem_freeze in between), but to me it looks more like a
>> proper place to do it.
>>
> All inactive stolen objects will be migrated immediately and the active
> ones will be implicitly synchronized in vma_unbind.
> Would it be more appropriate to rename i915_gem_freeze to
> i915_gem_migrate_stolen?
>
> But anyway, we will wait for Chris or Ville's inputs.

Ping! :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-11 12:19       ` Tvrtko Ursulin
@ 2015-12-11 12:49         ` Dave Gordon
  2015-12-11 18:13           ` Daniel Vetter
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Gordon @ 2015-12-11 12:49 UTC (permalink / raw)
  To: Tvrtko Ursulin, Ankitprasad Sharma
  Cc: intel-gfx, akash.goel, shashidhar.hiremath

On 11/12/15 12:19, Tvrtko Ursulin wrote:
>
> On 11/12/15 11:22, Ankitprasad Sharma wrote:
>> On Wed, 2015-12-09 at 14:06 +0000, Tvrtko Ursulin wrote:
>>> Hi,
>>>
>>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>>
[snip!]
>>>> +    /**
>>>> +     * Requested flags (currently used for placement
>>>> +     * (which memory domain))
>>>> +     *
>>>> +     * You can request that the object be created from special memory
>>>> +     * rather than regular system pages using this parameter. Such
>>>> +     * irregular objects may have certain restrictions (such as CPU
>>>> +     * access to a stolen object is verboten).
>>>> +     *
>>>> +     * This can be used in the future for other purposes too
>>>> +     * e.g. specifying tiling/caching/madvise
>>>> +     */
>>>> +    __u32 flags;
>>>> +#define I915_CREATE_PLACEMENT_STOLEN     (1<<0) /* Cannot use CPU
>>>> mmaps */
>>>> +#define __I915_CREATE_UNKNOWN_FLAGS
>>>> -(I915_CREATE_PLACEMENT_STOLEN << 1)
>>>
>>> I've asked in another reply, now that userspace can create a stolen
>>> object, what happens if it tries to use it for a batch buffer?
>>>
>>> Can it end up in the relocate_entry_cpu with a batch buffer allocated
>>> from stolen, which would then call i915_gem_object_get_page and crash?
>> Thanks for pointing it out.
>> Yes, this is definitely a possibility, if we allocate batchbuffers from
>> the stolen region. I have started working on that, to do
>> relocate_entry_stolen() if the object is allocated from stolen.
>
> Or perhaps it would be OK to just fail the execbuf?
>
> Just thinking to simplify things. Is it required (or expected) that
> users will need or want to create batch buffers from stolen?
>
> Regards,
> Tvrtko

Let's NOT have batchbuffers in stolen. Or anywhere else exotic, just in 
regular shmfs-backed GEM objects (no phys, userptr, or dma_buf either).
And I'd rather contexts and ringbuffers weren't placed there either, 
because the CPU needs to write those all the time. All special-purpose 
GEM objects should be usable ONLY as data buffers for the GPU, or for 
CPU access with pread/pwrite. The objects that the kernel needs to 
understand and manipulate (contexts, ringbuffers, and batches) should 
always be default (shmfs-backed) GEM objects, so that we don't have to 
propagate the understanding of all the exceptional cases into a 
multitude of different kernel functions.

Oh, and I'd suggest that once we have more than two GEM object types, 
the pread/pwrite operations should be extracted and turned into vfuncs 
rather than adding complexity to the common ioctl/shmfs path.
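
Something like this is what I have in mind (purely a sketch; the
.pread/.pwrite hooks below don't exist today and the names are made up):

	struct drm_i915_gem_object_ops {
		int  (*get_pages)(struct drm_i915_gem_object *obj);
		void (*put_pages)(struct drm_i915_gem_object *obj);
		/* ... existing hooks elided ... */

		/* hypothetical per-backing-store read/write */
		int (*pread)(struct drm_i915_gem_object *obj, uint64_t offset,
			     uint64_t size, char __user *data);
		int (*pwrite)(struct drm_i915_gem_object *obj, uint64_t offset,
			      uint64_t size, const char __user *data);
	};

so that the common ioctl just dispatches to obj->ops->pread/pwrite and
each object type (shmem, stolen, phys, ...) keeps its quirks to itself.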

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects
  2015-12-11 12:49         ` Dave Gordon
@ 2015-12-11 18:13           ` Daniel Vetter
  0 siblings, 0 replies; 47+ messages in thread
From: Daniel Vetter @ 2015-12-11 18:13 UTC (permalink / raw)
  To: Dave Gordon
  Cc: intel-gfx, shashidhar.hiremath, akash.goel, Ankitprasad Sharma

On Fri, Dec 11, 2015 at 12:49:37PM +0000, Dave Gordon wrote:
> On 11/12/15 12:19, Tvrtko Ursulin wrote:
> >
> >On 11/12/15 11:22, Ankitprasad Sharma wrote:
> >>On Wed, 2015-12-09 at 14:06 +0000, Tvrtko Ursulin wrote:
> >>>Hi,
> >>>
> >>>On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >>>>From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>>>
> [snip!]
> >>>>+    /**
> >>>>+     * Requested flags (currently used for placement
> >>>>+     * (which memory domain))
> >>>>+     *
> >>>>+     * You can request that the object be created from special memory
> >>>>+     * rather than regular system pages using this parameter. Such
> >>>>+     * irregular objects may have certain restrictions (such as CPU
> >>>>+     * access to a stolen object is verboten).
> >>>>+     *
> >>>>+     * This can be used in the future for other purposes too
> >>>>+     * e.g. specifying tiling/caching/madvise
> >>>>+     */
> >>>>+    __u32 flags;
> >>>>+#define I915_CREATE_PLACEMENT_STOLEN     (1<<0) /* Cannot use CPU
> >>>>mmaps */
> >>>>+#define __I915_CREATE_UNKNOWN_FLAGS
> >>>>-(I915_CREATE_PLACEMENT_STOLEN << 1)
> >>>
> >>>I've asked in another reply, now that userspace can create a stolen
> >>>object, what happens if it tries to use it for a batch buffer?
> >>>
> >>>Can it end up in the relocate_entry_cpu with a batch buffer allocated
> >>>from stolen, which would then call i915_gem_object_get_page and crash?
> >>Thanks for pointing it out.
> >>Yes, this is definitely a possibility, if we allocate batchbuffers from
> >>the stolen region. I have started working on that, to do
> >>relocate_entry_stolen() if the object is allocated from stolen.
> >
> >Or perhaps it would be OK to just fail the execbuf?
> >
> >Just thinking to simplify things. Is it required (or expected) that
> >users will need or want to create batch buffers from stolen?
> >
> >Regards,
> >Tvrtko
> 
> Let's NOT have batchbuffers in stolen. Or anywhere else exotic, just in
> regular shmfs-backed GEM objects (no phys, userptr, or dma_buf either).
> And I'd rather contexts and ringbuffers weren't placed there either, because
> the CPU needs to write those all the time. All special-purpose GEM objects
> should be usable ONLY as data buffers for the GPU, or for CPU access with
> pread/pwrite. The objects that the kernel needs to understand and manipulate
> (contexts, ringbuffers, and batches) should always be default (shmfs-backed)
> GEM objects, so that we don't have to propagate the understanding of all the
> exceptional cases into a multitude of different kernel functions.

Yeah, rejecting stolen batches makes sense I'd say.
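
Something as simple as this in the execbuf path should do (sketch only;
assumes stolen-backed objects can be identified via obj->stolen, exact
placement in i915_gem_execbuffer.c to be decided):

	/* batches in stolen cannot be relocated via i915_gem_object_get_page */
	if (batch_obj->stolen) {
		DRM_DEBUG("execbuf with stolen-backed batch is not supported\n");
		ret = -EINVAL;
		goto err;
	}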

> Oh, and I'd suggest that once we have more than two GEM object types, the
> pread/pwrite operations should be extracted and turned into vfuncs rather
> than adding complexity to the common ioctl/shmfs path.

While we discuss clenups around obj backing storage abstraction: Another
thing worth considering is completing our extraction of the different
types of obj into files: We already have dma-buf, stolen, userptr, and
could extract shmem and phys_obj. Then pull them all together into a
section about gem backing storage types in the docbook.

Should at least allow the next person to see through this maze without
first reading a few thousand git commits ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-09 19:39     ` Dave Gordon
  2015-12-10 11:12       ` Ankitprasad Sharma
@ 2015-12-11 18:15       ` Daniel Vetter
  2015-12-15 16:22         ` Dave Gordon
  1 sibling, 1 reply; 47+ messages in thread
From: Daniel Vetter @ 2015-12-11 18:15 UTC (permalink / raw)
  To: Dave Gordon
  Cc: intel-gfx, shashidhar.hiremath, akash.goel, ankitprasad.r.sharma

On Wed, Dec 09, 2015 at 07:39:56PM +0000, Dave Gordon wrote:
> On 09/12/15 16:15, Tvrtko Ursulin wrote:
> >
> >Hi,
> >
> >On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
> >>From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>
> >>This patch adds support for extending the pread/pwrite functionality
> >>for objects not backed by shmem. The access will be made through
> >>gtt interface. This will cover objects backed by stolen memory as well
> >>as other non-shmem backed objects.
> >>
> >>v2: Drop locks around slow_user_access, prefault the pages before
> >>access (Chris)
> >>
> >>v3: Rebased to the latest drm-intel-nightly (Ankit)
> >>
> >>v4: Moved page base & offset calculations outside the copy loop,
> >>corrected data types for size and offset variables, corrected if-else
> >>braces format (Tvrtko/kerneldocs)
> >>
> >>v5: Enabled pread/pwrite for all non-shmem backed objects including
> >>without tiling restrictions (Ankit)
> >>
> >>v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >>
> >>v7: Updated commit message, Renamed i915_gem_gtt_read to
> >>i915_gem_gtt_copy,
> >>added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
> >>
> >>v8: Updated v7 commit message, mutex unlock around pwrite slow path for
> >>non-shmem backed objects (Tvrtko)
> >>
> >>Testcase: igt/gem_stolen
> >>
> >>Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >>---
> >>  drivers/gpu/drm/i915/i915_gem.c | 151
> >>+++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 127 insertions(+), 24 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>b/drivers/gpu/drm/i915/i915_gem.c
> >>index ed97de6..68ed67a 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int
> >>shmem_page_offset, int page_length,
> >>      return ret ? - EFAULT : 0;
> >>  }
> >>
> >>+static inline uint64_t
> >>+slow_user_access(struct io_mapping *mapping,
> >>+         uint64_t page_base, int page_offset,
> >>+         char __user *user_data,
> >>+         int length, bool pwrite)
> >>+{
> >>+    void __iomem *vaddr_inatomic;
> >>+    void *vaddr;
> >>+    uint64_t unwritten;
> >>+
> >>+    vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> >>+    /* We can use the cpu mem copy function because this is X86. */
> >>+    vaddr = (void __force *)vaddr_inatomic + page_offset;
> >>+    if (pwrite)
> >>+        unwritten = __copy_from_user(vaddr, user_data, length);
> >>+    else
> >>+        unwritten = __copy_to_user(user_data, vaddr, length);
> >>+
> >>+    io_mapping_unmap(vaddr_inatomic);
> >>+    return unwritten;
> >>+}
> >>+
> >>+static int
> >>+i915_gem_gtt_copy(struct drm_device *dev,
> >>+           struct drm_i915_gem_object *obj, uint64_t size,
> >>+           uint64_t data_offset, uint64_t data_ptr)
> >>+{
> >>+    struct drm_i915_private *dev_priv = dev->dev_private;
> >>+    char __user *user_data;
> >>+    uint64_t remain;
> >>+    uint64_t offset, page_base;
> >>+    int page_offset, page_length, ret = 0;
> >>+
> >>+    ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >>+    if (ret)
> >>+        goto out;
> >>+
> >>+    ret = i915_gem_object_set_to_gtt_domain(obj, false);
> >>+    if (ret)
> >>+        goto out_unpin;
> >>+
> >>+    ret = i915_gem_object_put_fence(obj);
> >>+    if (ret)
> >>+        goto out_unpin;
> >>+
> >>+    user_data = to_user_ptr(data_ptr);
> >>+    remain = size;
> >>+    offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> >>+
> >>+    mutex_unlock(&dev->struct_mutex);
> >>+    if (likely(!i915.prefault_disable))
> >>+        ret = fault_in_multipages_writeable(user_data, remain);
> >>+
> >>+    /*
> >>+     * page_offset = offset within page
> >>+     * page_base = page offset within aperture
> >>+     */
> >>+    page_offset = offset_in_page(offset);
> >>+    page_base = offset & PAGE_MASK;
> >>+
> >>+    while (remain > 0) {
> >>+        /* page_length = bytes to copy for this page */
> >>+        page_length = remain;
> >>+        if ((page_offset + remain) > PAGE_SIZE)
> >>+            page_length = PAGE_SIZE - page_offset;
> >>+
> >>+        /* This is a slow read/write as it tries to read from
> >>+         * and write to user memory which may result into page
> >>+         * faults
> >>+         */
> >>+        ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> >>+                       page_offset, user_data,
> >>+                       page_length, false);
> >>+
> >>+        if (ret) {
> >>+            ret = -EFAULT;
> >>+            break;
> >>+        }
> >>+
> >>+        remain -= page_length;
> >>+        user_data += page_length;
> >>+        page_base += page_length;
> >>+        page_offset = 0;
> >>+    }
> >>+
> >>+    mutex_lock(&dev->struct_mutex);
> >>+
> >>+out_unpin:
> >>+    i915_gem_object_ggtt_unpin(obj);
> >>+out:
> >>+    return ret;
> >>+}
> >>+
> >>  static int
> >>  i915_gem_shmem_pread(struct drm_device *dev,
> >>               struct drm_i915_gem_object *obj,
> >>@@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
> >>void *data,
> >>          goto out;
> >>      }
> >>
> >>-    /* prime objects have no backing filp to GEM pread/pwrite
> >>-     * pages from.
> >>-     */
> >>-    if (!obj->base.filp) {
> >>-        ret = -EINVAL;
> >>-        goto out;
> >>-    }
> >>-
> >>      trace_i915_gem_object_pread(obj, args->offset, args->size);
> >>
> >>-    ret = i915_gem_shmem_pread(dev, obj, args, file);
> >>+    /* pread for non shmem backed objects */
> >>+    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> >>+        ret = i915_gem_gtt_copy(dev, obj, args->size,
> >>+                    args->offset, args->data_ptr);
> >>+    else
> >>+        ret = i915_gem_shmem_pread(dev, obj, args, file);
> >
> >Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
> >objects if tiling is set. Sounds wrong to me unless I am missing something?
> 
> Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type

obj->base.filp is for shmem backed stuff. gtt is irrelevant for backing
storage, well except if you can't read the shmem stuff directly with the
cpu the only way is to go through a gtt device mapping.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-12-11 18:15       ` Daniel Vetter
@ 2015-12-15 16:22         ` Dave Gordon
  0 siblings, 0 replies; 47+ messages in thread
From: Dave Gordon @ 2015-12-15 16:22 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 11/12/15 18:15, Daniel Vetter wrote:
> On Wed, Dec 09, 2015 at 07:39:56PM +0000, Dave Gordon wrote:
>> On 09/12/15 16:15, Tvrtko Ursulin wrote:
>>>
>>> Hi,
>>>
>>> On 09/12/15 12:46, ankitprasad.r.sharma@intel.com wrote:
>>>> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>>
>>>> This patch adds support for extending the pread/pwrite functionality
>>>> for objects not backed by shmem. The access will be made through
>>>> gtt interface. This will cover objects backed by stolen memory as well
>>>> as other non-shmem backed objects.
>>>>
>>>> v2: Drop locks around slow_user_access, prefault the pages before
>>>> access (Chris)
>>>>
>>>> v3: Rebased to the latest drm-intel-nightly (Ankit)
>>>>
>>>> v4: Moved page base & offset calculations outside the copy loop,
>>>> corrected data types for size and offset variables, corrected if-else
>>>> braces format (Tvrtko/kerneldocs)
>>>>
>>>> v5: Enabled pread/pwrite for all non-shmem backed objects including
>>>> without tiling restrictions (Ankit)
>>>>
>>>> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>>>>
>>>> v7: Updated commit message, Renamed i915_gem_gtt_read to
>>>> i915_gem_gtt_copy,
>>>> added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
>>>>
>>>> v8: Updated v7 commit message, mutex unlock around pwrite slow path for
>>>> non-shmem backed objects (Tvrtko)
>>>>
>>>> Testcase: igt/gem_stolen
>>>>
>>>> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_gem.c | 151
>>>> +++++++++++++++++++++++++++++++++-------
>>>>   1 file changed, 127 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
[snip!]
>>>>   static int
>>>>   i915_gem_shmem_pread(struct drm_device *dev,
>>>>                struct drm_i915_gem_object *obj,
>>>> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev,
>>>> void *data,
>>>>           goto out;
>>>>       }
>>>>
>>>> -    /* prime objects have no backing filp to GEM pread/pwrite
>>>> -     * pages from.
>>>> -     */
>>>> -    if (!obj->base.filp) {
>>>> -        ret = -EINVAL;
>>>> -        goto out;
>>>> -    }
>>>> -
>>>>       trace_i915_gem_object_pread(obj, args->offset, args->size);
>>>>
>>>> -    ret = i915_gem_shmem_pread(dev, obj, args, file);
>>>> +    /* pread for non shmem backed objects */
>>>> +    if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
>>>> +        ret = i915_gem_gtt_copy(dev, obj, args->size,
>>>> +                    args->offset, args->data_ptr);
>>>> +    else
>>>> +        ret = i915_gem_shmem_pread(dev, obj, args, file);
>>>
>>> Hm, it will end up calling i915_gem_shmem_pread for non-shmem backed
>>> objects if tiling is set. Sounds wrong to me unless I am missing something?
>>
>> Which GEM objects have obj->base.filp set? Is it ONLY regular gtt-type
>
> obj->base.filp is for shmem backed stuff. gtt is irrelevant for backing
> storage, well except if you can't read the shmem stuff directly with the
> cpu the only way is to go through a gtt device mapping.
> -Daniel

So obj->base.filp is set for both phys and shmem (default) object types 
(I called the latter a "gtt" type just because the get/put_pages() 
functions have "gtt" in their names). But I note that the naming of GEM 
object vfuncs (and vfunc tables) isn't consistent :( Maybe they should 
be named "i915_gem_object_{get,put}_pages_shmem()", and the table would 
then be i915_gem_object_shmem_ops :)

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-11-13 17:23   ` Tvrtko Ursulin
@ 2015-11-20  9:30     ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-11-20  9:30 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Fri, 2015-11-13 at 17:23 +0000, Tvrtko Ursulin wrote:
> 
> On 11/11/15 10:36, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface. This will cover objects backed by stolen memory as well
> > as other non-shmem backed objects.
> >
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > v7: Updated commit message (Tvrtko)
> 
> Since v6 you have also renamed i915_gem_gtt_read to i915_gem_gtt_copy 
> and added the pwrite slow path so the commit should say that.
Yes, I need to update this.
> 
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 146 +++++++++++++++++++++++++++++++++-------
> >   1 file changed, 122 insertions(+), 24 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 2d8c9e0..e0b9502 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
> >   	return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +		 uint64_t page_base, int page_offset,
> > +		 char __user *user_data,
> > +		 int length, bool pwrite)
> > +{
> > +	void __iomem *vaddr_inatomic;
> > +	void *vaddr;
> > +	uint64_t unwritten;
> > +
> > +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +	/* We can use the cpu mem copy function because this is X86. */
> > +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +	if (pwrite)
> > +		unwritten = __copy_from_user(vaddr, user_data, length);
> > +	else
> > +		unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +	io_mapping_unmap(vaddr_inatomic);
> > +	return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_copy(struct drm_device *dev,
> > +		   struct drm_i915_gem_object *obj, uint64_t size,
> > +		   uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	char __user *user_data;
> > +	uint64_t remain;
> > +	uint64_t offset, page_base;
> > +	int page_offset, page_length, ret = 0;
> > +
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	user_data = to_user_ptr(data_ptr);
> > +	remain = size;
> > +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	if (likely(!i915.prefault_disable))
> > +		ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +	/*
> > +	 * page_offset = offset within page
> > +	 * page_base = page offset within aperture
> > +	 */
> > +	page_offset = offset_in_page(offset);
> > +	page_base = offset & PAGE_MASK;
> > +
> > +	while (remain > 0) {
> > +		/* page_length = bytes to copy for this page */
> > +		page_length = remain;
> > +		if ((page_offset + remain) > PAGE_SIZE)
> > +			page_length = PAGE_SIZE - page_offset;
> > +
> > +		/* This is a slow read/write as it tries to read from
> > +		 * and write to user memory which may result into page
> > +		 * faults
> > +		 */
> > +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> > +				       page_offset, user_data,
> > +				       page_length, false);
> > +
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		remain -= page_length;
> > +		user_data += page_length;
> > +		page_base += page_length;
> > +		page_offset = 0;
> > +	}
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +
> > +out_unpin:
> > +	i915_gem_object_ggtt_unpin(obj);
> > +out:
> > +	return ret;
> > +}
> > +
> >   static int
> >   i915_gem_shmem_pread(struct drm_device *dev,
> >   		     struct drm_i915_gem_object *obj,
> > @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pread(obj, args->offset, args->size);
> >
> > -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> > +	/* pread for non shmem backed objects */
> > +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> > +		ret = i915_gem_gtt_copy(dev, obj, args->size,
> > +					args->offset, args->data_ptr);
> > +	else
> > +		ret = i915_gem_shmem_pread(dev, obj, args, file);
> >
> >   out:
> >   	drm_gem_object_unreference(&obj->base);
> > @@ -793,6 +883,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   	uint64_t remain, offset;
> >   	char __user *user_data;
> >   	int ret;
> > +	bool faulted = false;
> >
> >   	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> >   	if (ret) {
> > @@ -851,11 +942,25 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
> >   		/* If we get a fault while copying data, then (presumably) our
> >   		 * source page isn't available.  Return the error and we'll
> >   		 * retry in the slow path.
> > +		 * If the object is non-shmem backed, we retry again with the
> > +		 * path that handles page fault.
> >   		 */
> > -		if (fast_user_write(i915->gtt.mappable, page_base,
> > -				    page_offset, user_data, page_length)) {
> > -			ret = -EFAULT;
> > -			goto out_flush;
> > +		if (faulted || fast_user_write(i915->gtt.mappable,
> > +						page_base, page_offset,
> > +						user_data, page_length)) {
> > +			if (!obj->base.filp) {
> > +				faulted = true;
> > +				if (slow_user_access(i915->gtt.mappable,
> > +						     page_base,
> > +						     page_offset, user_data,
> > +						     page_length, true)) {
> > +					ret = -EFAULT;
> > +					goto out_flush;
> 
> I have chatted with Chris about this since I wasn't sure if you two were 
> cooking this new code behind the scenes.
> 
> Anyway, it is required to drop the struct_mutex before attempting the 
> slow path.
Yes, I need to drop the mutex before attempting this path.
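Roughly what I plan to do inside the copy loop of
i915_gem_gtt_pwrite_fast (sketch only, untested), mirroring the
unlock/relock that i915_gem_gtt_copy already does for pread:

	/* drop struct_mutex around the copy that may fault in user pages */
	mutex_unlock(&i915->dev->struct_mutex);
	if (slow_user_access(i915->gtt.mappable, page_base, page_offset,
			     user_data, page_length, true))
		ret = -EFAULT;
	mutex_lock(&i915->dev->struct_mutex);
	if (ret)
		goto out_flush;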
> 
> Chris suggests a new test case to cover this, doing a pwrite from a gtt 
> mmap which should trigger the locking inversion.
> 
> > +				}
> > +			} else {
> > +				ret = -EFAULT;
> > +				goto out_flush;
> > +			}
> >   		}
> >
> >   		remain -= page_length;
> > @@ -1121,14 +1226,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >
> >   	ret = -EFAULT;
> > @@ -1139,8 +1236,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   	 * perspective, requiring manual detiling by the client.
> >   	 */
> >   	if (obj->tiling_mode == I915_TILING_NONE &&
> > -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > -	    cpu_write_needs_clflush(obj)) {
> > +	    (!obj->base.filp ||
> > +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > +	    cpu_write_needs_clflush(obj)))) {
> 
> For stolen objects we don't have the same tiling limitation as for 
> normal ones?
No, we have the same tiling restrictions for shmem as well as stolen 
backed objects.

Thanks,
Ankit



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-11-11 10:36 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem " ankitprasad.r.sharma
@ 2015-11-13 17:23   ` Tvrtko Ursulin
  2015-11-20  9:30     ` Ankitprasad Sharma
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-11-13 17:23 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath



On 11/11/15 10:36, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for extending the pread/pwrite functionality
> for objects not backed by shmem. The access will be made through
> gtt interface. This will cover objects backed by stolen memory as well
> as other non-shmem backed objects.
>
> v2: Drop locks around slow_user_access, prefault the pages before
> access (Chris)
>
> v3: Rebased to the latest drm-intel-nightly (Ankit)
>
> v4: Moved page base & offset calculations outside the copy loop,
> corrected data types for size and offset variables, corrected if-else
> braces format (Tvrtko/kerneldocs)
>
> v5: Enabled pread/pwrite for all non-shmem backed objects including
> without tiling restrictions (Ankit)
>
> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>
> v7: Updated commit message (Tvrtko)

Since v6 you have also renamed i915_gem_gtt_read to i915_gem_gtt_copy 
and added the pwrite slow path so the commit should say that.

>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 146 +++++++++++++++++++++++++++++++++-------
>   1 file changed, 122 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 2d8c9e0..e0b9502 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
>   	return ret ? - EFAULT : 0;
>   }
>
> +static inline uint64_t
> +slow_user_access(struct io_mapping *mapping,
> +		 uint64_t page_base, int page_offset,
> +		 char __user *user_data,
> +		 int length, bool pwrite)
> +{
> +	void __iomem *vaddr_inatomic;
> +	void *vaddr;
> +	uint64_t unwritten;
> +
> +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> +	/* We can use the cpu mem copy function because this is X86. */
> +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> +	if (pwrite)
> +		unwritten = __copy_from_user(vaddr, user_data, length);
> +	else
> +		unwritten = __copy_to_user(user_data, vaddr, length);
> +
> +	io_mapping_unmap(vaddr_inatomic);
> +	return unwritten;
> +}
> +
> +static int
> +i915_gem_gtt_copy(struct drm_device *dev,
> +		   struct drm_i915_gem_object *obj, uint64_t size,
> +		   uint64_t data_offset, uint64_t data_ptr)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	char __user *user_data;
> +	uint64_t remain;
> +	uint64_t offset, page_base;
> +	int page_offset, page_length, ret = 0;
> +
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> +	if (ret)
> +		goto out;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> +	if (ret)
> +		goto out_unpin;
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto out_unpin;
> +
> +	user_data = to_user_ptr(data_ptr);
> +	remain = size;
> +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +	if (likely(!i915.prefault_disable))
> +		ret = fault_in_multipages_writeable(user_data, remain);
> +
> +	/*
> +	 * page_offset = offset within page
> +	 * page_base = page offset within aperture
> +	 */
> +	page_offset = offset_in_page(offset);
> +	page_base = offset & PAGE_MASK;
> +
> +	while (remain > 0) {
> +		/* page_length = bytes to copy for this page */
> +		page_length = remain;
> +		if ((page_offset + remain) > PAGE_SIZE)
> +			page_length = PAGE_SIZE - page_offset;
> +
> +		/* This is a slow read/write as it tries to read from
> +		 * and write to user memory which may result into page
> +		 * faults
> +		 */
> +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> +				       page_offset, user_data,
> +				       page_length, false);
> +
> +		if (ret) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		remain -= page_length;
> +		user_data += page_length;
> +		page_base += page_length;
> +		page_offset = 0;
> +	}
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +out_unpin:
> +	i915_gem_object_ggtt_unpin(obj);
> +out:
> +	return ret;
> +}
> +
>   static int
>   i915_gem_shmem_pread(struct drm_device *dev,
>   		     struct drm_i915_gem_object *obj,
> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pread(obj, args->offset, args->size);
>
> -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> +	/* pread for non shmem backed objects */
> +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> +		ret = i915_gem_gtt_copy(dev, obj, args->size,
> +					args->offset, args->data_ptr);
> +	else
> +		ret = i915_gem_shmem_pread(dev, obj, args, file);
>
>   out:
>   	drm_gem_object_unreference(&obj->base);
> @@ -793,6 +883,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
>   	uint64_t remain, offset;
>   	char __user *user_data;
>   	int ret;
> +	bool faulted = false;
>
>   	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
>   	if (ret) {
> @@ -851,11 +942,25 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
>   		/* If we get a fault while copying data, then (presumably) our
>   		 * source page isn't available.  Return the error and we'll
>   		 * retry in the slow path.
> +		 * If the object is non-shmem backed, we retry again with the
> +		 * path that handles page fault.
>   		 */
> -		if (fast_user_write(i915->gtt.mappable, page_base,
> -				    page_offset, user_data, page_length)) {
> -			ret = -EFAULT;
> -			goto out_flush;
> +		if (faulted || fast_user_write(i915->gtt.mappable,
> +						page_base, page_offset,
> +						user_data, page_length)) {
> +			if (!obj->base.filp) {
> +				faulted = true;
> +				if (slow_user_access(i915->gtt.mappable,
> +						     page_base,
> +						     page_offset, user_data,
> +						     page_length, true)) {
> +					ret = -EFAULT;
> +					goto out_flush;

I have chatted with Chris about this since I wasn't sure if you two were 
cooking this new code behind the scenes.

Anyway, it is required to drop the struct_mutex before attempting the 
slow path.

Chris suggests a new test case to cover this: a pwrite from a GTT mmap, 
which should trigger the lock inversion.
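
Roughly, something like the untested sketch below should hit that path 
(raw ioctls rather than the IGT wrappers; gtt_mmap() and 
pwrite_from_gtt_mmap() are just illustrative names): the pwrite source 
pointer comes from a GTT mmap, so faulting it in re-enters the GTT 
fault handler while pwrite would otherwise be holding struct_mutex.

/*
 * Illustrative sketch only, not the actual IGT test: pwrite into one
 * object using a source pointer obtained from a GTT mmap of another
 * object.
 */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/i915_drm.h>

static void *gtt_mmap(int fd, uint32_t handle, uint64_t size)
{
	struct drm_i915_gem_mmap_gtt arg = { .handle = handle };

	/* Ask the kernel for the fake mmap offset of this object. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &arg))
		return MAP_FAILED;

	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, arg.offset);
}

static int pwrite_from_gtt_mmap(int fd, uint32_t dst, uint32_t src,
				uint64_t size)
{
	struct drm_i915_gem_pwrite pwrite = {
		.handle = dst,
		.size = size,
	};
	void *ptr = gtt_mmap(fd, src, size);

	if (ptr == MAP_FAILED)
		return -1;

	/* The source pages are not faulted in yet, so the copy inside
	 * the pwrite ioctl has to fault them via the GTT fault handler. */
	pwrite.data_ptr = (uintptr_t)ptr;

	return ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite);
}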

> +				}
> +			} else {
> +				ret = -EFAULT;
> +				goto out_flush;
> +			}
>   		}
>
>   		remain -= page_length;
> @@ -1121,14 +1226,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
>
>   	ret = -EFAULT;
> @@ -1139,8 +1236,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   	 * perspective, requiring manual detiling by the client.
>   	 */
>   	if (obj->tiling_mode == I915_TILING_NONE &&
> -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> -	    cpu_write_needs_clflush(obj)) {
> +	    (!obj->base.filp ||
> +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> +	    cpu_write_needs_clflush(obj)))) {

For stolen objects we don't have the same tiling limitation as for 
normal ones?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-11-11 10:36 [PATCH v9 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
@ 2015-11-11 10:36 ` ankitprasad.r.sharma
  2015-11-13 17:23   ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-11-11 10:36 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

v7: Updated commit message (Tvrtko)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 146 +++++++++++++++++++++++++++++++++-------
 1 file changed, 122 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2d8c9e0..e0b9502 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
 	return ret ? - EFAULT : 0;
 }
 
+static inline uint64_t
+slow_user_access(struct io_mapping *mapping,
+		 uint64_t page_base, int page_offset,
+		 char __user *user_data,
+		 int length, bool pwrite)
+{
+	void __iomem *vaddr_inatomic;
+	void *vaddr;
+	uint64_t unwritten;
+
+	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
+	/* We can use the cpu mem copy function because this is X86. */
+	vaddr = (void __force *)vaddr_inatomic + page_offset;
+	if (pwrite)
+		unwritten = __copy_from_user(vaddr, user_data, length);
+	else
+		unwritten = __copy_to_user(user_data, vaddr, length);
+
+	io_mapping_unmap(vaddr_inatomic);
+	return unwritten;
+}
+
+static int
+i915_gem_gtt_copy(struct drm_device *dev,
+		   struct drm_i915_gem_object *obj, uint64_t size,
+		   uint64_t data_offset, uint64_t data_ptr)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	char __user *user_data;
+	uint64_t remain;
+	uint64_t offset, page_base;
+	int page_offset, page_length, ret = 0;
+
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+	if (ret)
+		goto out;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (ret)
+		goto out_unpin;
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto out_unpin;
+
+	user_data = to_user_ptr(data_ptr);
+	remain = size;
+	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
+
+	mutex_unlock(&dev->struct_mutex);
+	if (likely(!i915.prefault_disable))
+		ret = fault_in_multipages_writeable(user_data, remain);
+
+	/*
+	 * page_offset = offset within page
+	 * page_base = page offset within aperture
+	 */
+	page_offset = offset_in_page(offset);
+	page_base = offset & PAGE_MASK;
+
+	while (remain > 0) {
+		/* page_length = bytes to copy for this page */
+		page_length = remain;
+		if ((page_offset + remain) > PAGE_SIZE)
+			page_length = PAGE_SIZE - page_offset;
+
+		/* This is a slow read/write as it tries to read from
+		 * and write to user memory which may result into page
+		 * faults
+		 */
+		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+				       page_offset, user_data,
+				       page_length, false);
+
+		if (ret) {
+			ret = -EFAULT;
+			break;
+		}
+
+		remain -= page_length;
+		user_data += page_length;
+		page_base += page_length;
+		page_offset = 0;
+	}
+
+	mutex_lock(&dev->struct_mutex);
+
+out_unpin:
+	i915_gem_object_ggtt_unpin(obj);
+out:
+	return ret;
+}
+
 static int
 i915_gem_shmem_pread(struct drm_device *dev,
 		     struct drm_i915_gem_object *obj,
@@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = i915_gem_shmem_pread(dev, obj, args, file);
+	/* pread for non shmem backed objects */
+	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
+		ret = i915_gem_gtt_copy(dev, obj, args->size,
+					args->offset, args->data_ptr);
+	else
+		ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
@@ -793,6 +883,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 	uint64_t remain, offset;
 	char __user *user_data;
 	int ret;
+	bool faulted = false;
 
 	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
 	if (ret) {
@@ -851,11 +942,25 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_private *i915,
 		/* If we get a fault while copying data, then (presumably) our
 		 * source page isn't available.  Return the error and we'll
 		 * retry in the slow path.
+		 * If the object is non-shmem backed, we retry again with the
+		 * path that handles page fault.
 		 */
-		if (fast_user_write(i915->gtt.mappable, page_base,
-				    page_offset, user_data, page_length)) {
-			ret = -EFAULT;
-			goto out_flush;
+		if (faulted || fast_user_write(i915->gtt.mappable,
+						page_base, page_offset,
+						user_data, page_length)) {
+			if (!obj->base.filp) {
+				faulted = true;
+				if (slow_user_access(i915->gtt.mappable,
+						     page_base,
+						     page_offset, user_data,
+						     page_length, true)) {
+					ret = -EFAULT;
+					goto out_flush;
+				}
+			} else {
+				ret = -EFAULT;
+				goto out_flush;
+			}
 		}
 
 		remain -= page_length;
@@ -1121,14 +1226,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
 	ret = -EFAULT;
@@ -1139,8 +1236,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 * perspective, requiring manual detiling by the client.
 	 */
 	if (obj->tiling_mode == I915_TILING_NONE &&
-	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-	    cpu_write_needs_clflush(obj)) {
+	    (!obj->base.filp ||
+	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+	    cpu_write_needs_clflush(obj)))) {
 		ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
@@ -1150,7 +1248,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (ret == -EFAULT || ret == -ENOSPC) {
 		if (obj->phys_handle)
 			ret = i915_gem_phys_pwrite(obj, args, file);
-		else
+		else if (obj->base.filp)
 			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 	}
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-10-08 13:56   ` Tvrtko Ursulin
@ 2015-10-28 11:18     ` Ankitprasad Sharma
  0 siblings, 0 replies; 47+ messages in thread
From: Ankitprasad Sharma @ 2015-10-28 11:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx, akash.goel, shashidhar.hiremath

On Thu, 2015-10-08 at 14:56 +0100, Tvrtko Ursulin wrote:
> Hi,
> 
> On 08/10/15 07:24, ankitprasad.r.sharma@intel.com wrote:
> > From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> >
> > This patch adds support for extending the pread/pwrite functionality
> > for objects not backed by shmem. The access will be made through
> > gtt interface.
> > This will cover prime objects as well as stolen memory backed objects
> > but for userptr objects it is still forbidden.
> 
> Where is the part which forbids it for userptr objects?
In version 5, I updated the patch to handle pwrite/pread for all
non-shmem backed objects, including userptr objects.

I will update the commit message.
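
For reference, an untested sketch of the userspace access this enables
(plain ioctl, no IGT wrappers; the handle is assumed to come from a
stolen or userptr backed object created elsewhere):

/* Illustrative only: read back 'size' bytes from a non-shmem backed
 * object via the pread ioctl, which the kernel now services through
 * the GTT path. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int read_back(int fd, uint32_t handle, uint64_t size, void *buf)
{
	struct drm_i915_gem_pread pread = {
		.handle = handle,
		.offset = 0,
		.size = size,
		.data_ptr = (uintptr_t)buf,
	};

	return ioctl(fd, DRM_IOCTL_I915_GEM_PREAD, &pread);
}
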
> 
> > v2: Drop locks around slow_user_access, prefault the pages before
> > access (Chris)
> >
> > v3: Rebased to the latest drm-intel-nightly (Ankit)
> >
> > v4: Moved page base & offset calculations outside the copy loop,
> > corrected data types for size and offset variables, corrected if-else
> > braces format (Tvrtko/kerneldocs)
> >
> > v5: Enabled pread/pwrite for all non-shmem backed objects including
> > without tiling restrictions (Ankit)
> >
> > v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
> >
> > Testcase: igt/gem_stolen
> >
> > Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 125 +++++++++++++++++++++++++++++++++-------
> >   1 file changed, 104 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 91a2e97..2c94e22 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
> >   	return ret ? - EFAULT : 0;
> >   }
> >
> > +static inline uint64_t
> > +slow_user_access(struct io_mapping *mapping,
> > +		 uint64_t page_base, int page_offset,
> > +		 char __user *user_data,
> > +		 int length, bool pwrite)
> > +{
> > +	void __iomem *vaddr_inatomic;
> > +	void *vaddr;
> > +	uint64_t unwritten;
> > +
> > +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> > +	/* We can use the cpu mem copy function because this is X86. */
> > +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> > +	if (pwrite)
> > +		unwritten = __copy_from_user(vaddr, user_data, length);
> > +	else
> > +		unwritten = __copy_to_user(user_data, vaddr, length);
> > +
> > +	io_mapping_unmap(vaddr_inatomic);
> > +	return unwritten;
> > +}
> > +
> > +static int
> > +i915_gem_gtt_pread(struct drm_device *dev,
> > +		   struct drm_i915_gem_object *obj, uint64_t size,
> > +		   uint64_t data_offset, uint64_t data_ptr)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	char __user *user_data;
> > +	uint64_t remain;
> > +	uint64_t offset, page_base;
> > +	int page_offset, page_length, ret = 0;
> > +
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> > +	if (ret)
> > +		goto out;
> > +
> > +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	ret = i915_gem_object_put_fence(obj);
> > +	if (ret)
> > +		goto out_unpin;
> > +
> > +	user_data = to_user_ptr(data_ptr);
> > +	remain = size;
> > +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> > +
> > +	mutex_unlock(&dev->struct_mutex);
> > +	if (likely(!i915.prefault_disable))
> > +		ret = fault_in_multipages_writeable(user_data, remain);
> > +
> > +	/*
> > +	 * page_offset = offset within page
> > +	 * page_base = page offset within aperture
> > +	 */
> > +	page_offset = offset_in_page(offset);
> > +	page_base = offset & PAGE_MASK;
> > +
> > +	while (remain > 0) {
> > +		/* page_length = bytes to copy for this page */
> > +		page_length = remain;
> > +		if ((page_offset + remain) > PAGE_SIZE)
> > +			page_length = PAGE_SIZE - page_offset;
> > +
> > +		/* This is a slow read/write as it tries to read from
> > +		 * and write to user memory which may result into page
> > +		 * faults
> > +		 */
> > +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> > +				       page_offset, user_data,
> > +				       page_length, false);
> > +
> > +		if (ret) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		remain -= page_length;
> > +		user_data += page_length;
> > +		page_base += page_length;
> > +		page_offset = 0;
> > +	}
> > +
> > +	mutex_lock(&dev->struct_mutex);
> > +
> > +out_unpin:
> > +	i915_gem_object_ggtt_unpin(obj);
> > +out:
> > +	return ret;
> > +}
> > +
> >   static int
> >   i915_gem_shmem_pread(struct drm_device *dev,
> >   		     struct drm_i915_gem_object *obj,
> > @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pread(obj, args->offset, args->size);
> >
> > -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> > +	/* pread for non shmem backed objects */
> > +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> > +		ret = i915_gem_gtt_pread(dev, obj, args->size,
> > +					 args->offset, args->data_ptr);
> > +	else
> > +		ret = i915_gem_shmem_pread(dev, obj, args, file);
> >
> >   out:
> >   	drm_gem_object_unreference(&obj->base);
> > @@ -795,7 +885,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
> >   	char __user *user_data;
> >   	int page_offset, page_length, ret;
> >
> > -	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> > +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> 
> Why is this needed?
This was Chris' suggestion. This change can go as a separate patch, if
needed; I do not think pwrite/pread has any dependency on it.
I need Chris to respond on this.
> 
> >   	if (ret)
> >   		goto out;
> >
> > @@ -1090,14 +1180,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   		goto out;
> >   	}
> >
> > -	/* prime objects have no backing filp to GEM pread/pwrite
> > -	 * pages from.
> > -	 */
> > -	if (!obj->base.filp) {
> > -		ret = -EINVAL;
> > -		goto out;
> > -	}
> > -
> >   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >
> >   	ret = -EFAULT;
> > @@ -1108,8 +1190,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >   	 * perspective, requiring manual detiling by the client.
> >   	 */
> >   	if (obj->tiling_mode == I915_TILING_NONE &&
> > -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > -	    cpu_write_needs_clflush(obj)) {
> > +	    (!obj->base.filp ||
> > +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > +	    cpu_write_needs_clflush(obj)))) {
> >   		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
> 
> So the pwrite path will fail if a page fault happens, as opposed to the 
> pread path which makes an effort to handle it. What is the reason for 
> this asymmetry in the API? Or am I missing something?
I had earlier implemented pwrite and pread maintaining the symmetry in
the API. After a couple of revisions we landed on this implementation.
I need Chris to respond on this.


Thanks,
Ankit

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-10-08  6:24 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem " ankitprasad.r.sharma
@ 2015-10-08 13:56   ` Tvrtko Ursulin
  2015-10-28 11:18     ` Ankitprasad Sharma
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2015-10-08 13:56 UTC (permalink / raw)
  To: ankitprasad.r.sharma, intel-gfx; +Cc: akash.goel, shashidhar.hiremath


Hi,

On 08/10/15 07:24, ankitprasad.r.sharma@intel.com wrote:
> From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
>
> This patch adds support for extending the pread/pwrite functionality
> for objects not backed by shmem. The access will be made through
> gtt interface.
> This will cover prime objects as well as stolen memory backed objects
> but for userptr objects it is still forbidden.

Where is the part which forbids it for userptr objects?

> v2: Drop locks around slow_user_access, prefault the pages before
> access (Chris)
>
> v3: Rebased to the latest drm-intel-nightly (Ankit)
>
> v4: Moved page base & offset calculations outside the copy loop,
> corrected data types for size and offset variables, corrected if-else
> braces format (Tvrtko/kerneldocs)
>
> v5: Enabled pread/pwrite for all non-shmem backed objects including
> without tiling restrictions (Ankit)
>
> v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
>
> Testcase: igt/gem_stolen
>
> Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 125 +++++++++++++++++++++++++++++++++-------
>   1 file changed, 104 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 91a2e97..2c94e22 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
>   	return ret ? - EFAULT : 0;
>   }
>
> +static inline uint64_t
> +slow_user_access(struct io_mapping *mapping,
> +		 uint64_t page_base, int page_offset,
> +		 char __user *user_data,
> +		 int length, bool pwrite)
> +{
> +	void __iomem *vaddr_inatomic;
> +	void *vaddr;
> +	uint64_t unwritten;
> +
> +	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
> +	/* We can use the cpu mem copy function because this is X86. */
> +	vaddr = (void __force *)vaddr_inatomic + page_offset;
> +	if (pwrite)
> +		unwritten = __copy_from_user(vaddr, user_data, length);
> +	else
> +		unwritten = __copy_to_user(user_data, vaddr, length);
> +
> +	io_mapping_unmap(vaddr_inatomic);
> +	return unwritten;
> +}
> +
> +static int
> +i915_gem_gtt_pread(struct drm_device *dev,
> +		   struct drm_i915_gem_object *obj, uint64_t size,
> +		   uint64_t data_offset, uint64_t data_ptr)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	char __user *user_data;
> +	uint64_t remain;
> +	uint64_t offset, page_base;
> +	int page_offset, page_length, ret = 0;
> +
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> +	if (ret)
> +		goto out;
> +
> +	ret = i915_gem_object_set_to_gtt_domain(obj, false);
> +	if (ret)
> +		goto out_unpin;
> +
> +	ret = i915_gem_object_put_fence(obj);
> +	if (ret)
> +		goto out_unpin;
> +
> +	user_data = to_user_ptr(data_ptr);
> +	remain = size;
> +	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> +
> +	mutex_unlock(&dev->struct_mutex);
> +	if (likely(!i915.prefault_disable))
> +		ret = fault_in_multipages_writeable(user_data, remain);
> +
> +	/*
> +	 * page_offset = offset within page
> +	 * page_base = page offset within aperture
> +	 */
> +	page_offset = offset_in_page(offset);
> +	page_base = offset & PAGE_MASK;
> +
> +	while (remain > 0) {
> +		/* page_length = bytes to copy for this page */
> +		page_length = remain;
> +		if ((page_offset + remain) > PAGE_SIZE)
> +			page_length = PAGE_SIZE - page_offset;
> +
> +		/* This is a slow read/write as it tries to read from
> +		 * and write to user memory which may result into page
> +		 * faults
> +		 */
> +		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> +				       page_offset, user_data,
> +				       page_length, false);
> +
> +		if (ret) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		remain -= page_length;
> +		user_data += page_length;
> +		page_base += page_length;
> +		page_offset = 0;
> +	}
> +
> +	mutex_lock(&dev->struct_mutex);
> +
> +out_unpin:
> +	i915_gem_object_ggtt_unpin(obj);
> +out:
> +	return ret;
> +}
> +
>   static int
>   i915_gem_shmem_pread(struct drm_device *dev,
>   		     struct drm_i915_gem_object *obj,
> @@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pread(obj, args->offset, args->size);
>
> -	ret = i915_gem_shmem_pread(dev, obj, args, file);
> +	/* pread for non shmem backed objects */
> +	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
> +		ret = i915_gem_gtt_pread(dev, obj, args->size,
> +					 args->offset, args->data_ptr);
> +	else
> +		ret = i915_gem_shmem_pread(dev, obj, args, file);
>
>   out:
>   	drm_gem_object_unreference(&obj->base);
> @@ -795,7 +885,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
>   	char __user *user_data;
>   	int page_offset, page_length, ret;
>
> -	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
> +	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);

Why is this needed?

>   	if (ret)
>   		goto out;
>
> @@ -1090,14 +1180,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   		goto out;
>   	}
>
> -	/* prime objects have no backing filp to GEM pread/pwrite
> -	 * pages from.
> -	 */
> -	if (!obj->base.filp) {
> -		ret = -EINVAL;
> -		goto out;
> -	}
> -
>   	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
>
>   	ret = -EFAULT;
> @@ -1108,8 +1190,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   	 * perspective, requiring manual detiling by the client.
>   	 */
>   	if (obj->tiling_mode == I915_TILING_NONE &&
> -	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> -	    cpu_write_needs_clflush(obj)) {
> +	    (!obj->base.filp ||
> +	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> +	    cpu_write_needs_clflush(obj)))) {
>   		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);

So the pwrite path will fail if a page fault happens, as opposed to the 
pread path which makes an effort to handle it. What is the reason for 
this asymmetry in the API? Or am I missing something?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects
  2015-10-08  6:24 [PATCH v8 0/6] Support for creating/using Stolen memory " ankitprasad.r.sharma
@ 2015-10-08  6:24 ` ankitprasad.r.sharma
  2015-10-08 13:56   ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: ankitprasad.r.sharma @ 2015-10-08  6:24 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ankitprasad Sharma, akash.goel, shashidhar.hiremath

From: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>

This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface.
This will cover prime objects as well as stolen memory backed objects
but for userptr objects it is still forbidden.

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)

v6: Using pwrite_fast for non-shmem backed objects as well (Chris)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 125 +++++++++++++++++++++++++++++++++-------
 1 file changed, 104 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 91a2e97..2c94e22 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -614,6 +614,99 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length,
 	return ret ? - EFAULT : 0;
 }
 
+static inline uint64_t
+slow_user_access(struct io_mapping *mapping,
+		 uint64_t page_base, int page_offset,
+		 char __user *user_data,
+		 int length, bool pwrite)
+{
+	void __iomem *vaddr_inatomic;
+	void *vaddr;
+	uint64_t unwritten;
+
+	vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
+	/* We can use the cpu mem copy function because this is X86. */
+	vaddr = (void __force *)vaddr_inatomic + page_offset;
+	if (pwrite)
+		unwritten = __copy_from_user(vaddr, user_data, length);
+	else
+		unwritten = __copy_to_user(user_data, vaddr, length);
+
+	io_mapping_unmap(vaddr_inatomic);
+	return unwritten;
+}
+
+static int
+i915_gem_gtt_pread(struct drm_device *dev,
+		   struct drm_i915_gem_object *obj, uint64_t size,
+		   uint64_t data_offset, uint64_t data_ptr)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	char __user *user_data;
+	uint64_t remain;
+	uint64_t offset, page_base;
+	int page_offset, page_length, ret = 0;
+
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+	if (ret)
+		goto out;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, false);
+	if (ret)
+		goto out_unpin;
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto out_unpin;
+
+	user_data = to_user_ptr(data_ptr);
+	remain = size;
+	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
+
+	mutex_unlock(&dev->struct_mutex);
+	if (likely(!i915.prefault_disable))
+		ret = fault_in_multipages_writeable(user_data, remain);
+
+	/*
+	 * page_offset = offset within page
+	 * page_base = page offset within aperture
+	 */
+	page_offset = offset_in_page(offset);
+	page_base = offset & PAGE_MASK;
+
+	while (remain > 0) {
+		/* page_length = bytes to copy for this page */
+		page_length = remain;
+		if ((page_offset + remain) > PAGE_SIZE)
+			page_length = PAGE_SIZE - page_offset;
+
+		/* This is a slow read/write as it tries to read from
+		 * and write to user memory which may result into page
+		 * faults
+		 */
+		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+				       page_offset, user_data,
+				       page_length, false);
+
+		if (ret) {
+			ret = -EFAULT;
+			break;
+		}
+
+		remain -= page_length;
+		user_data += page_length;
+		page_base += page_length;
+		page_offset = 0;
+	}
+
+	mutex_lock(&dev->struct_mutex);
+
+out_unpin:
+	i915_gem_object_ggtt_unpin(obj);
+out:
+	return ret;
+}
+
 static int
 i915_gem_shmem_pread(struct drm_device *dev,
 		     struct drm_i915_gem_object *obj,
@@ -737,17 +830,14 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = i915_gem_shmem_pread(dev, obj, args, file);
+	/* pread for non shmem backed objects */
+	if (!obj->base.filp && obj->tiling_mode == I915_TILING_NONE)
+		ret = i915_gem_gtt_pread(dev, obj, args->size,
+					 args->offset, args->data_ptr);
+	else
+		ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
@@ -795,7 +885,7 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	char __user *user_data;
 	int page_offset, page_length, ret;
 
-	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE | PIN_NONBLOCK);
+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
 	if (ret)
 		goto out;
 
@@ -1090,14 +1180,6 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		goto out;
 	}
 
-	/* prime objects have no backing filp to GEM pread/pwrite
-	 * pages from.
-	 */
-	if (!obj->base.filp) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
 	ret = -EFAULT;
@@ -1108,8 +1190,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 * perspective, requiring manual detiling by the client.
 	 */
 	if (obj->tiling_mode == I915_TILING_NONE &&
-	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-	    cpu_write_needs_clflush(obj)) {
+	    (!obj->base.filp ||
+	    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+	    cpu_write_needs_clflush(obj)))) {
 		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
 		/* Note that the gtt paths might fail with non-page-backed user
 		 * pointers (e.g. gtt mappings when moving data between
@@ -1119,7 +1202,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (ret == -EFAULT || ret == -ENOSPC) {
 		if (obj->phys_handle)
 			ret = i915_gem_phys_pwrite(obj, args, file);
-		else
+		else if (obj->base.filp)
 			ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 	}
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2015-12-15 16:22 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-09 12:46 [PATCH v10 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
2015-12-09 12:46 ` [PATCH 1/6] drm/i915: Clearing buffer objects via CPU/GTT ankitprasad.r.sharma
2015-12-09 13:26   ` Dave Gordon
2015-12-10 10:02     ` Ankitprasad Sharma
2015-12-09 13:30   ` Tvrtko Ursulin
2015-12-09 13:57   ` Tvrtko Ursulin
2015-12-10 10:23     ` Ankitprasad Sharma
2015-12-09 13:57   ` Chris Wilson
2015-12-10 10:27     ` Ankitprasad Sharma
2015-12-09 12:46 ` [PATCH 2/6] drm/i915: Support for creating Stolen memory backed objects ankitprasad.r.sharma
2015-12-09 14:06   ` Tvrtko Ursulin
2015-12-11 11:22     ` Ankitprasad Sharma
2015-12-11 12:19       ` Tvrtko Ursulin
2015-12-11 12:49         ` Dave Gordon
2015-12-11 18:13           ` Daniel Vetter
2015-12-09 12:46 ` [PATCH 3/6] drm/i915: Propagating correct error codes to the userspace ankitprasad.r.sharma
2015-12-09 15:10   ` Tvrtko Ursulin
2015-12-09 12:46 ` [PATCH 4/6] drm/i915: Add support for stealing purgable stolen pages ankitprasad.r.sharma
2015-12-09 15:40   ` Tvrtko Ursulin
2015-12-09 12:46 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem backed objects ankitprasad.r.sharma
2015-12-09 16:15   ` Tvrtko Ursulin
2015-12-09 19:39     ` Dave Gordon
2015-12-10 11:12       ` Ankitprasad Sharma
2015-12-10 18:18         ` Dave Gordon
2015-12-11  5:22           ` Ankitprasad Sharma
2015-12-11 18:15       ` Daniel Vetter
2015-12-15 16:22         ` Dave Gordon
2015-12-10 10:54     ` Ankitprasad Sharma
2015-12-10 11:00       ` Ankitprasad Sharma
2015-12-09 12:46 ` [PATCH 6/6] drm/i915: Migrate stolen objects before hibernation ankitprasad.r.sharma
2015-12-09 17:25   ` Tvrtko Ursulin
2015-12-09 19:24     ` Ville Syrjälä
2015-12-10 13:17     ` Ankitprasad Sharma
2015-12-09 19:35   ` Dave Gordon
2015-12-10  9:43   ` Tvrtko Ursulin
2015-12-10 13:17     ` Ankitprasad Sharma
2015-12-10 14:15       ` Tvrtko Ursulin
2015-12-10 18:00         ` Dave Gordon
2015-12-11  5:19           ` Ankitprasad Sharma
2015-12-11  5:16         ` Ankitprasad Sharma
2015-12-11 12:33           ` Tvrtko Ursulin
  -- strict thread matches above, loose matches on Subject: below --
2015-11-11 10:36 [PATCH v9 0/6] Support for creating/using Stolen memory backed objects ankitprasad.r.sharma
2015-11-11 10:36 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem " ankitprasad.r.sharma
2015-11-13 17:23   ` Tvrtko Ursulin
2015-11-20  9:30     ` Ankitprasad Sharma
2015-10-08  6:24 [PATCH v8 0/6] Support for creating/using Stolen memory " ankitprasad.r.sharma
2015-10-08  6:24 ` [PATCH 5/6] drm/i915: Support for pread/pwrite from/to non shmem " ankitprasad.r.sharma
2015-10-08 13:56   ` Tvrtko Ursulin
2015-10-28 11:18     ` Ankitprasad Sharma
