* [PATCH 00/13] pwrite/pread rework
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

Hi all,

I've finally managed to clean up my pwrite/pread rework. Major changes since the
first submission:
- fixed a bunch of bugs, some of them discovered by the intel-gpu-tools gem test
  suite
- extracted the clflush helper into common drm code as suggested by Chris
  Wilson.
- extended the prefault helpers in pagemap.h instead of rolling our own. Also
  added a patch to not prefault for writes (which modifies the userspace
  memory) before we're committed to the write. Both address issues raised by
  Keith Packard.

Why this is cool stuff:
- Fixes the spurious -EFAULTs when handing in a pointer to a gem gtt mapping
  (see the sketch after this list).
  See https://bugs.freedesktop.org/show_bug.cgi?id=38115
- Kills a bunch of code by dropping the practically useless ranged cpu read
  domain tracking and integrating the slow paths into the fast paths.
- Fixes pwrite/pread to use snooped cpu access on llc machines, thereby
  decently speeding up microbenchmarks.
- Paves the way for further clever tricks to reduce pressure on the mappable gtt
  area.
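
The -EFAULT case from the first point, as a minimal userspace sketch (not part
of the series): a pwrite whose source pointer is a gtt mmap of another bo. The
/dev/dri/card0 path is an assumption and error handling is omitted; the uapi
header include path may differ between kernel and libdrm installs.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/i915_drm.h>

int main(void)
{
	int fd = open("/dev/dri/card0", O_RDWR);
	struct drm_i915_gem_create src = { .size = 4096 };
	struct drm_i915_gem_create dst = { .size = 4096 };
	struct drm_i915_gem_mmap_gtt map = { 0 };
	struct drm_i915_gem_pwrite pwrite_arg = { 0 };
	char *gtt_ptr;

	/* create the source bo and map it through the gtt; this mapping is
	 * not backed by struct pages from the kernel's point of view */
	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &src);
	map.handle = src.handle;
	ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &map);
	gtt_ptr = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
		       fd, map.offset);
	memset(gtt_ptr, 0xaa, 4096);

	/* pwrite into a second bo with the gtt mapping as the source:
	 * get_user_pages on gtt_ptr fails, so the old gtt_pwrite path
	 * bailed with -EFAULT; with this series the shmem path copies it */
	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE, &dst);
	pwrite_arg.handle = dst.handle;
	pwrite_arg.offset = 0;
	pwrite_arg.size = 4096;
	pwrite_arg.data_ptr = (uintptr_t)gtt_ptr;
	return ioctl(fd, DRM_IOCTL_I915_GEM_PWRITE, &pwrite_arg);
}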

Reviews and comments highly welcome.

Yours, Daniel

Daniel Vetter (13):
  drm/i915: fall through pwrite_gtt_slow to the shmem slow path
  drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  drm/i915: rewrite shmem_pread_slow to use copy_to_user
  drm/i915: merge shmem_pwrite slow&fast-path
  drm/i915: merge shmem_pread slow&fast-path
  drm: add helper to clflush a virtual address range
  drm/i915: move clflushing into shmem_pread
  drm/i915: kill ranged cpu read domain support
  drm/i915: don't use gtt_pwrite on LLC cached objects
  drm/i915: don't call shmem_read_mapping unnecessarily
  mm: extend prefault helpers to fault in more than PAGE_SIZE
  drm/i915: drop gtt slowpath
  drm/i915: don't clobber userspace memory before committing to the
    pread

 drivers/gpu/drm/drm_cache.c     |   23 ++
 drivers/gpu/drm/i915/i915_drv.h |    7 -
 drivers/gpu/drm/i915/i915_gem.c |  824 +++++++++++----------------------------
 include/drm/drmP.h              |    1 +
 include/linux/pagemap.h         |   28 +-
 5 files changed, 273 insertions(+), 610 deletions(-)

-- 
1.7.6.4



* [PATCH 01/13] drm/i915: fall through pwrite_gtt_slow to the shmem slow path
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

The gtt_pwrite slowpath grabs the userspace memory with
get_user_pages. This will not work for non-page-backed memory, like a
gtt-mmapped gem object. Hence fall through to the shmem paths if we hit
-EFAULT in the gtt paths.

Now the shmem paths have exactly the same problem, but this way we
only need to rearrange the code in one write path.

v2: v1 accidentally fell back to shmem pwrite for phys objects. Fixed.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   31 +++++++++++++++++++------------
 1 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a838597..6fa24bc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -996,10 +996,11 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	 * pread/pwrite currently are reading and writing from the CPU
 	 * perspective, requiring manual detiling by the client.
 	 */
-	if (obj->phys_obj)
+	if (obj->phys_obj) {
 		ret = i915_gem_phys_pwrite(dev, obj, args, file);
-	else if (obj->gtt_space &&
-		 obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		goto out;
+	} else if (obj->gtt_space &&
+		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 		ret = i915_gem_object_pin(obj, 0, true);
 		if (ret)
 			goto out;
@@ -1018,18 +1019,24 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 out_unpin:
 		i915_gem_object_unpin(obj);
-	} else {
-		ret = i915_gem_object_set_to_cpu_domain(obj, 1);
-		if (ret)
-			goto out;
 
-		ret = -EFAULT;
-		if (!i915_gem_object_needs_bit17_swizzle(obj))
-			ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file);
-		if (ret == -EFAULT)
-			ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file);
+		if (ret != -EFAULT)
+			goto out;
+		/* Fall through to the shmfs paths because the gtt paths might
+		 * fail with non-page-backed user pointers (e.g. gtt mappings
+		 * when moving data between textures). */
 	}
 
+	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
+	if (ret)
+		goto out;
+
+	ret = -EFAULT;
+	if (!i915_gem_object_needs_bit17_swizzle(obj))
+		ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file);
+	if (ret == -EFAULT)
+		ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file);
+
 out:
 	drm_gem_object_unreference(&obj->base);
 unlock:
-- 
1.7.6.4



* [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

... instead of get_user_pages, because that fails on non-page-backed
user addresses, e.g. a gtt mapping of a bo.

To get there, essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicity guarantees
of our userspace abi. Implications for racing with other gem ioctls:

- execbuf, pwrite, pread: Due to the -EFAULT fallback to slow paths there's
  already the risk of the pwrite call not being atomic, so no degradation.
- read/write access to mmaps: already fully racy, no degradation.
- set_tiling: Calling set_tiling while reading/writing is already
  pretty much undefined, now it just got a bit worse. set_tiling is
  only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
  read/write access, e.g. for synchronization), we might leave unflushed
  data in the cpu caches. The clflush_object at the end of pwrite_slow
  takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
  reinstate backing storage for truncated objects. The check at the end
  of pwrite_slow takes care of this.

v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggested by Chris Wilson.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |  126 ++++++++++++++++++++-------------------
 1 files changed, 64 insertions(+), 62 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6fa24bc..f9efebb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -58,6 +58,7 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj);
 
 static int i915_gem_inactive_shrink(struct shrinker *shrinker,
 				    struct shrink_control *sc);
+static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
 
 /* some bookkeeping */
 static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
@@ -385,6 +386,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev,
 	return 0;
 }
 
+static inline int
+__copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
+			  const char *cpu_vaddr,
+			  int length)
+{
+	int ret, cpu_offset = 0;
+
+	while (length > 0) {
+		int cacheline_end = ALIGN(gpu_offset + 1, 64);
+		int this_length = min(cacheline_end - gpu_offset, length);
+		int swizzled_gpu_offset = gpu_offset ^ 64;
+
+		ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
+				       cpu_vaddr + cpu_offset,
+				       this_length);
+		if (ret)
+			return ret + length;
+
+		cpu_offset += this_length;
+		gpu_offset += this_length;
+		length -= this_length;
+	}
+
+	return 0;
+}
+
 /**
  * This is the fallback shmem pread path, which allocates temporary storage
  * in kernel space to copy_to_user into outside of the struct_mutex, so we
@@ -841,71 +868,36 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 			   struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
-	struct page **user_pages;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
-	int shmem_page_offset;
-	int data_page_index,  data_page_offset;
-	int page_length;
-	int ret;
-	uint64_t data_ptr = args->data_ptr;
-	int do_bit17_swizzling;
+	loff_t offset;
+	char __user *user_data;
+	int shmem_page_offset, page_length, ret;
+	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 
+	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
 
-	/* Pin the user pages containing the data.  We can't fault while
-	 * holding the struct mutex, and all of the pwrite implementations
-	 * want to hold it while dereferencing the user data.
-	 */
-	first_data_page = data_ptr / PAGE_SIZE;
-	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
-	num_pages = last_data_page - first_data_page + 1;
-
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
-	if (ret)
-		goto out;
-
-	do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
+	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
 
 	offset = args->offset;
 	obj->dirty = 1;
 
+	mutex_unlock(&dev->struct_mutex);
+
 	while (remain > 0) {
 		struct page *page;
+		char *vaddr;
 
 		/* Operation in this page
 		 *
 		 * shmem_page_offset = offset within page in shmem file
-		 * data_page_index = page number in get_user_pages return
-		 * data_page_offset = offset with data_page_index page.
 		 * page_length = bytes to copy for this page
 		 */
 		shmem_page_offset = offset_in_page(offset);
-		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = offset_in_page(data_ptr);
 
 		page_length = remain;
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - shmem_page_offset;
-		if ((data_page_offset + page_length) > PAGE_SIZE)
-			page_length = PAGE_SIZE - data_page_offset;
 
 		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page)) {
@@ -913,34 +905,44 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 			goto out;
 		}
 
-		if (do_bit17_swizzling) {
-			slow_shmem_bit17_copy(page,
-					      shmem_page_offset,
-					      user_pages[data_page_index],
-					      data_page_offset,
-					      page_length,
-					      0);
-		} else {
-			slow_shmem_copy(page,
-					shmem_page_offset,
-					user_pages[data_page_index],
-					data_page_offset,
-					page_length);
-		}
+		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
+
+		vaddr = kmap(page);
+		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
+			ret = __copy_from_user_swizzled(vaddr, shmem_page_offset,
+							user_data,
+							page_length);
+		else
+			ret = __copy_from_user(vaddr + shmem_page_offset,
+					       user_data,
+					       page_length);
+		kunmap(page);
 
 		set_page_dirty(page);
 		mark_page_accessed(page);
 		page_cache_release(page);
 
+		if (ret) {
+			ret = -EFAULT;
+			goto out;
+		}
+
 		remain -= page_length;
-		data_ptr += page_length;
+		user_data += page_length;
 		offset += page_length;
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++)
-		page_cache_release(user_pages[i]);
-	drm_free_large(user_pages);
+	mutex_lock(&dev->struct_mutex);
+	/* Fixup: Kill any reinstated backing storage pages */
+	if (obj->madv == __I915_MADV_PURGED)
+		i915_gem_object_truncate(obj);
+	/* and flush dirty cachelines in case the object isn't in the cpu write
+	 * domain anymore. */
+	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+		i915_gem_clflush_object(obj);
+		intel_gtt_chipset_flush();
+	}
 
 	return ret;
 }
-- 
1.7.6.4
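
As background for the __copy_from_user_swizzled helper above (illustration
only, not part of the patch): on a bit17-swizzled page the data sits at
offsets with address bit 6 flipped, so the copy walks 64-byte cachelines and
XORs the object offset with 64. A standalone userspace sketch of just that
offset math; the function name and the hard-coded 64-byte cacheline are
assumptions for the example.

#include <stddef.h>
#include <string.h>

#define CACHELINE 64

/* copy into a bit17-swizzled page: proceed cacheline by cacheline and
 * XOR the destination offset with 64 (i.e. flip address bit 6) */
static void copy_to_bit17_swizzled(char *dst, size_t dst_offset,
				   const char *src, size_t len)
{
	size_t src_offset = 0;

	while (len > 0) {
		size_t cacheline_end = (dst_offset / CACHELINE + 1) * CACHELINE;
		size_t chunk = cacheline_end - dst_offset;

		if (chunk > len)
			chunk = len;

		memcpy(dst + (dst_offset ^ CACHELINE),
		       src + src_offset, chunk);

		src_offset += chunk;
		dst_offset += chunk;
		len -= chunk;
	}
}

int main(void)
{
	char page[4096], data[256] = "hello";

	/* write 256 bytes at offset 100 of a (hypothetically) swizzled page */
	copy_to_bit17_swizzled(page, 100, data, sizeof(data));
	return page[100 ^ 64] == 'h' ? 0 : 1;
}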



* [PATCH 03/13] drm/i915: rewrite shmem_pread_slow to use copy_to_user
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

Like for shmem_pwrite_slow. The only difference is that because we
read data, we can leave the fetched cachelines in the cpu caches: in the case
that the object isn't in the cpu read domain anymore, the clflush for
the next cpu read domain invalidation will simply drop these
cachelines.

slow_shmem_bit17_copy is now unused, so kill it.

With this patch tests/gem_mmap_gtt now actually works.

v2: add __ to copy_to_user_swizzled as suggested by Chris Wilson.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |  172 +++++++++++++--------------------------
 1 files changed, 56 insertions(+), 116 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f9efebb..be1dd4f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -277,55 +277,6 @@ slow_shmem_copy(struct page *dst_page,
 	kunmap(dst_page);
 }
 
-static inline void
-slow_shmem_bit17_copy(struct page *gpu_page,
-		      int gpu_offset,
-		      struct page *cpu_page,
-		      int cpu_offset,
-		      int length,
-		      int is_read)
-{
-	char *gpu_vaddr, *cpu_vaddr;
-
-	/* Use the unswizzled path if this page isn't affected. */
-	if ((page_to_phys(gpu_page) & (1 << 17)) == 0) {
-		if (is_read)
-			return slow_shmem_copy(cpu_page, cpu_offset,
-					       gpu_page, gpu_offset, length);
-		else
-			return slow_shmem_copy(gpu_page, gpu_offset,
-					       cpu_page, cpu_offset, length);
-	}
-
-	gpu_vaddr = kmap(gpu_page);
-	cpu_vaddr = kmap(cpu_page);
-
-	/* Copy the data, XORing A6 with A17 (1). The user already knows he's
-	 * XORing with the other bits (A9 for Y, A9 and A10 for X)
-	 */
-	while (length > 0) {
-		int cacheline_end = ALIGN(gpu_offset + 1, 64);
-		int this_length = min(cacheline_end - gpu_offset, length);
-		int swizzled_gpu_offset = gpu_offset ^ 64;
-
-		if (is_read) {
-			memcpy(cpu_vaddr + cpu_offset,
-			       gpu_vaddr + swizzled_gpu_offset,
-			       this_length);
-		} else {
-			memcpy(gpu_vaddr + swizzled_gpu_offset,
-			       cpu_vaddr + cpu_offset,
-			       this_length);
-		}
-		cpu_offset += this_length;
-		gpu_offset += this_length;
-		length -= this_length;
-	}
-
-	kunmap(cpu_page);
-	kunmap(gpu_page);
-}
-
 /**
  * This is the fast shmem pread path, which attempts to copy_from_user directly
  * from the backing pages of the object to the user's address space.  On a
@@ -387,6 +338,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev,
 }
 
 static inline int
+__copy_to_user_swizzled(char __user *cpu_vaddr,
+			const char *gpu_vaddr, int gpu_offset,
+			int length)
+{
+	int ret, cpu_offset = 0;
+
+	while (length > 0) {
+		int cacheline_end = ALIGN(gpu_offset + 1, 64);
+		int this_length = min(cacheline_end - gpu_offset, length);
+		int swizzled_gpu_offset = gpu_offset ^ 64;
+
+		ret = __copy_to_user(cpu_vaddr + cpu_offset,
+				     gpu_vaddr + swizzled_gpu_offset,
+				     this_length);
+		if (ret)
+			return ret + length;
+
+		cpu_offset += this_length;
+		gpu_offset += this_length;
+		length -= this_length;
+	}
+
+	return 0;
+}
+
+static inline int
 __copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
 			  const char *cpu_vaddr,
 			  int length)
@@ -425,72 +402,34 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 			  struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	struct mm_struct *mm = current->mm;
-	struct page **user_pages;
+	char __user *user_data;
 	ssize_t remain;
-	loff_t offset, pinned_pages, i;
-	loff_t first_data_page, last_data_page, num_pages;
-	int shmem_page_offset;
-	int data_page_index, data_page_offset;
-	int page_length;
-	int ret;
-	uint64_t data_ptr = args->data_ptr;
-	int do_bit17_swizzling;
+	loff_t offset;
+	int shmem_page_offset, page_length, ret;
+	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 
+	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
 
-	/* Pin the user pages containing the data.  We can't fault while
-	 * holding the struct mutex, yet we want to hold it while
-	 * dereferencing the user data.
-	 */
-	first_data_page = data_ptr / PAGE_SIZE;
-	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
-	num_pages = last_data_page - first_data_page + 1;
+	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
 
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
+	offset = args->offset;
 
 	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 1, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	ret = i915_gem_object_set_cpu_read_domain_range(obj,
-							args->offset,
-							args->size);
-	if (ret)
-		goto out;
-
-	do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
-
-	offset = args->offset;
 
 	while (remain > 0) {
 		struct page *page;
+		char *vaddr;
 
 		/* Operation in this page
 		 *
 		 * shmem_page_offset = offset within page in shmem file
-		 * data_page_index = page number in get_user_pages return
-		 * data_page_offset = offset with data_page_index page.
 		 * page_length = bytes to copy for this page
 		 */
 		shmem_page_offset = offset_in_page(offset);
-		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = offset_in_page(data_ptr);
-
 		page_length = remain;
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - shmem_page_offset;
-		if ((data_page_offset + page_length) > PAGE_SIZE)
-			page_length = PAGE_SIZE - data_page_offset;
 
 		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page)) {
@@ -498,36 +437,37 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 			goto out;
 		}
 
-		if (do_bit17_swizzling) {
-			slow_shmem_bit17_copy(page,
-					      shmem_page_offset,
-					      user_pages[data_page_index],
-					      data_page_offset,
-					      page_length,
-					      1);
-		} else {
-			slow_shmem_copy(user_pages[data_page_index],
-					data_page_offset,
-					page,
-					shmem_page_offset,
-					page_length);
-		}
+		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
+
+		vaddr = kmap(page);
+		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
+			ret = __copy_to_user_swizzled(user_data,
+						      vaddr, shmem_page_offset,
+						      page_length);
+		else
+			ret = __copy_to_user(user_data,
+					     vaddr + shmem_page_offset,
+					     page_length);
+		kunmap(page);
 
 		mark_page_accessed(page);
 		page_cache_release(page);
 
+		if (ret) {
+			ret = -EFAULT;
+			goto out;
+		}
+
 		remain -= page_length;
-		data_ptr += page_length;
+		user_data += page_length;
 		offset += page_length;
 	}
 
 out:
-	for (i = 0; i < pinned_pages; i++) {
-		SetPageDirty(user_pages[i]);
-		mark_page_accessed(user_pages[i]);
-		page_cache_release(user_pages[i]);
-	}
-	drm_free_large(user_pages);
+	mutex_lock(&dev->struct_mutex);
+	/* Fixup: Kill any reinstated backing storage pages */
+	if (obj->madv == __I915_MADV_PURGED)
+		i915_gem_object_truncate(obj);
 
 	return ret;
 }
-- 
1.7.6.4



* [PATCH 04/13] drm/i915: merge shmem_pwrite slow&fast-path
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

With the previous rewrite, they've become essentially identical.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |  126 ++++++++++----------------------------
 1 files changed, 33 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be1dd4f..52fc0b5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -728,84 +728,11 @@ out_unpin_pages:
 	return ret;
 }
 
-/**
- * This is the fast shmem pwrite path, which attempts to directly
- * copy_from_user into the kmapped pages backing the object.
- */
-static int
-i915_gem_shmem_pwrite_fast(struct drm_device *dev,
-			   struct drm_i915_gem_object *obj,
-			   struct drm_i915_gem_pwrite *args,
-			   struct drm_file *file)
-{
-	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	ssize_t remain;
-	loff_t offset;
-	char __user *user_data;
-	int page_offset, page_length;
-
-	user_data = (char __user *) (uintptr_t) args->data_ptr;
-	remain = args->size;
-
-	offset = args->offset;
-	obj->dirty = 1;
-
-	while (remain > 0) {
-		struct page *page;
-		char *vaddr;
-		int ret;
-
-		/* Operation in this page
-		 *
-		 * page_offset = offset within page
-		 * page_length = bytes to copy for this page
-		 */
-		page_offset = offset_in_page(offset);
-		page_length = remain;
-		if ((page_offset + remain) > PAGE_SIZE)
-			page_length = PAGE_SIZE - page_offset;
-
-		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
-
-		vaddr = kmap_atomic(page);
-		ret = __copy_from_user_inatomic(vaddr + page_offset,
-						user_data,
-						page_length);
-		kunmap_atomic(vaddr);
-
-		set_page_dirty(page);
-		mark_page_accessed(page);
-		page_cache_release(page);
-
-		/* If we get a fault while copying data, then (presumably) our
-		 * source page isn't available.  Return the error and we'll
-		 * retry in the slow path.
-		 */
-		if (ret)
-			return -EFAULT;
-
-		remain -= page_length;
-		user_data += page_length;
-		offset += page_length;
-	}
-
-	return 0;
-}
-
-/**
- * This is the fallback shmem pwrite path, which uses get_user_pages to pin
- * the memory and maps it using kmap_atomic for copying.
- *
- * This avoids taking mmap_sem for faulting on the user's address while the
- * struct_mutex is held.
- */
 static int
-i915_gem_shmem_pwrite_slow(struct drm_device *dev,
-			   struct drm_i915_gem_object *obj,
-			   struct drm_i915_gem_pwrite *args,
-			   struct drm_file *file)
+i915_gem_shmem_pwrite(struct drm_device *dev,
+		      struct drm_i915_gem_object *obj,
+		      struct drm_i915_gem_pwrite *args,
+		      struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
 	ssize_t remain;
@@ -813,6 +740,7 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	char __user *user_data;
 	int shmem_page_offset, page_length, ret;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
+	int hit_slowpath = 0;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
@@ -822,8 +750,6 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	offset = args->offset;
 	obj->dirty = 1;
 
-	mutex_unlock(&dev->struct_mutex);
-
 	while (remain > 0) {
 		struct page *page;
 		char *vaddr;
@@ -847,6 +773,21 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 
 		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
 
+		if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) {
+			vaddr = kmap_atomic(page);
+			ret = __copy_from_user_inatomic(vaddr + shmem_page_offset,
+							user_data,
+							page_length);
+			kunmap_atomic(vaddr);
+
+			if (ret == 0) 
+				goto next_page;
+		}
+
+		hit_slowpath = 1;
+
+		mutex_unlock(&dev->struct_mutex);
+
 		vaddr = kmap(page);
 		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
 			ret = __copy_from_user_swizzled(vaddr, shmem_page_offset,
@@ -858,6 +799,8 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 					       page_length);
 		kunmap(page);
 
+		mutex_lock(&dev->struct_mutex);
+next_page:
 		set_page_dirty(page);
 		mark_page_accessed(page);
 		page_cache_release(page);
@@ -873,15 +816,16 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
 	}
 
 out:
-	mutex_lock(&dev->struct_mutex);
-	/* Fixup: Kill any reinstated backing storage pages */
-	if (obj->madv == __I915_MADV_PURGED)
-		i915_gem_object_truncate(obj);
-	/* and flush dirty cachelines in case the object isn't in the cpu write
-	 * domain anymore. */
-	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		i915_gem_clflush_object(obj);
-		intel_gtt_chipset_flush();
+	if (hit_slowpath) {
+		/* Fixup: Kill any reinstated backing storage pages */
+		if (obj->madv == __I915_MADV_PURGED)
+			i915_gem_object_truncate(obj);
+		/* and flush dirty cachelines in case the object isn't in the cpu write
+		 * domain anymore. */
+		if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
+			i915_gem_clflush_object(obj);
+			intel_gtt_chipset_flush();
+		}
 	}
 
 	return ret;
@@ -973,11 +917,7 @@ out_unpin:
 	if (ret)
 		goto out;
 
-	ret = -EFAULT;
-	if (!i915_gem_object_needs_bit17_swizzle(obj))
-		ret = i915_gem_shmem_pwrite_fast(dev, obj, args, file);
-	if (ret == -EFAULT)
-		ret = i915_gem_shmem_pwrite_slow(dev, obj, args, file);
+	ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
-- 
1.7.6.4



* [PATCH 05/13] drm/i915: merge shmem_pread slow&fast-path
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

With the previous rewrite, they've become essentially identical.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |  108 ++++++++++-----------------------------
 1 files changed, 27 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 52fc0b5..5eec933 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -277,66 +277,6 @@ slow_shmem_copy(struct page *dst_page,
 	kunmap(dst_page);
 }
 
-/**
- * This is the fast shmem pread path, which attempts to copy_from_user directly
- * from the backing pages of the object to the user's address space.  On a
- * fault, it fails so we can fall back to i915_gem_shmem_pwrite_slow().
- */
-static int
-i915_gem_shmem_pread_fast(struct drm_device *dev,
-			  struct drm_i915_gem_object *obj,
-			  struct drm_i915_gem_pread *args,
-			  struct drm_file *file)
-{
-	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
-	ssize_t remain;
-	loff_t offset;
-	char __user *user_data;
-	int page_offset, page_length;
-
-	user_data = (char __user *) (uintptr_t) args->data_ptr;
-	remain = args->size;
-
-	offset = args->offset;
-
-	while (remain > 0) {
-		struct page *page;
-		char *vaddr;
-		int ret;
-
-		/* Operation in this page
-		 *
-		 * page_offset = offset within page
-		 * page_length = bytes to copy for this page
-		 */
-		page_offset = offset_in_page(offset);
-		page_length = remain;
-		if ((page_offset + remain) > PAGE_SIZE)
-			page_length = PAGE_SIZE - page_offset;
-
-		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
-		if (IS_ERR(page))
-			return PTR_ERR(page);
-
-		vaddr = kmap_atomic(page);
-		ret = __copy_to_user_inatomic(user_data,
-					      vaddr + page_offset,
-					      page_length);
-		kunmap_atomic(vaddr);
-
-		mark_page_accessed(page);
-		page_cache_release(page);
-		if (ret)
-			return -EFAULT;
-
-		remain -= page_length;
-		user_data += page_length;
-		offset += page_length;
-	}
-
-	return 0;
-}
-
 static inline int
 __copy_to_user_swizzled(char __user *cpu_vaddr,
 			const char *gpu_vaddr, int gpu_offset,
@@ -389,17 +329,11 @@ __copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
 	return 0;
 }
 
-/**
- * This is the fallback shmem pread path, which allocates temporary storage
- * in kernel space to copy_to_user into outside of the struct_mutex, so we
- * can copy out of the object's backing pages while holding the struct mutex
- * and not take page faults.
- */
 static int
-i915_gem_shmem_pread_slow(struct drm_device *dev,
-			  struct drm_i915_gem_object *obj,
-			  struct drm_i915_gem_pread *args,
-			  struct drm_file *file)
+i915_gem_shmem_pread(struct drm_device *dev,
+		     struct drm_i915_gem_object *obj,
+		     struct drm_i915_gem_pread *args,
+		     struct drm_file *file)
 {
 	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
 	char __user *user_data;
@@ -407,6 +341,7 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	loff_t offset;
 	int shmem_page_offset, page_length, ret;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
+	int hit_slowpath = 0;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
@@ -415,8 +350,6 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 
 	offset = args->offset;
 
-	mutex_unlock(&dev->struct_mutex);
-
 	while (remain > 0) {
 		struct page *page;
 		char *vaddr;
@@ -439,6 +372,20 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 
 		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
 
+		if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) {
+			vaddr = kmap_atomic(page);
+			ret = __copy_to_user_inatomic(user_data,
+						      vaddr + shmem_page_offset,
+						      page_length);
+			kunmap_atomic(vaddr);
+			if (ret == 0) 
+				goto next_page;
+		}
+
+		hit_slowpath = 1;
+
+		mutex_unlock(&dev->struct_mutex);
+
 		vaddr = kmap(page);
 		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
 			ret = __copy_to_user_swizzled(user_data,
@@ -450,6 +397,8 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 					     page_length);
 		kunmap(page);
 
+		mutex_lock(&dev->struct_mutex);
+next_page:
 		mark_page_accessed(page);
 		page_cache_release(page);
 
@@ -464,10 +413,11 @@ i915_gem_shmem_pread_slow(struct drm_device *dev,
 	}
 
 out:
-	mutex_lock(&dev->struct_mutex);
-	/* Fixup: Kill any reinstated backing storage pages */
-	if (obj->madv == __I915_MADV_PURGED)
-		i915_gem_object_truncate(obj);
+	if (hit_slowpath) {
+		/* Fixup: Kill any reinstated backing storage pages */
+		if (obj->madv == __I915_MADV_PURGED)
+			i915_gem_object_truncate(obj);
+	}
 
 	return ret;
 }
@@ -523,11 +473,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto out;
 
-	ret = -EFAULT;
-	if (!i915_gem_object_needs_bit17_swizzle(obj))
-		ret = i915_gem_shmem_pread_fast(dev, obj, args, file);
-	if (ret == -EFAULT)
-		ret = i915_gem_shmem_pread_slow(dev, obj, args, file);
+	ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
-- 
1.7.6.4



* [PATCH 06/13] drm: add helper to clflush a virtual address range
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

Useful when the page is already mapped, to copy data in/out.

Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/drm_cache.c |   23 +++++++++++++++++++++++
 include/drm/drmP.h          |    1 +
 2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 0e3bd5b..502771a 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -97,3 +97,26 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 #endif
 }
 EXPORT_SYMBOL(drm_clflush_pages);
+
+void
+drm_clflush_virt_range(char *addr, unsigned long length)
+{
+#if defined(CONFIG_X86)
+	if (cpu_has_clflush) {
+		char *end = addr + length;
+		mb();
+		for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
+			clflush(addr);
+		clflush(end - 1);
+		mb();
+		return;
+	}
+
+	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#else
+	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
+	WARN_ON_ONCE(1);
+#endif
+}
+EXPORT_SYMBOL(drm_clflush_virt_range);
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 43538b6..a082640 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1328,6 +1328,7 @@ extern int drm_authmagic(struct drm_device *dev, void *data,
 
 /* Cache management (drm_cache.c) */
 void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
+void drm_clflush_virt_range(char *addr, unsigned long length);
 
 				/* Locking IOCTL support (drm_lock.h) */
 extern int drm_lock(struct drm_device *dev, void *data,
-- 
1.7.6.4
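
For reference, a rough userspace analogue of the new helper built on the SSE2
clflush intrinsic (illustration only, not the kernel code): it hard-codes a
64-byte cacheline where drm_clflush_virt_range() uses
boot_cpu_data.x86_clflush_size, and it has no IPI fallback for CPUs without
clflush.

#include <emmintrin.h>

#define CLFLUSH_SIZE 64	/* assumed; the kernel queries the cpu for this */

static void clflush_virt_range(char *addr, unsigned long length)
{
	char *end = addr + length;

	_mm_mfence();			/* order earlier stores, like mb() */
	for (; addr < end; addr += CLFLUSH_SIZE)
		_mm_clflush(addr);
	_mm_clflush(end - 1);		/* cover an unaligned tail */
	_mm_mfence();
}

int main(void)
{
	static char buf[256];

	clflush_virt_range(buf, sizeof(buf));
	return 0;
}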



* [PATCH 07/13] drm/i915: move clflushing into shmem_pread
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

This is obviously going to slow down pread. But for a half-way realistic
micro-benchmark, it doesn't matter: Non-broken userspace reads back
data from the gpu once before the gpu again dirties it.

So all this ranged clflush tracking is just a waste of time.

No pread performance change (neglecting the dumb benchmark of
constantly reading the same data) measured.

As an added bonus, this avoids clflush on read on coherent objects,
which means that partial preads on snb are now roughly 4x as fast.
This will be useful for e.g. the libva encoder - when I finally get
around to fixing that up.

v2: Properly sync with the gpu on LLC machines.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   26 ++++++++++++++++++++------
 1 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5eec933..0e9cc81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -342,12 +342,25 @@ i915_gem_shmem_pread(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
+	int needs_clflush = 0;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
 
 	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
 
+	if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)) {
+		/* If we're not in the cpu read domain, set ourself into the gtt
+		 * read domain and manually flush cachelines (if required). This
+		 * optimizes for the case when the gpu will dirty the data
+		 * anyway again before the next pread happens. */
+		if (obj->cache_level == I915_CACHE_NONE)
+			needs_clflush = 1;
+		ret = i915_gem_object_set_to_gtt_domain(obj, false);
+		if (ret)
+			return ret;
+	}
+
 	offset = args->offset;
 
 	while (remain > 0) {
@@ -374,6 +387,9 @@ i915_gem_shmem_pread(struct drm_device *dev,
 
 		if (!obj_do_bit17_swizzling || !page_do_bit17_swizzling) {
 			vaddr = kmap_atomic(page);
+			if (needs_clflush)
+				drm_clflush_virt_range(vaddr + shmem_page_offset,
+						       page_length);
 			ret = __copy_to_user_inatomic(user_data,
 						      vaddr + shmem_page_offset,
 						      page_length);
@@ -387,6 +403,10 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		mutex_unlock(&dev->struct_mutex);
 
 		vaddr = kmap(page);
+		if (needs_clflush)
+			drm_clflush_virt_range(vaddr + shmem_page_offset,
+					       page_length);
+
 		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
 			ret = __copy_to_user_swizzled(user_data,
 						      vaddr, shmem_page_offset,
@@ -467,12 +487,6 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 
 	trace_i915_gem_object_pread(obj, args->offset, args->size);
 
-	ret = i915_gem_object_set_cpu_read_domain_range(obj,
-							args->offset,
-							args->size);
-	if (ret)
-		goto out;
-
 	ret = i915_gem_shmem_pread(dev, obj, args, file);
 
 out:
-- 
1.7.6.4



* [PATCH 08/13] drm/i915: kill ranged cpu read domain support
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

No longer needed.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_drv.h |    7 --
 drivers/gpu/drm/i915/i915_gem.c |  117 ---------------------------------------
 2 files changed, 0 insertions(+), 124 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 25036f5..fe4b680 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -861,13 +861,6 @@ struct drm_i915_gem_object {
 	/** Record of address bit 17 of each page at last unbind. */
 	unsigned long *bit_17;
 
-
-	/**
-	 * If present, while GEM_DOMAIN_CPU is in the read domain this array
-	 * flags which individual pages are valid.
-	 */
-	uint8_t *page_cpu_valid;
-
 	/** User space pin count and filp owning the pin */
 	uint32_t user_pin_count;
 	struct drm_file *pin_filp;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0e9cc81..0048917 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -41,10 +41,6 @@ static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *o
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static __must_check int i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj,
 							  bool write);
-static __must_check int i915_gem_object_set_cpu_read_domain_range(struct drm_i915_gem_object *obj,
-								  uint64_t offset,
-								  uint64_t size);
-static void i915_gem_object_set_to_full_cpu_read_domain(struct drm_i915_gem_object *obj);
 static __must_check int i915_gem_object_bind_to_gtt(struct drm_i915_gem_object *obj,
 						    unsigned alignment,
 						    bool map_and_fenceable);
@@ -2964,11 +2960,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	i915_gem_object_flush_gtt_write_domain(obj);
 
-	/* If we have a partially-valid cache of the object in the CPU,
-	 * finish invalidating it and free the per-page flags.
-	 */
-	i915_gem_object_set_to_full_cpu_read_domain(obj);
-
 	old_write_domain = obj->base.write_domain;
 	old_read_domains = obj->base.read_domains;
 
@@ -2999,113 +2990,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 	return 0;
 }
 
-/**
- * Moves the object from a partially CPU read to a full one.
- *
- * Note that this only resolves i915_gem_object_set_cpu_read_domain_range(),
- * and doesn't handle transitioning from !(read_domains & I915_GEM_DOMAIN_CPU).
- */
-static void
-i915_gem_object_set_to_full_cpu_read_domain(struct drm_i915_gem_object *obj)
-{
-	if (!obj->page_cpu_valid)
-		return;
-
-	/* If we're partially in the CPU read domain, finish moving it in.
-	 */
-	if (obj->base.read_domains & I915_GEM_DOMAIN_CPU) {
-		int i;
-
-		for (i = 0; i <= (obj->base.size - 1) / PAGE_SIZE; i++) {
-			if (obj->page_cpu_valid[i])
-				continue;
-			drm_clflush_pages(obj->pages + i, 1);
-		}
-	}
-
-	/* Free the page_cpu_valid mappings which are now stale, whether
-	 * or not we've got I915_GEM_DOMAIN_CPU.
-	 */
-	kfree(obj->page_cpu_valid);
-	obj->page_cpu_valid = NULL;
-}
-
-/**
- * Set the CPU read domain on a range of the object.
- *
- * The object ends up with I915_GEM_DOMAIN_CPU in its read flags although it's
- * not entirely valid.  The page_cpu_valid member of the object flags which
- * pages have been flushed, and will be respected by
- * i915_gem_object_set_to_cpu_domain() if it's called on to get a valid mapping
- * of the whole object.
- *
- * This function returns when the move is complete, including waiting on
- * flushes to occur.
- */
-static int
-i915_gem_object_set_cpu_read_domain_range(struct drm_i915_gem_object *obj,
-					  uint64_t offset, uint64_t size)
-{
-	uint32_t old_read_domains;
-	int i, ret;
-
-	if (offset == 0 && size == obj->base.size)
-		return i915_gem_object_set_to_cpu_domain(obj, 0);
-
-	ret = i915_gem_object_flush_gpu_write_domain(obj);
-	if (ret)
-		return ret;
-
-	ret = i915_gem_object_wait_rendering(obj);
-	if (ret)
-		return ret;
-
-	i915_gem_object_flush_gtt_write_domain(obj);
-
-	/* If we're already fully in the CPU read domain, we're done. */
-	if (obj->page_cpu_valid == NULL &&
-	    (obj->base.read_domains & I915_GEM_DOMAIN_CPU) != 0)
-		return 0;
-
-	/* Otherwise, create/clear the per-page CPU read domain flag if we're
-	 * newly adding I915_GEM_DOMAIN_CPU
-	 */
-	if (obj->page_cpu_valid == NULL) {
-		obj->page_cpu_valid = kzalloc(obj->base.size / PAGE_SIZE,
-					      GFP_KERNEL);
-		if (obj->page_cpu_valid == NULL)
-			return -ENOMEM;
-	} else if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
-		memset(obj->page_cpu_valid, 0, obj->base.size / PAGE_SIZE);
-
-	/* Flush the cache on any pages that are still invalid from the CPU's
-	 * perspective.
-	 */
-	for (i = offset / PAGE_SIZE; i <= (offset + size - 1) / PAGE_SIZE;
-	     i++) {
-		if (obj->page_cpu_valid[i])
-			continue;
-
-		drm_clflush_pages(obj->pages + i, 1);
-
-		obj->page_cpu_valid[i] = 1;
-	}
-
-	/* It should now be out of any other write domains, and we can update
-	 * the domain values for our changes.
-	 */
-	BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0);
-
-	old_read_domains = obj->base.read_domains;
-	obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
-
-	trace_i915_gem_object_change_domain(obj,
-					    old_read_domains,
-					    obj->base.write_domain);
-
-	return 0;
-}
-
 /* Throttle our rendering by waiting until the ring has completed our requests
  * emitted over 20 msec ago.
  *
@@ -3521,7 +3405,6 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj)
 	drm_gem_object_release(&obj->base);
 	i915_gem_info_remove_obj(dev_priv, obj->base.size);
 
-	kfree(obj->page_cpu_valid);
 	kfree(obj->bit_17);
 	kfree(obj);
 }
-- 
1.7.6.4



* [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached objects
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

~120 µs instead of ~210 µs to write 1MB on my snb. I like this.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0048917..8fd175c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 		ret = i915_gem_phys_pwrite(dev, obj, args, file);
 		goto out;
 	} else if (obj->gtt_space &&
+		   obj->cache_level == I915_CACHE_NONE &&
 		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
 		ret = i915_gem_object_pin(obj, 0, true);
 		if (ret)
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 10/13] drm/i915: don't call shmem_read_mapping unnecessarily
  2011-11-06 19:13 ` Daniel Vetter
                   ` (9 preceding siblings ...)
  (?)
@ 2011-11-06 19:13 ` Daniel Vetter
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

This speeds up pwrite and pread from ~120 µs to ~100 µs for
reading/writing 1MB on my snb (if the backing storage pages
are already pinned, of course).

v2: Chris Wilson pointed out a glaring page reference bug - I'd been
unconditionally dropping the reference. With that fixed (and the
associated reduction of dirt in dmesg) it's now even a notch faster.
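
The gist of the fixed reference handling, condensed from the diff below:

	if (obj->pages) {
		/* backing storage already pinned - we don't own a reference */
		page = obj->pages[offset >> PAGE_SHIFT];
		release_page = 0;
	} else {
		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
		release_page = 1;
	}

	/* before dropping struct_mutex for the slowpath, take our own ref */
	if (!release_page)
		page_cache_get(page);
	release_page = 1;
	mutex_unlock(&dev->struct_mutex);

	/* ... and only release references we actually own */
	if (release_page)
		page_cache_release(page);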

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   42 +++++++++++++++++++++++++++++---------
 1 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8fd175c..5b8a7c5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -339,6 +339,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
 	int needs_clflush = 0;
+	int release_page;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
@@ -373,10 +374,16 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - shmem_page_offset;
 
-		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
-		if (IS_ERR(page)) {
-			ret = PTR_ERR(page);
-			goto out;
+		if (obj->pages) {
+			page = obj->pages[offset >> PAGE_SHIFT];
+			release_page = 0;
+		} else {
+			page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
+			if (IS_ERR(page)) {
+				ret = PTR_ERR(page);
+				goto out;
+			}
+			release_page = 1;
 		}
 
 		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
@@ -395,6 +402,9 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		}
 
 		hit_slowpath = 1;
+		if (!release_page)
+			page_cache_get(page);
+		release_page = 1;
 
 		mutex_unlock(&dev->struct_mutex);
 
@@ -416,7 +426,8 @@ i915_gem_shmem_pread(struct drm_device *dev,
 		mutex_lock(&dev->struct_mutex);
 next_page:
 		mark_page_accessed(page);
-		page_cache_release(page);
+		if (release_page)
+			page_cache_release(page);
 
 		if (ret) {
 			ret = -EFAULT;
@@ -697,6 +708,7 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
+	int release_page;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
@@ -721,10 +733,16 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		if ((shmem_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - shmem_page_offset;
 
-		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
-		if (IS_ERR(page)) {
-			ret = PTR_ERR(page);
-			goto out;
+		if (obj->pages) {
+			page = obj->pages[offset >> PAGE_SHIFT];
+			release_page = 0;
+		} else {
+			page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
+			if (IS_ERR(page)) {
+				ret = PTR_ERR(page);
+				goto out;
+			}
+			release_page = 1;
 		}
 
 		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
@@ -741,6 +759,9 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 		}
 
 		hit_slowpath = 1;
+		if (!release_page)
+			page_cache_get(page);
+		release_page = 1;
 
 		mutex_unlock(&dev->struct_mutex);
 
@@ -759,7 +780,8 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 next_page:
 		set_page_dirty(page);
 		mark_page_accessed(page);
-		page_cache_release(page);
+		if (release_page)
+			page_cache_release(page);
 
 		if (ret) {
 			ret = -EFAULT;
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 11/13] mm: extend prefault helpers to fault in more than PAGE_SIZE
  2011-11-06 19:13 ` Daniel Vetter
@ 2011-11-06 19:13   ` Daniel Vetter
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter, linux-mm

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

I've checked the callsites and they all already clamp the size passed
to fault_in_pages_* to the same value as for the subsequent
__copy_to|from_user, and hence don't rely on the implicit clamping
to PAGE_SIZE.

Also kill a copy&pasted spurious space in both functions while at it.

Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/linux/pagemap.h |   28 ++++++++++++++++++----------
 1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cfaaa69..689527d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
 static inline int fault_in_pages_writeable(char __user *uaddr, int size)
 {
 	int ret;
+	char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
 		return 0;
@@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
 	 * Writing zeroes into userspace here is OK, because we know that if
 	 * the zero gets there, we'll be overwriting it.
 	 */
-	ret = __put_user(0, uaddr);
+	while (uaddr <= end) {
+		ret = __put_user(0, uaddr);
+		if (ret != 0)
+			return ret;
+		uaddr += PAGE_SIZE;
+	}
 	if (ret == 0) {
-		char __user *end = uaddr + size - 1;
-
 		/*
 		 * If the page was already mapped, this will get a cache miss
 		 * for sure, so try to avoid doing it.
 		 */
-		if (((unsigned long)uaddr & PAGE_MASK) !=
+		if (((unsigned long)uaddr & PAGE_MASK) ==
 				((unsigned long)end & PAGE_MASK))
-		 	ret = __put_user(0, end);
+			ret = __put_user(0, end);
 	}
 	return ret;
 }
@@ -435,17 +439,21 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
 {
 	volatile char c;
 	int ret;
+	const char __user *end = uaddr + size - 1;
 
 	if (unlikely(size == 0))
 		return 0;
 
-	ret = __get_user(c, uaddr);
+	while (uaddr <= end) {
+		ret = __get_user(c, uaddr);
+		if (ret != 0)
+			return ret;
+		uaddr += PAGE_SIZE;
+	}
 	if (ret == 0) {
-		const char __user *end = uaddr + size - 1;
-
-		if (((unsigned long)uaddr & PAGE_MASK) !=
+		if (((unsigned long)uaddr & PAGE_MASK) ==
 				((unsigned long)end & PAGE_MASK)) {
-		 	ret = __get_user(c, end);
+			ret = __get_user(c, end);
 			(void)c;
 		}
 	}
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/13] drm/i915: drop gtt slowpath
  2011-11-06 19:13 ` Daniel Vetter
                   ` (11 preceding siblings ...)
  (?)
@ 2011-11-06 19:13 ` Daniel Vetter
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2011-11-06 19:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

With the proper prefault, it's extremely unlikely that we fall back
to the gtt slowpath.

So just kill it and use the shmem_pwrite path as fallback.

To further clean up the code, move the preparatory gem calls into the
respective pwrite functions. This way the gtt_fast->shmem fallback
is much more obvious.
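
The resulting fallback logic in i915_gem_pwrite_ioctl boils down to
roughly this (condensed; the phys_obj special case is unchanged):

	ret = -EFAULT;
	if (obj->phys_obj) {
		ret = i915_gem_phys_pwrite(dev, obj, args, file);
		goto out;
	} else if (obj->gtt_space &&
		   obj->cache_level == I915_CACHE_NONE &&
		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
		/* may fail with -EFAULT on non-page-backed user pointers */
		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
	}

	if (ret == -EFAULT)
		ret = i915_gem_shmem_pwrite(dev, obj, args, file);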

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |  183 +++++++--------------------------------
 1 files changed, 30 insertions(+), 153 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5b8a7c5..0a374e1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -523,30 +523,6 @@ fast_user_write(struct io_mapping *mapping,
 	return unwritten;
 }
 
-/* Here's the write path which can sleep for
- * page faults
- */
-
-static inline void
-slow_kernel_write(struct io_mapping *mapping,
-		  loff_t gtt_base, int gtt_offset,
-		  struct page *user_page, int user_offset,
-		  int length)
-{
-	char __iomem *dst_vaddr;
-	char *src_vaddr;
-
-	dst_vaddr = io_mapping_map_wc(mapping, gtt_base);
-	src_vaddr = kmap(user_page);
-
-	memcpy_toio(dst_vaddr + gtt_offset,
-		    src_vaddr + user_offset,
-		    length);
-
-	kunmap(user_page);
-	io_mapping_unmap(dst_vaddr);
-}
-
 /**
  * This is the fast pwrite path, where we copy the data directly from the
  * user into the GTT, uncached.
@@ -561,7 +537,19 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	ssize_t remain;
 	loff_t offset, page_base;
 	char __user *user_data;
-	int page_offset, page_length;
+	int page_offset, page_length, ret;
+
+	ret = i915_gem_object_pin(obj, 0, true);
+	if (ret)
+		goto out;
+
+	ret = i915_gem_object_set_to_gtt_domain(obj, true);
+	if (ret)
+		goto out_unpin;
+
+	ret = i915_gem_object_put_fence(obj);
+	if (ret)
+		goto out_unpin;
 
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
@@ -586,112 +574,19 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 		 * retry in the slow path.
 		 */
 		if (fast_user_write(dev_priv->mm.gtt_mapping, page_base,
-				    page_offset, user_data, page_length))
-			return -EFAULT;
+				    page_offset, user_data, page_length)) {
+			ret = -EFAULT;
+			goto out_unpin;
+		}
 
 		remain -= page_length;
 		user_data += page_length;
 		offset += page_length;
 	}
 
-	return 0;
-}
-
-/**
- * This is the fallback GTT pwrite path, which uses get_user_pages to pin
- * the memory and maps it using kmap_atomic for copying.
- *
- * This code resulted in x11perf -rgb10text consuming about 10% more CPU
- * than using i915_gem_gtt_pwrite_fast on a G45 (32-bit).
- */
-static int
-i915_gem_gtt_pwrite_slow(struct drm_device *dev,
-			 struct drm_i915_gem_object *obj,
-			 struct drm_i915_gem_pwrite *args,
-			 struct drm_file *file)
-{
-	drm_i915_private_t *dev_priv = dev->dev_private;
-	ssize_t remain;
-	loff_t gtt_page_base, offset;
-	loff_t first_data_page, last_data_page, num_pages;
-	loff_t pinned_pages, i;
-	struct page **user_pages;
-	struct mm_struct *mm = current->mm;
-	int gtt_page_offset, data_page_offset, data_page_index, page_length;
-	int ret;
-	uint64_t data_ptr = args->data_ptr;
-
-	remain = args->size;
-
-	/* Pin the user pages containing the data.  We can't fault while
-	 * holding the struct mutex, and all of the pwrite implementations
-	 * want to hold it while dereferencing the user data.
-	 */
-	first_data_page = data_ptr / PAGE_SIZE;
-	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
-	num_pages = last_data_page - first_data_page + 1;
-
-	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
-	if (user_pages == NULL)
-		return -ENOMEM;
-
-	mutex_unlock(&dev->struct_mutex);
-	down_read(&mm->mmap_sem);
-	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
-				      num_pages, 0, 0, user_pages, NULL);
-	up_read(&mm->mmap_sem);
-	mutex_lock(&dev->struct_mutex);
-	if (pinned_pages < num_pages) {
-		ret = -EFAULT;
-		goto out_unpin_pages;
-	}
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret)
-		goto out_unpin_pages;
-
-	ret = i915_gem_object_put_fence(obj);
-	if (ret)
-		goto out_unpin_pages;
-
-	offset = obj->gtt_offset + args->offset;
-
-	while (remain > 0) {
-		/* Operation in this page
-		 *
-		 * gtt_page_base = page offset within aperture
-		 * gtt_page_offset = offset within page in aperture
-		 * data_page_index = page number in get_user_pages return
-		 * data_page_offset = offset with data_page_index page.
-		 * page_length = bytes to copy for this page
-		 */
-		gtt_page_base = offset & PAGE_MASK;
-		gtt_page_offset = offset_in_page(offset);
-		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
-		data_page_offset = offset_in_page(data_ptr);
-
-		page_length = remain;
-		if ((gtt_page_offset + page_length) > PAGE_SIZE)
-			page_length = PAGE_SIZE - gtt_page_offset;
-		if ((data_page_offset + page_length) > PAGE_SIZE)
-			page_length = PAGE_SIZE - data_page_offset;
-
-		slow_kernel_write(dev_priv->mm.gtt_mapping,
-				  gtt_page_base, gtt_page_offset,
-				  user_pages[data_page_index],
-				  data_page_offset,
-				  page_length);
-
-		remain -= page_length;
-		offset += page_length;
-		data_ptr += page_length;
-	}
-
-out_unpin_pages:
-	for (i = 0; i < pinned_pages; i++)
-		page_cache_release(user_pages[i]);
-	drm_free_large(user_pages);
-
+out_unpin:
+	i915_gem_object_unpin(obj);
+out:
 	return ret;
 }
 
@@ -710,6 +605,10 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
 	int hit_slowpath = 0;
 	int release_page;
 
+	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
+	if (ret)
+		return ret;
+
 	user_data = (char __user *) (uintptr_t) args->data_ptr;
 	remain = args->size;
 
@@ -854,6 +753,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 
 	trace_i915_gem_object_pwrite(obj, args->offset, args->size);
 
+	ret = -EFAULT;
 	/* We can only do the GTT pwrite on untiled buffers, as otherwise
 	 * it would end up going through the fenced access, and we'll get
 	 * different detiling behavior between reading and writing.
@@ -866,37 +766,14 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	} else if (obj->gtt_space &&
 		   obj->cache_level == I915_CACHE_NONE &&
 		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
-		ret = i915_gem_object_pin(obj, 0, true);
-		if (ret)
-			goto out;
-
-		ret = i915_gem_object_set_to_gtt_domain(obj, true);
-		if (ret)
-			goto out_unpin;
-
-		ret = i915_gem_object_put_fence(obj);
-		if (ret)
-			goto out_unpin;
-
 		ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file);
-		if (ret == -EFAULT)
-			ret = i915_gem_gtt_pwrite_slow(dev, obj, args, file);
-
-out_unpin:
-		i915_gem_object_unpin(obj);
-
-		if (ret != -EFAULT)
-			goto out;
-		/* Fall through to the shmfs paths because the gtt paths might
-		 * fail with non-page-backed user pointers (e.g. gtt mappings
-		 * when moving data between textures). */
+		/* Note that the gtt paths might fail with non-page-backed user
+		 * pointers (e.g. gtt mappings when moving data between
+		 * textures). Fallback to the shmem path in that case. */
 	}
 
-	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
-	if (ret)
-		goto out;
-
-	ret = i915_gem_shmem_pwrite(dev, obj, args, file);
+	if (ret == -EFAULT)
+		ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 
 out:
 	drm_gem_object_unreference(&obj->base);
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/13] drm/i915: don't clobber userspace memory before committing to the pread
  2011-11-06 19:13 ` Daniel Vetter
                   ` (12 preceding siblings ...)
  (?)
@ 2011-11-06 19:14 ` Daniel Vetter
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2011-11-06 19:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

The pagemap.h prefault helpers do the prefaulting by simply writing
some data into every page. Hence we should not prefault when we're not
yet committed to actually writing data to userspace. The problem is
now that
- we can't prefault while holding dev->struct_mutex for we could
  deadlock with our own pagefault handler
- we need to grab dev->struct_mutex before copying to sync up with any
  outstanding gpu writes.

Therefore only prefault when we're dropping the lock the first time in
the pread slowpath - at that point we're committed to the write, don't
wait on the gpu anymore and hence won't return early (with e.g.
-EINTR).
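
The resulting ordering in the pread slowpath (condensed from the diff
below):

	mutex_unlock(&dev->struct_mutex);

	if (!prefaulted) {
		/* committed now - ignore any error and keep copying */
		ret = fault_in_pages_writeable(user_data, remain);
		(void)ret;
		prefaulted = 1;
	}

	/* kmap + __copy_to_user of this page */

	mutex_lock(&dev->struct_mutex);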

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0a374e1..ebaa0f0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -338,6 +338,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
 	int shmem_page_offset, page_length, ret;
 	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
 	int hit_slowpath = 0;
+	int prefaulted = 0;
 	int needs_clflush = 0;
 	int release_page;
 
@@ -408,6 +409,16 @@ i915_gem_shmem_pread(struct drm_device *dev,
 
 		mutex_unlock(&dev->struct_mutex);
 
+		if (!prefaulted) {
+			ret = fault_in_pages_writeable(user_data, remain);
+			/* Userspace is tricking us, but we've already clobbered
+			 * its pages with the prefault and promised to write the
+			 * data up to the first fault. Hence ignore any errors
+			 * and just continue. */
+			(void)ret;
+			prefaulted = 1;
+		}
+
 		vaddr = kmap(page);
 		if (needs_clflush)
 			drm_clflush_virt_range(vaddr + shmem_page_offset,
@@ -470,11 +481,6 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 		       args->size))
 		return -EFAULT;
 
-	ret = fault_in_pages_writeable((char __user *)(uintptr_t)args->data_ptr,
-				       args->size);
-	if (ret)
-		return -EFAULT;
-
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		return ret;
-- 
1.7.6.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached objects
  2011-11-06 19:13 ` [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached objects Daniel Vetter
@ 2011-11-06 21:16     ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2011-11-06 21:16 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter

On Sun,  6 Nov 2011 20:13:56 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> ~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 0048917..8fd175c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>  		ret = i915_gem_phys_pwrite(dev, obj, args, file);
>  		goto out;
>  	} else if (obj->gtt_space &&
> +		   obj->cache_level == I915_CACHE_NONE &&
>  		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>  		ret = i915_gem_object_pin(obj, 0, true);
>  		if (ret)

I still think you want to include an obj->map_and_fenceable test here.
When doing 2D benchmarks the stall incurred here to evict an old object
and map the to-be-written object into the mappable GTT causes measurable
pain (obviously on non-LLC architectures).
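
I.e. something like this (just a sketch):

} else if (obj->gtt_space &&
	   obj->map_and_fenceable &&
	   obj->cache_level == I915_CACHE_NONE &&
	   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {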

The series looks good and I'll look at the impact upon 2D for pnv and
snb over the next couple of days. With and without the extra check ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached objects
  2011-11-06 21:16     ` Chris Wilson
  (?)
@ 2011-11-06 22:19     ` Daniel Vetter
  -1 siblings, 0 replies; 32+ messages in thread
From: Daniel Vetter @ 2011-11-06 22:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, linux-kernel, dri-devel

On Sun, Nov 06, 2011 at 09:16:00PM +0000, Chris Wilson wrote:
> On Sun,  6 Nov 2011 20:13:56 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > ~120 µs instead fo ~210 µs to write 1mb on my snb. I like this.
> > 
> > Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 0048917..8fd175c 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -842,6 +842,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
> >  		ret = i915_gem_phys_pwrite(dev, obj, args, file);
> >  		goto out;
> >  	} else if (obj->gtt_space &&
> > +		   obj->cache_level == I915_CACHE_NONE &&
> >  		   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> >  		ret = i915_gem_object_pin(obj, 0, true);
> >  		if (ret)
> 
> I still think you want to include a obj->map_and_fenceable test here.
> When doing 2D benchmarks the stall incurred here to evict an old object
> map the to-be-written object into the mappable GTT causes measureable
> pain (obviously on non-LLC architectures).

That's one of "further tricks". I think we need to also implement the same
in-place clflush trick like for pread, too, to avoid penalizing partial
pwrites too much.

The other trick is to do reloc fixups through llc/clflushed cpu writes.
This way we'd completely eliminate mappable pressure for all untiled
objects. The only thing left would be scanout, tiled gtt uploads and tiled
blts (only on pre-gen4).
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/13] mm: extend prefault helpers to fault in more than PAGE_SIZE
  2011-11-06 19:13   ` Daniel Vetter
  (?)
@ 2011-11-06 22:24     ` Chris Wilson
  -1 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2011-11-06 22:24 UTC (permalink / raw)
  To: Daniel Vetter, intel-gfx; +Cc: linux-kernel, dri-devel, Daniel Vetter, linux-mm

On Sun,  6 Nov 2011 20:13:58 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> drm/i915 wants to read/write more than one page in its fastpath
> and hence needs to prefault more than PAGE_SIZE bytes.
> 
> I've checked the callsites and they all already clamp size when
> calling fault_in_pages_* to the same as for the subsequent
> __copy_to|from_user and hence don't rely on the implicit clamping
> to PAGE_SIZE.
> 
> Also kill a copy&pasted spurious space in both functions while at it.
> 
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  include/linux/pagemap.h |   28 ++++++++++++++++++----------
>  1 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index cfaaa69..689527d 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
>  static inline int fault_in_pages_writeable(char __user *uaddr, int size)
>  {
>  	int ret;
> +	char __user *end = uaddr + size - 1;
>  
>  	if (unlikely(size == 0))
>  		return 0;
> @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
>  	 * Writing zeroes into userspace here is OK, because we know that if
>  	 * the zero gets there, we'll be overwriting it.
>  	 */
> -	ret = __put_user(0, uaddr);
> +	while (uaddr <= end) {
> +		ret = __put_user(0, uaddr);
> +		if (ret != 0)
> +			return ret;
> +		uaddr += PAGE_SIZE;
> +	}
>  	if (ret == 0) {
> -		char __user *end = uaddr + size - 1;
> -
>  		/*
>  		 * If the page was already mapped, this will get a cache miss
>  		 * for sure, so try to avoid doing it.
>  		 */
> -		if (((unsigned long)uaddr & PAGE_MASK) !=
> +		if (((unsigned long)uaddr & PAGE_MASK) ==
>  				((unsigned long)end & PAGE_MASK))
> -		 	ret = __put_user(0, end);
> +			ret = __put_user(0, end);
>  	}
>  	return ret;

You leave these functions in a worse mess: the now-redundant test
against zero introduces a false compiler warning about an uninitialized
ret. A do{}while loop would make it clearer that the original behaviour
is merely extended upon. And please replace the open-coded
offset_in_page().
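
E.g. something along these lines for the writeable variant (untested
sketch, just to illustrate the do{}while shape):

static inline int fault_in_pages_writeable(char __user *uaddr, int size)
{
	int ret;
	char __user *end = uaddr + size - 1;

	if (unlikely(size == 0))
		return 0;

	/* the do{}while always runs once, so ret is always initialized */
	do {
		ret = __put_user(0, uaddr);
		if (ret)
			return ret;
		uaddr += PAGE_SIZE;
	} while (uaddr <= end);

	/* touch the last byte if its page wasn't covered by the loop */
	if (((unsigned long)uaddr & PAGE_MASK) ==
	    ((unsigned long)end & PAGE_MASK))
		ret = __put_user(0, end);

	return ret;
}
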
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 01/13] drm/i915: fall through pwrite_gtt_slow to the shmem slow path
  2011-11-06 19:13   ` Daniel Vetter
  (?)
@ 2011-11-21  3:09   ` Ben Widawsky
  2011-11-21 10:20     ` Chris Wilson
  -1 siblings, 1 reply; 32+ messages in thread
From: Ben Widawsky @ 2011-11-21  3:09 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, linux-kernel, dri-devel

On Sun,  6 Nov 2011 20:13:48 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> The gtt_pwrite slowpath grabs the userspace memory with
> get_user_pages. This will not work for non-page backed memory, like a
> gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit
> -EFAULT in the gtt paths.
> 
> Now the shmem paths have exactly the same problem, but this way we
> only need to rearrange the code in one write path.
> 
> v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed.
> 
> Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

It would be nice if there were some way to notify users that pwriting a
gtt mmapped address can be really damn slow. That's also the one
behavior change this patch introduces. It's possible that some SW was
expecting to get a "fast path" and would deal with the -EFAULT if it
didn't get it.

Presumably subsequent patches in this series fix this up further, so
instead of figuring out all the failure conditions pwrite can cause,
let's just go with
Acked-by: Ben Widawsky <ben@bwidawsk.net>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  2011-11-06 19:13   ` Daniel Vetter
@ 2011-11-21  5:56     ` Ben Widawsky
  -1 siblings, 0 replies; 32+ messages in thread
From: Ben Widawsky @ 2011-11-21  5:56 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, linux-kernel, dri-devel

On Sun,  6 Nov 2011 20:13:49 +0100
Daniel Vetter <daniel.vetter@ffwll.ch> wrote:

> ... instead of get_user_pages, because that fails on non page-backed
> user addresses like e.g. a gtt mapping of a bo.
> 
> To get there essentially copy the vfs read path into pagecache. We
> can't call that right away because we have to take care of bit17
> swizzling. To not deadlock with our own pagefault handler we need
> to completely drop struct_mutex, reducing the atomicty-guarantees
> of our userspace abi. Implications for racing with other gem ioctl:
> 
> - execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
>   already the risk of the pwrite call not being atomic, no degration.
> - read/write access to mmaps: already fully racy, no degration.
> - set_tiling: Calling set_tiling while reading/writing is already
>   pretty much undefined, now it just got a bit worse. set_tiling is
>   only called by libdrm on unused/new bos, so no problem.
> - set_domain: When changing to the gtt domain while copying (without any
>   read/write access, e.g. for synchronization), we might leave unflushed
>   data in the cpu caches. The clflush_object at the end of pwrite_slow
>   takes care of this problem.
> - truncating of purgeable objects: the shmem_read_mapping_page call could
>   reinstate backing storage for truncated objects. The check at the end
>   of pwrite_slow takes care of this.
> 
> v2:
> - add missing intel_gtt_chipset_flush
> - add __ to copy_from_user_swizzled as suggest by Chris Wilson.
> 
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/i915/i915_gem.c |  126 ++++++++++++++++++++-------------------
>  1 files changed, 64 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 6fa24bc..f9efebb 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -58,6 +58,7 @@ static void i915_gem_free_object_tail(struct drm_i915_gem_object *obj);
>  
>  static int i915_gem_inactive_shrink(struct shrinker *shrinker,
>  				    struct shrink_control *sc);
> +static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);
>  
>  /* some bookkeeping */
>  static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
> @@ -385,6 +386,32 @@ i915_gem_shmem_pread_fast(struct drm_device *dev,
>  	return 0;
>  }
>  
> +static inline int
> +__copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
> +			  const char *cpu_vaddr,
> +			  int length)
> +{
> +	int ret, cpu_offset = 0;
> +
> +	while (length > 0) {
> +		int cacheline_end = ALIGN(gpu_offset + 1, 64);
> +		int this_length = min(cacheline_end - gpu_offset, length);
> +		int swizzled_gpu_offset = gpu_offset ^ 64;
> +
> +		ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
> +				       cpu_vaddr + cpu_offset,
> +				       this_length);
> +		if (ret)
> +			return ret + length;
> +
> +		cpu_offset += this_length;
> +		gpu_offset += this_length;
> +		length -= this_length;
> +	}
> +
> +	return 0;
> +}
> +

Bikeshed, but I would much prefer a #define for the swizzle
bit/cacheline size.
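
Something like this (name made up, just to illustrate):

/* name made up for illustration */
#define I915_SWIZZLE_CACHELINE_SIZE 64

	int cacheline_end = ALIGN(gpu_offset + 1, I915_SWIZZLE_CACHELINE_SIZE);
	int this_length = min(cacheline_end - gpu_offset, length);
	int swizzled_gpu_offset = gpu_offset ^ I915_SWIZZLE_CACHELINE_SIZE;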

>  /**
>   * This is the fallback shmem pread path, which allocates temporary storage
>   * in kernel space to copy_to_user into outside of the struct_mutex, so we
> @@ -841,71 +868,36 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
>  			   struct drm_file *file)
>  {
>  	struct address_space *mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
> -	struct mm_struct *mm = current->mm;
> -	struct page **user_pages;
>  	ssize_t remain;
> -	loff_t offset, pinned_pages, i;
> -	loff_t first_data_page, last_data_page, num_pages;
> -	int shmem_page_offset;
> -	int data_page_index,  data_page_offset;
> -	int page_length;
> -	int ret;
> -	uint64_t data_ptr = args->data_ptr;
> -	int do_bit17_swizzling;
> +	loff_t offset;
> +	char __user *user_data;
> +	int shmem_page_offset, page_length, ret;
> +	int obj_do_bit17_swizzling, page_do_bit17_swizzling;
>  
> +	user_data = (char __user *) (uintptr_t) args->data_ptr;
>  	remain = args->size;
>  
> -	/* Pin the user pages containing the data.  We can't fault while
> -	 * holding the struct mutex, and all of the pwrite implementations
> -	 * want to hold it while dereferencing the user data.
> -	 */
> -	first_data_page = data_ptr / PAGE_SIZE;
> -	last_data_page = (data_ptr + args->size - 1) / PAGE_SIZE;
> -	num_pages = last_data_page - first_data_page + 1;
> -
> -	user_pages = drm_malloc_ab(num_pages, sizeof(struct page *));
> -	if (user_pages == NULL)
> -		return -ENOMEM;
> -
> -	mutex_unlock(&dev->struct_mutex);
> -	down_read(&mm->mmap_sem);
> -	pinned_pages = get_user_pages(current, mm, (uintptr_t)args->data_ptr,
> -				      num_pages, 0, 0, user_pages, NULL);
> -	up_read(&mm->mmap_sem);
> -	mutex_lock(&dev->struct_mutex);
> -	if (pinned_pages < num_pages) {
> -		ret = -EFAULT;
> -		goto out;
> -	}
> -
> -	ret = i915_gem_object_set_to_cpu_domain(obj, 1);
> -	if (ret)
> -		goto out;
> -
> -	do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
> +	obj_do_bit17_swizzling = i915_gem_object_needs_bit17_swizzle(obj);
>  
>  	offset = args->offset;
>  	obj->dirty = 1;
>  
> +	mutex_unlock(&dev->struct_mutex);
> +
>  	while (remain > 0) {
>  		struct page *page;
> +		char *vaddr;
>  
>  		/* Operation in this page
>  		 *
>  		 * shmem_page_offset = offset within page in shmem file
> -		 * data_page_index = page number in get_user_pages return
> -		 * data_page_offset = offset with data_page_index page.
>  		 * page_length = bytes to copy for this page
>  		 */
>  		shmem_page_offset = offset_in_page(offset);
> -		data_page_index = data_ptr / PAGE_SIZE - first_data_page;
> -		data_page_offset = offset_in_page(data_ptr);
>  
>  		page_length = remain;
>  		if ((shmem_page_offset + page_length) > PAGE_SIZE)
>  			page_length = PAGE_SIZE - shmem_page_offset;
> -		if ((data_page_offset + page_length) > PAGE_SIZE)
> -			page_length = PAGE_SIZE - data_page_offset;
>  
>  		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
>  		if (IS_ERR(page)) {
> @@ -913,34 +905,44 @@ i915_gem_shmem_pwrite_slow(struct drm_device *dev,
>  			goto out;
>  		}
>  
> -		if (do_bit17_swizzling) {
> -			slow_shmem_bit17_copy(page,
> -					      shmem_page_offset,
> -					      user_pages[data_page_index],
> -					      data_page_offset,
> -					      page_length,
> -					      0);
> -		} else {
> -			slow_shmem_copy(page,
> -					shmem_page_offset,
> -					user_pages[data_page_index],
> -					data_page_offset,
> -					page_length);
> -		}
> +		page_do_bit17_swizzling = (page_to_phys(page) & (1 << 17)) == 0;
> +
> +		vaddr = kmap(page);
> +		if (obj_do_bit17_swizzling && page_do_bit17_swizzling)
> +			ret = __copy_from_user_swizzled(vaddr, shmem_page_offset,
> +							user_data,
> +							page_length);
> +		else
> +			ret = __copy_from_user(vaddr + shmem_page_offset,
> +					       user_data,
> +					       page_length);
> +		kunmap(page);
>  
>  		set_page_dirty(page);
>  		mark_page_accessed(page);
>  		page_cache_release(page);
>  
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto out;
> +		}
> +
>  		remain -= page_length;
> -		data_ptr += page_length;
> +		user_data += page_length;
>  		offset += page_length;
>  	}
>  
>  out:
> -	for (i = 0; i < pinned_pages; i++)
> -		page_cache_release(user_pages[i]);
> -	drm_free_large(user_pages);
> +	mutex_lock(&dev->struct_mutex);
> +	/* Fixup: Kill any reinstated backing storage pages */
> +	if (obj->madv == __I915_MADV_PURGED)
> +		i915_gem_object_truncate(obj);
> +	/* and flush dirty cachelines in case the object isn't in the cpu write
> +	 * domain anymore. */
> +	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> +		i915_gem_clflush_object(obj);
> +		intel_gtt_chipset_flush();
> +	}
>  
>  	return ret;
>  }

I must be missing something obvious here...
Can you explain how this can possibly be considered safe without holding
struct_mutex?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 01/13] drm/i915: fall through pwrite_gtt_slow to the shmem slow path
  2011-11-21  3:09   ` [Intel-gfx] " Ben Widawsky
@ 2011-11-21 10:20     ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2011-11-21 10:20 UTC (permalink / raw)
  To: Ben Widawsky, Daniel Vetter; +Cc: intel-gfx, linux-kernel, dri-devel

On Sun, 20 Nov 2011 19:09:18 -0800, Ben Widawsky <ben@bwidawsk.net> wrote:
> On Sun,  6 Nov 2011 20:13:48 +0100
> Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> 
> > The gtt_pwrite slowpath grabs the userspace memory with
> > get_user_pages. This will not work for non-page backed memory, like a
> > gtt mmapped gem object. Hence fall throuh to the shmem paths if we hit
> > -EFAULT in the gtt paths.
> > 
> > Now the shmem paths have exactly the same problem, but this way we
> > only need to rearrange the code in one write path.
> > 
> > v2: v1 accidentaly falls back to shmem pwrite for phys objects. Fixed.
> > 
> > Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> It would be nice if there was some way to notify users that pwriting a
> gtt mmapped address can be really damn slow. That's also the one
> behavior change this patch introduces. It's possible that some SW was
> expecting to get a, "fast path" and would deal with the -EFAULT if it
> didn't get it.

The behaviour change is intentional. Before this patch we would
deadlock...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  2011-11-21  5:56     ` Ben Widawsky
  (?)
@ 2011-11-21 16:02     ` Daniel Vetter
  2011-11-21 17:55       ` Ben Widawsky
  -1 siblings, 1 reply; 32+ messages in thread
From: Daniel Vetter @ 2011-11-21 16:02 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx, linux-kernel, dri-devel

On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote:
[snip the patch]
> Bikeshed, but I would much prefer a #define for the swizzle
> bit/cacheline size.

I've looked at this stuff way too long, so I'm biased, but 64 = cacheline
= dram fetch size = 1 << 6 feels about as natural to me as 4096 =
PAGE_SIZE ...

[snip the patch]

> I must be missing something obvious here...
> Can you explain how this can possibly be considered safe without holding
> struct_mutex?

That's the reason why the commit msg goes through every case and explains
why I think it's safe. The big thing here is that we need to drop the
mutex when calling copy_*_user (at least in the non-atomic slow-paths)
because otherwise we might deadlock with our own pagefault handler.
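
Roughly, the deadlock we'd get without dropping the lock, with a gtt
mmap as the user pointer:

	i915_gem_pwrite_ioctl()			/* takes dev->struct_mutex */
	  copy_from_user()			/* faults on a gtt mmap of a bo */
	    i915_gem_fault()			/* our own pagefault handler ... */
	      mutex_lock(&dev->struct_mutex)	/* ... waits on the lock we hold */
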
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  2011-11-21 16:02     ` [Intel-gfx] " Daniel Vetter
@ 2011-11-21 17:55       ` Ben Widawsky
  2011-11-21 18:43         ` Ben Widawsky
  0 siblings, 1 reply; 32+ messages in thread
From: Ben Widawsky @ 2011-11-21 17:55 UTC (permalink / raw)
  To: intel-gfx, linux-kernel, dri-devel

On Mon, Nov 21, 2011 at 05:02:44PM +0100, Daniel Vetter wrote:
> On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote:
> [snip the patch]
> > Bikeshed, but I would much prefer a #define for the swizzle
> > bit/cacheline size.
> 
> I've looked at this stuff for way too long, so I'm biased, but 64 = cacheline
> = dram fetch size = 1 << 6 feels about as natural to me as 4096 =
> PAGE_SIZE ...
> 
> [snip the patch]
> 
> > I must be missing something obvious here...
> > Can you explain how this can possibly be considered safe without holding
> > struct_mutex?
> 
> That's why the commit msg goes through every case and explains why I
> think it's safe. The key point here is that we need to drop the mutex
> when calling copy_*_user (at least in the non-atomic slow paths), because
> otherwise we might deadlock with our own pagefault handler.
> -Daniel

The part about dropping struct_mutex is clear to me.

The bit that I'm missing: I just don't see how you guarantee the page
you're reading from (assuming it's a GTT mmapped page) doesn't get moved
out from under you. For instance, if the page isn't there when you do the
initial __copy_from_user, it will get faulted in... cool - but what if
somewhere in that loop the object gets swapped out and something else is
put in its place? How is that prevented?

Sorry if it's a stupid question, I just don't get it.
Ben

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
  2011-11-21 17:55       ` Ben Widawsky
@ 2011-11-21 18:43         ` Ben Widawsky
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Widawsky @ 2011-11-21 18:43 UTC (permalink / raw)
  To: intel-gfx, linux-kernel, dri-devel

On Mon, Nov 21, 2011 at 09:55:07AM -0800, Ben Widawsky wrote:
> On Mon, Nov 21, 2011 at 05:02:44PM +0100, Daniel Vetter wrote:
> > On Sun, Nov 20, 2011 at 09:56:32PM -0800, Ben Widawsky wrote:
> > [snip the patch]
> > > Bikeshed, but I would much prefer a #define for the swizzle
> > > bit/cacheline size.
> > 
> > I've looked at this stuff for way too long, so I'm biased, but 64 = cacheline
> > = dram fetch size = 1 << 6 feels about as natural to me as 4096 =
> > PAGE_SIZE ...
> > 
> > [snip the patch]
> > 
> > > I must be missing something obvious here...
> > > Can you explain how this can possibly be considered safe without holding
> > > struct_mutex?
> > 
> > That's why the commit msg goes through every case and explains why I
> > think it's safe. The key point here is that we need to drop the mutex
> > when calling copy_*_user (at least in the non-atomic slow paths), because
> > otherwise we might deadlock with our own pagefault handler.
> > -Daniel
> 
> The part about dropping struct_mutex is clear to me.
> 
> The bit that I'm missing: I just don't see how you guarantee the page
> you're reading from (assuming it's a GTT mmapped page) doesn't get moved
> out from under you. For instance, if the page isn't there when you do the
> initial __copy_from_user, it will get faulted in... cool - but what if
> somewhere in that loop the object gets swapped out and something else is
> put in its place? How is that prevented?
> 
> Sorry if it's a stupid question, I just don't get it.
> Ben

Okay, I got what I was missing from IRC. Anytime the object is unmapped
we shoot down the userspace mapping. I couldn't find it in the code, but
it turned out to be right in front of me.
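
The shoot-down in question works roughly like this (a sketch with
hypothetical field names, not the exact i915 structures): whenever an
object is unbound from the GTT, every user PTE of its GTT mmap is
zapped, so the next userspace access faults back into the driver
instead of touching stale pages.

static void gem_release_mmap(struct drm_i915_gem_object *obj)
{
	if (!obj->fault_mappable)	/* hypothetical "has live user PTEs" flag */
		return;

	/* Tear down all user mappings of the object's fake mmap offset;
	 * the next access refaults and the fault handler rebinds first. */
	unmap_mapping_range(obj->base.dev->dev_mapping,
			    (loff_t)obj->mmap_offset << PAGE_SHIFT, /* hypothetical offset */
			    obj->base.size, 1);

	obj->fault_mappable = false;
}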

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Intel-gfx] [PATCH 06/13] drm: add helper to clflush a virtual address range
  2011-11-06 19:13 ` [PATCH 06/13] drm: add helper to clflush a virtual address range Daniel Vetter
@ 2011-11-21 19:46   ` Ben Widawsky
  0 siblings, 0 replies; 32+ messages in thread
From: Ben Widawsky @ 2011-11-21 19:46 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, linux-kernel, dri-devel

On Sun, Nov 06, 2011 at 08:13:53PM +0100, Daniel Vetter wrote:
> Useful when the page is already mapped, for copying data in/out.
> 
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  drivers/gpu/drm/drm_cache.c |   23 +++++++++++++++++++++++
>  include/drm/drmP.h          |    1 +
>  2 files changed, 24 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
> index 0e3bd5b..502771a 100644
> --- a/drivers/gpu/drm/drm_cache.c
> +++ b/drivers/gpu/drm/drm_cache.c
> @@ -97,3 +97,26 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
>  #endif
>  }
>  EXPORT_SYMBOL(drm_clflush_pages);
> +
> +void
> +drm_clflush_virt_range(char *addr, unsigned long length)
> +{
> +#if defined(CONFIG_X86)
> +	if (cpu_has_clflush) {
> +		char *end = addr + length;
> +		mb();
> +		for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
> +			clflush(addr);
> +		clflush(end - 1);
> +		mb();
> +		return;
> +	}
> +
> +	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
> +		printk(KERN_ERR "Timed out waiting for cache flush.\n");
> +#else
> +	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
> +	WARN_ON_ONCE(1);
> +#endif
> +}
> +EXPORT_SYMBOL(drm_clflush_virt_range);

I'd feel more comfortable with a BUG_ON(irqs_disabled()); before the
IPI... though I don't even know how many platforms that actually
pertains to (if any).
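
Concretely, the assertion suggested here would slot in right before the
IPI fallback, something like this (sketch only):

	/* on_each_cpu() with wait=1 must not be called with interrupts
	 * disabled, or the cross-CPU call can deadlock -- hence the assert. */
	BUG_ON(irqs_disabled());
	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
		printk(KERN_ERR "Timed out waiting for cache flush.\n");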

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread

Thread overview: 32+ messages
2011-11-06 19:13 [PATCH 00/13] pwrite/pread rework Daniel Vetter
2011-11-06 19:13 ` Daniel Vetter
2011-11-06 19:13 ` [PATCH 01/13] drm/i915: fall through pwrite_gtt_slow to the shmem slow path Daniel Vetter
2011-11-06 19:13   ` Daniel Vetter
2011-11-21  3:09   ` [Intel-gfx] " Ben Widawsky
2011-11-21 10:20     ` Chris Wilson
2011-11-06 19:13 ` [PATCH 02/13] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user Daniel Vetter
2011-11-06 19:13   ` Daniel Vetter
2011-11-21  5:56   ` [Intel-gfx] " Ben Widawsky
2011-11-21  5:56     ` Ben Widawsky
2011-11-21 16:02     ` [Intel-gfx] " Daniel Vetter
2011-11-21 17:55       ` Ben Widawsky
2011-11-21 18:43         ` Ben Widawsky
2011-11-06 19:13 ` [PATCH 03/13] drm/i915: rewrite shmem_pread_slow to use copy_to_user Daniel Vetter
2011-11-06 19:13 ` [PATCH 04/13] drm/i915: merge shmem_pwrite slow&fast-path Daniel Vetter
2011-11-06 19:13 ` [PATCH 05/13] drm/i915: merge shmem_pread slow&fast-path Daniel Vetter
2011-11-06 19:13 ` [PATCH 06/13] drm: add helper to clflush a virtual address range Daniel Vetter
2011-11-21 19:46   ` [Intel-gfx] " Ben Widawsky
2011-11-06 19:13 ` [PATCH 07/13] drm/i915: move clflushing into shmem_pread Daniel Vetter
2011-11-06 19:13 ` [PATCH 08/13] drm/i915: kill ranged cpu read domain support Daniel Vetter
2011-11-06 19:13 ` [PATCH 09/13] drm/i915: don't use gtt_pwrite on LLC cached objects Daniel Vetter
2011-11-06 21:16   ` Chris Wilson
2011-11-06 21:16     ` Chris Wilson
2011-11-06 22:19     ` Daniel Vetter
2011-11-06 19:13 ` [PATCH 10/13] drm/i915: don't call shmem_read_mapping unnecessarily Daniel Vetter
2011-11-06 19:13 ` [PATCH 11/13] mm: extend prefault helpers to fault in more than PAGE_SIZE Daniel Vetter
2011-11-06 19:13   ` Daniel Vetter
2011-11-06 22:24   ` Chris Wilson
2011-11-06 22:24     ` Chris Wilson
2011-11-06 22:24     ` Chris Wilson
2011-11-06 19:13 ` [PATCH 12/13] drm/i915: drop gtt slowpath Daniel Vetter
2011-11-06 19:14 ` [PATCH 13/13] drm/i915: don't clobber userspace memory before commiting to the pread Daniel Vetter
