linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm
@ 2011-06-06  4:21 Hugh Dickins
  2011-06-06  4:23 ` [PATCH 1/14] mm: move vmtruncate_range to truncate.c Hugh Dickins
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Chris Wilson, Keith Packard, Thomas Hellstrom,
	Dave Airlie, linux-kernel, linux-mm

Here's the v2 patchset for mmotm, based on 3.0-rc1.  Nothing exciting,
mostly cleanup, preparation for what will probably be two more
patchsets coming over the next few weeks, first simplifying tmpfs
by getting rid of its ->readpage (give it a splice instead), then
getting rid of its peculiar swap index (use radix_tree instead).

The ordering here is somewhat illogical, arranged in the hope that
at least the first four can get into 3.0-rc, which would simplify
the dependencies between linux-next and mmotm.

Changes since last week's v1:
- removed the original 1/14, which was adding cleancache_flush_inode()
  into invalidate_mapping_pages(): I stand by that patch, but feedback
  from Dan Magenheimer and Chris Mason implies that cleancache has a
  bigger problem in this area (flushing too much or too little, unable
  to distinguish truncate from evict, potential issue with O_DIRECT)
  which we shall discuss and deal with separately.
- incorporated feedback from Christoph Hellwig, mainly electing for
  an explicit call to shmem_truncate_range() from drm/i915, which
  will also help if we replace ->truncate_range by ->fallocate later;
  so a new 2/14 which moves shmem function prototypes into shmem_fs.h.

1,2,3,4 affect the interface between tmpfs and drm very slightly.
Once they're in 3.0-rc, drm maintainers could take 5,6,7,8 out of
mmotm and into linux-next (or even 3.0-rc).

 1/14 mm: move vmtruncate_range to truncate.c
 2/14 mm: move shmem prototypes to shmem_fs.h
 3/14 tmpfs: take control of its truncate_range
 4/14 tmpfs: add shmem_read_mapping_page_gfp
 5/14 drm/ttm: use shmem_read_mapping_page
 6/14 drm/i915: use shmem_read_mapping_page
 7/14 drm/i915: use shmem_truncate_range
 8/14 drm/i915: more struct_mutex locking
 9/14 mm: cleanup descriptions of filler arg
10/14 mm: truncate functions are in truncate.c
11/14 mm: tidy vmtruncate_range and related functions
12/14 mm: consistent truncate and invalidate loops
13/14 mm: pincer in truncate_inode_pages_range
14/14 tmpfs: no need to use i_lock

 drivers/gpu/drm/drm_gem.c            |    1 
 drivers/gpu/drm/i915/i915_dma.c      |    3 
 drivers/gpu/drm/i915/i915_gem.c      |   38 ++---
 drivers/gpu/drm/i915/intel_overlay.c |    5 
 drivers/gpu/drm/ttm/ttm_tt.c         |    5 
 include/linux/mm.h                   |    3 
 include/linux/pagemap.h              |   12 -
 include/linux/shmem_fs.h             |   21 +++
 include/linux/swap.h                 |   10 -
 mm/filemap.c                         |   14 +-
 mm/memcontrol.c                      |    1 
 mm/memory.c                          |   24 ---
 mm/shmem.c                           |   88 +++++++++----
 mm/swapfile.c                        |    2 
 mm/truncate.c                        |  159 +++++++++++++------------
 15 files changed, 206 insertions(+), 180 deletions(-)

Hugh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/14] mm: move vmtruncate_range to truncate.c
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
@ 2011-06-06  4:23 ` Hugh Dickins
  2011-06-06  4:24 ` [PATCH 2/14] mm: move shmem prototypes to shmem_fs.h Hugh Dickins
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, linux-kernel, linux-mm

You would expect to find vmtruncate_range() next to vmtruncate()
in mm/truncate.c: move it there.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Christoph Hellwig <hch@infradead.org>
---
 mm/memory.c   |   24 ------------------------
 mm/truncate.c |   24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 24 deletions(-)

--- linux.orig/mm/memory.c	2011-05-29 18:42:37.441882660 -0700
+++ linux/mm/memory.c	2011-06-05 14:26:36.383176813 -0700
@@ -2796,30 +2796,6 @@ void unmap_mapping_range(struct address_
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
-int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
-{
-	struct address_space *mapping = inode->i_mapping;
-
-	/*
-	 * If the underlying filesystem is not going to provide
-	 * a way to truncate a range of blocks (punch a hole) -
-	 * we should return failure right now.
-	 */
-	if (!inode->i_op->truncate_range)
-		return -ENOSYS;
-
-	mutex_lock(&inode->i_mutex);
-	down_write(&inode->i_alloc_sem);
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
-	truncate_inode_pages_range(mapping, offset, end);
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
-	inode->i_op->truncate_range(inode, offset, end);
-	up_write(&inode->i_alloc_sem);
-	mutex_unlock(&inode->i_mutex);
-
-	return 0;
-}
-
 /*
  * We enter with non-exclusive mmap_sem (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
--- linux.orig/mm/truncate.c	2011-05-29 18:42:37.477882839 -0700
+++ linux/mm/truncate.c	2011-06-05 17:16:33.369740944 -0700
@@ -603,3 +603,27 @@ int vmtruncate(struct inode *inode, loff
 	return 0;
 }
 EXPORT_SYMBOL(vmtruncate);
+
+int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
+{
+	struct address_space *mapping = inode->i_mapping;
+
+	/*
+	 * If the underlying filesystem is not going to provide
+	 * a way to truncate a range of blocks (punch a hole) -
+	 * we should return failure right now.
+	 */
+	if (!inode->i_op->truncate_range)
+		return -ENOSYS;
+
+	mutex_lock(&inode->i_mutex);
+	down_write(&inode->i_alloc_sem);
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
+	truncate_inode_pages_range(mapping, offset, end);
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
+	inode->i_op->truncate_range(inode, offset, end);
+	up_write(&inode->i_alloc_sem);
+	mutex_unlock(&inode->i_mutex);
+
+	return 0;
+}


* [PATCH 2/14] mm: move shmem prototypes to shmem_fs.h
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
  2011-06-06  4:23 ` [PATCH 1/14] mm: move vmtruncate_range to truncate.c Hugh Dickins
@ 2011-06-06  4:24 ` Hugh Dickins
  2011-06-06  4:26 ` [PATCH 3/14] tmpfs: take control of its truncate_range Hugh Dickins
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:24 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, linux-kernel, linux-mm

Before adding any more global entry points into shmem.c, gather such
prototypes into shmem_fs.h.  Remove mm's own declarations from swap.h,
but for now leave the ones in mm.h, because shmem_file_setup() and
shmem_zero_setup() are called from various places, and we should not
force other subsystems to update immediately.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
---
 include/linux/shmem_fs.h |   17 +++++++++++++++++
 include/linux/swap.h     |   10 ----------
 mm/memcontrol.c          |    1 +
 mm/swapfile.c            |    2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)

--- linux.orig/include/linux/shmem_fs.h	2011-06-05 17:16:33.313740660 -0700
+++ linux/include/linux/shmem_fs.h	2011-06-05 17:38:03.100136227 -0700
@@ -5,6 +5,13 @@
 #include <linux/mempolicy.h>
 #include <linux/percpu_counter.h>
 
+struct page;
+struct file;
+struct inode;
+struct super_block;
+struct user_struct;
+struct vm_area_struct;
+
 /* inode in-kernel data */
 
 #define SHMEM_NR_DIRECT 16
@@ -45,7 +52,17 @@ static inline struct shmem_inode_info *S
 	return container_of(inode, struct shmem_inode_info, vfs_inode);
 }
 
+/*
+ * Functions in mm/shmem.c called directly from elsewhere:
+ */
 extern int init_tmpfs(void);
 extern int shmem_fill_super(struct super_block *sb, void *data, int silent);
+extern struct file *shmem_file_setup(const char *name,
+					loff_t size, unsigned long flags);
+extern int shmem_zero_setup(struct vm_area_struct *);
+extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern int shmem_unuse(swp_entry_t entry, struct page *page);
+extern void mem_cgroup_get_shmem_target(struct inode *inode, pgoff_t pgoff,
+					struct page **pagep, swp_entry_t *ent);
 
 #endif
--- linux.orig/include/linux/swap.h	2011-06-05 17:16:33.317740677 -0700
+++ linux/include/linux/swap.h	2011-06-05 17:25:19.748351090 -0700
@@ -300,16 +300,6 @@ static inline void scan_unevictable_unre
 extern int kswapd_run(int nid);
 extern void kswapd_stop(int nid);
 
-#ifdef CONFIG_MMU
-/* linux/mm/shmem.c */
-extern int shmem_unuse(swp_entry_t entry, struct page *page);
-#endif /* CONFIG_MMU */
-
-#ifdef CONFIG_CGROUP_MEM_RES_CTLR
-extern void mem_cgroup_get_shmem_target(struct inode *inode, pgoff_t pgoff,
-					struct page **pagep, swp_entry_t *ent);
-#endif
-
 #ifdef CONFIG_SWAP
 /* linux/mm/page_io.c */
 extern int swap_readpage(struct page *);
--- linux.orig/mm/memcontrol.c	2011-06-05 17:16:33.317740677 -0700
+++ linux/mm/memcontrol.c	2011-06-05 17:25:19.748351090 -0700
@@ -35,6 +35,7 @@
 #include <linux/limits.h>
 #include <linux/mutex.h>
 #include <linux/rbtree.h>
+#include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/swapops.h>
--- linux.orig/mm/swapfile.c	2011-06-05 17:16:33.317740677 -0700
+++ linux/mm/swapfile.c	2011-06-05 17:25:19.748351090 -0700
@@ -14,7 +14,7 @@
 #include <linux/vmalloc.h>
 #include <linux/pagemap.h>
 #include <linux/namei.h>
-#include <linux/shm.h>
+#include <linux/shmem_fs.h>
 #include <linux/blkdev.h>
 #include <linux/random.h>
 #include <linux/writeback.h>


* [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
  2011-06-06  4:23 ` [PATCH 1/14] mm: move vmtruncate_range to truncate.c Hugh Dickins
  2011-06-06  4:24 ` [PATCH 2/14] mm: move shmem prototypes to shmem_fs.h Hugh Dickins
@ 2011-06-06  4:26 ` Hugh Dickins
  2011-06-06  4:27 ` [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp Hugh Dickins
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, linux-kernel, linux-mm

2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the ->truncate_range interface: let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.

Let's rename shmem_notify_change() to shmem_setattr().  Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform
its own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there is no need to call it a third time.

Export shmem_truncate_range() and add it to the list in shmem_fs.h,
so that i915_gem_object_truncate() can call it explicitly in future;
get this patch in first, then update drm/i915 once this is available
(until then, i915 will just be doing the truncate_inode_pages() twice).

Though ->truncate_range was introduced five years ago, no filesystem
other than tmpfs implements it, and its only other user is
madvise(,,MADV_REMOVE):
we expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly,
whereupon ->truncate_range can be removed from inode_operations -
shmem_truncate_range() will help i915 across that transition too.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
---

 include/linux/shmem_fs.h |    1 
 mm/shmem.c               |   51 +++++++++++++++++++++----------------
 mm/truncate.c            |    4 +-
 3 files changed, 32 insertions(+), 24 deletions(-)

--- linux.orig/mm/shmem.c	2011-06-05 17:16:33.369740944 -0700
+++ linux/mm/shmem.c	2011-06-05 17:44:02.293916853 -0700
@@ -539,7 +539,7 @@ static void shmem_free_pages(struct list
 	} while (next);
 }
 
-static void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end)
+void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	unsigned long idx;
@@ -562,6 +562,8 @@ static void shmem_truncate_range(struct
 	spinlock_t *punch_lock;
 	unsigned long upper_limit;
 
+	truncate_inode_pages_range(inode->i_mapping, start, end);
+
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 	idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (idx >= info->next_index)
@@ -738,16 +740,8 @@ done2:
 		 * lowered next_index.  Also, though shmem_getpage checks
 		 * i_size before adding to cache, no recheck after: so fix the
 		 * narrow window there too.
-		 *
-		 * Recalling truncate_inode_pages_range and unmap_mapping_range
-		 * every time for punch_hole (which never got a chance to clear
-		 * SHMEM_PAGEIN at the start of vmtruncate_range) is expensive,
-		 * yet hardly ever necessary: try to optimize them out later.
 		 */
 		truncate_inode_pages_range(inode->i_mapping, start, end);
-		if (punch_hole)
-			unmap_mapping_range(inode->i_mapping, start,
-							end - start, 1);
 	}
 
 	spin_lock(&info->lock);
@@ -766,22 +760,23 @@ done2:
 		shmem_free_pages(pages_to_free.next);
 	}
 }
+EXPORT_SYMBOL_GPL(shmem_truncate_range);
 
-static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
+static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
-	loff_t newsize = attr->ia_size;
 	int error;
 
 	error = inode_change_ok(inode, attr);
 	if (error)
 		return error;
 
-	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)
-					&& newsize != inode->i_size) {
+	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
+		loff_t oldsize = inode->i_size;
+		loff_t newsize = attr->ia_size;
 		struct page *page = NULL;
 
-		if (newsize < inode->i_size) {
+		if (newsize < oldsize) {
 			/*
 			 * If truncating down to a partial page, then
 			 * if that page is already allocated, hold it
@@ -810,12 +805,19 @@ static int shmem_notify_change(struct de
 				spin_unlock(&info->lock);
 			}
 		}
-
-		/* XXX(truncate): truncate_setsize should be called last */
-		truncate_setsize(inode, newsize);
+		if (newsize != oldsize) {
+			i_size_write(inode, newsize);
+			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+		}
+		if (newsize < oldsize) {
+			loff_t holebegin = round_up(newsize, PAGE_SIZE);
+			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+			shmem_truncate_range(inode, newsize, (loff_t)-1);
+			/* unmap again to remove racily COWed private pages */
+			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+		}
 		if (page)
 			page_cache_release(page);
-		shmem_truncate_range(inode, newsize, (loff_t)-1);
 	}
 
 	setattr_copy(inode, attr);
@@ -832,7 +834,6 @@ static void shmem_evict_inode(struct ino
 	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
-		truncate_inode_pages(inode->i_mapping, 0);
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
 		shmem_truncate_range(inode, 0, (loff_t)-1);
@@ -2706,7 +2707,7 @@ static const struct file_operations shme
 };
 
 static const struct inode_operations shmem_inode_operations = {
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.truncate_range	= shmem_truncate_range,
 #ifdef CONFIG_TMPFS_XATTR
 	.setxattr	= shmem_setxattr,
@@ -2739,7 +2740,7 @@ static const struct inode_operations shm
 	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.check_acl	= generic_check_acl,
 #endif
 };
@@ -2752,7 +2753,7 @@ static const struct inode_operations shm
 	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.check_acl	= generic_check_acl,
 #endif
 };
@@ -2908,6 +2909,12 @@ int shmem_lock(struct file *file, int lo
 	return 0;
 }
 
+void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end)
+{
+	truncate_inode_pages_range(inode->i_mapping, start, end);
+}
+EXPORT_SYMBOL_GPL(shmem_truncate_range);
+
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR
 /**
  * mem_cgroup_get_shmem_target - find a page or entry assigned to the shmem file
--- linux.orig/mm/truncate.c	2011-06-05 17:16:33.369740944 -0700
+++ linux/mm/truncate.c	2011-06-05 17:42:31.341466507 -0700
@@ -619,9 +619,9 @@ int vmtruncate_range(struct inode *inode
 	mutex_lock(&inode->i_mutex);
 	down_write(&inode->i_alloc_sem);
 	unmap_mapping_range(mapping, offset, (end - offset), 1);
-	truncate_inode_pages_range(mapping, offset, end);
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	inode->i_op->truncate_range(inode, offset, end);
+	/* unmap again to remove racily COWed private pages */
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	up_write(&inode->i_alloc_sem);
 	mutex_unlock(&inode->i_mutex);
 
--- linux.orig/include/linux/shmem_fs.h	2011-06-05 17:38:03.000000000 -0700
+++ linux/include/linux/shmem_fs.h	2011-06-05 17:50:02.783702754 -0700
@@ -61,6 +61,7 @@ extern struct file *shmem_file_setup(con
 					loff_t size, unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 extern void mem_cgroup_get_shmem_target(struct inode *inode, pgoff_t pgoff,
 					struct page **pagep, swp_entry_t *ent);


* [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (2 preceding siblings ...)
  2011-06-06  4:26 ` [PATCH 3/14] tmpfs: take control of its truncate_range Hugh Dickins
@ 2011-06-06  4:27 ` Hugh Dickins
  2011-06-06  4:29 ` [PATCH 5/14] drm/ttm: use shmem_read_mapping_page Hugh Dickins
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, linux-kernel, linux-mm

Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp()
is unsuited to tmpfs, because it inserts a page into pagecache before
calling the filesystem's ->readpage: tmpfs may have pages in swapcache
which only it knows how to locate and switch to filecache.

At present tmpfs provides a ->readpage method, and copes with this by
copying pages; but soon we can simplify it by removing its ->readpage.
Provide shmem_read_mapping_page_gfp() now, ready for that transition.

Export shmem_read_mapping_page_gfp() and add it to the list in shmem_fs.h,
with shmem_read_mapping_page() inline for the common mapping_gfp case.

(shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp?  Generally
the read_mapping_page functions use the mapping's ->readpage, and the
read_cache_page functions use the supplied filler, so I think
read_cache_page_gfp was slightly misnamed.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
---
 include/linux/shmem_fs.h |   17 ++++++++++-------
 mm/shmem.c               |   23 +++++++++++++++++++++++
 2 files changed, 33 insertions(+), 7 deletions(-)

--- linux.orig/mm/shmem.c	2011-06-05 17:44:02.293916853 -0700
+++ linux/mm/shmem.c	2011-06-05 17:53:42.948796800 -0700
@@ -3035,3 +3035,26 @@ int shmem_zero_setup(struct vm_area_stru
 	vma->vm_flags |= VM_CAN_NONLINEAR;
 	return 0;
 }
+
+/**
+ * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
+ * @mapping:	the page's address_space
+ * @index:	the page index
+ * @gfp:	the page allocator flags to use if allocating
+ *
+ * This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
+ * with any new page allocations done using the specified allocation flags.
+ * But read_cache_page_gfp() uses the ->readpage() method: which does not
+ * suit tmpfs, since it may have pages in swapcache, and needs to find those
+ * for itself; although drivers/gpu/drm i915 and ttm rely upon this support.
+ *
+ * Provide a stub for those callers to start using now, then later
+ * flesh it out to call shmem_getpage() with additional gfp mask, when
+ * shmem_file_splice_read() is added and shmem_readpage() is removed.
+ */
+struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+					 pgoff_t index, gfp_t gfp)
+{
+	return read_cache_page_gfp(mapping, index, gfp);
+}
+EXPORT_SYMBOL_GPL(shmem_read_mapping_page_gfp);
--- linux.orig/include/linux/shmem_fs.h	2011-06-05 17:50:02.000000000 -0700
+++ linux/include/linux/shmem_fs.h	2011-06-05 18:11:16.426020691 -0700
@@ -3,15 +3,9 @@
 
 #include <linux/swap.h>
 #include <linux/mempolicy.h>
+#include <linux/pagemap.h>
 #include <linux/percpu_counter.h>
 
-struct page;
-struct file;
-struct inode;
-struct super_block;
-struct user_struct;
-struct vm_area_struct;
-
 /* inode in-kernel data */
 
 #define SHMEM_NR_DIRECT 16
@@ -61,9 +55,18 @@ extern struct file *shmem_file_setup(con
 					loff_t size, unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
+					pgoff_t index, gfp_t gfp_mask);
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 extern void mem_cgroup_get_shmem_target(struct inode *inode, pgoff_t pgoff,
 					struct page **pagep, swp_entry_t *ent);
 
+static inline struct page *shmem_read_mapping_page(
+				struct address_space *mapping, pgoff_t index)
+{
+	return shmem_read_mapping_page_gfp(mapping, index,
+					mapping_gfp_mask(mapping));
+}
+
 #endif


* [PATCH 5/14] drm/ttm: use shmem_read_mapping_page
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (3 preceding siblings ...)
  2011-06-06  4:27 ` [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp Hugh Dickins
@ 2011-06-06  4:29 ` Hugh Dickins
  2011-06-06  4:31 ` [PATCH 6/14] drm/i915: " Hugh Dickins
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Thomas Hellstrom, Dave Airlie, linux-kernel, linux-mm

Soon tmpfs will stop supporting ->readpage and read_mapping_page():
once "tmpfs: add shmem_read_mapping_page_gfp" has been applied,
this patch can be applied to ease the transition.

ttm_tt_swapin() and ttm_tt_swapout() use shmem_read_mapping_page()
in place of read_mapping_page(), since their swap_space has been
created with shmem_file_setup().

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- linux.orig/drivers/gpu/drm/ttm/ttm_tt.c	2011-05-18 21:06:34.000000000 -0700
+++ linux/drivers/gpu/drm/ttm/ttm_tt.c	2011-06-05 18:24:09.501854133 -0700
@@ -31,6 +31,7 @@
 #include <linux/sched.h>
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
+#include <linux/shmem_fs.h>
 #include <linux/file.h>
 #include <linux/swap.h>
 #include <linux/slab.h>
@@ -484,7 +485,7 @@ static int ttm_tt_swapin(struct ttm_tt *
 	swap_space = swap_storage->f_path.dentry->d_inode->i_mapping;
 
 	for (i = 0; i < ttm->num_pages; ++i) {
-		from_page = read_mapping_page(swap_space, i, NULL);
+		from_page = shmem_read_mapping_page(swap_space, i);
 		if (IS_ERR(from_page)) {
 			ret = PTR_ERR(from_page);
 			goto out_err;
@@ -557,7 +558,7 @@ int ttm_tt_swapout(struct ttm_tt *ttm, s
 		from_page = ttm->pages[i];
 		if (unlikely(from_page == NULL))
 			continue;
-		to_page = read_mapping_page(swap_space, i, NULL);
+		to_page = shmem_read_mapping_page(swap_space, i);
 		if (unlikely(IS_ERR(to_page))) {
 			ret = PTR_ERR(to_page);
 			goto out_err;


* [PATCH 6/14] drm/i915: use shmem_read_mapping_page
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (4 preceding siblings ...)
  2011-06-06  4:29 ` [PATCH 5/14] drm/ttm: use shmem_read_mapping_page Hugh Dickins
@ 2011-06-06  4:31 ` Hugh Dickins
  2011-06-06  4:32 ` [PATCH 7/14] drm/i915: use shmem_truncate_range Hugh Dickins
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Chris Wilson, Keith Packard, Dave Airlie,
	linux-kernel, linux-mm

Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp():
once "tmpfs: add shmem_read_mapping_page_gfp" has been applied,
this patch can be applied to ease the transition.

Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp()
in the one place it's needed; elsewhere use shmem_read_mapping_page(),
with the mapping's gfp_mask properly initialized.

Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.

Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/drm_gem.c       |    1 
 drivers/gpu/drm/i915/i915_gem.c |   31 +++++++++++++-----------------
 2 files changed, 15 insertions(+), 17 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_gem.c	2011-05-29 18:42:31.789854626 -0700
+++ linux/drivers/gpu/drm/i915/i915_gem.c	2011-06-05 18:37:13.589743574 -0700
@@ -31,6 +31,7 @@
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
+#include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
 #include <linux/pci.h>
@@ -359,8 +360,7 @@ i915_gem_shmem_pread_fast(struct drm_dev
 		if ((page_offset + remain) > PAGE_SIZE)
 			page_length = PAGE_SIZE - page_offset;
 
-		page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
-					   GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
@@ -463,8 +463,7 @@ i915_gem_shmem_pread_slow(struct drm_dev
 		if ((data_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - data_page_offset;
 
-		page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
-					   GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
@@ -796,8 +795,7 @@ i915_gem_shmem_pwrite_fast(struct drm_de
 		if ((page_offset + remain) > PAGE_SIZE)
 			page_length = PAGE_SIZE - page_offset;
 
-		page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
-					   GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
@@ -906,8 +904,7 @@ i915_gem_shmem_pwrite_slow(struct drm_de
 		if ((data_page_offset + page_length) > PAGE_SIZE)
 			page_length = PAGE_SIZE - data_page_offset;
 
-		page = read_cache_page_gfp(mapping, offset >> PAGE_SHIFT,
-					   GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		page = shmem_read_mapping_page(mapping, offset >> PAGE_SHIFT);
 		if (IS_ERR(page)) {
 			ret = PTR_ERR(page);
 			goto out;
@@ -1556,12 +1553,10 @@ i915_gem_object_get_pages_gtt(struct drm
 
 	inode = obj->base.filp->f_path.dentry->d_inode;
 	mapping = inode->i_mapping;
+	gfpmask |= mapping_gfp_mask(mapping);
+
 	for (i = 0; i < page_count; i++) {
-		page = read_cache_page_gfp(mapping, i,
-					   GFP_HIGHUSER |
-					   __GFP_COLD |
-					   __GFP_RECLAIMABLE |
-					   gfpmask);
+		page = shmem_read_mapping_page_gfp(mapping, i, gfpmask);
 		if (IS_ERR(page))
 			goto err_pages;
 
@@ -3565,6 +3560,7 @@ struct drm_i915_gem_object *i915_gem_all
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *obj;
+	struct address_space *mapping;
 
 	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
 	if (obj == NULL)
@@ -3575,6 +3571,9 @@ struct drm_i915_gem_object *i915_gem_all
 		return NULL;
 	}
 
+	mapping = obj->base.filp->f_path.dentry->d_inode->i_mapping;
+	mapping_set_gfp_mask(mapping, GFP_HIGHUSER | __GFP_RECLAIMABLE);
+
 	i915_gem_info_add_obj(dev_priv, size);
 
 	obj->base.write_domain = I915_GEM_DOMAIN_CPU;
@@ -3950,8 +3949,7 @@ void i915_gem_detach_phys_object(struct
 
 	page_count = obj->base.size / PAGE_SIZE;
 	for (i = 0; i < page_count; i++) {
-		struct page *page = read_cache_page_gfp(mapping, i,
-							GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		struct page *page = shmem_read_mapping_page(mapping, i);
 		if (!IS_ERR(page)) {
 			char *dst = kmap_atomic(page);
 			memcpy(dst, vaddr + i*PAGE_SIZE, PAGE_SIZE);
@@ -4012,8 +4010,7 @@ i915_gem_attach_phys_object(struct drm_d
 		struct page *page;
 		char *dst, *src;
 
-		page = read_cache_page_gfp(mapping, i,
-					   GFP_HIGHUSER | __GFP_RECLAIMABLE);
+		page = shmem_read_mapping_page(mapping, i);
 		if (IS_ERR(page))
 			return PTR_ERR(page);
 
--- linux.orig/drivers/gpu/drm/drm_gem.c	2011-05-18 21:06:34.000000000 -0700
+++ linux/drivers/gpu/drm/drm_gem.c	2011-06-05 18:38:18.874063036 -0700
@@ -34,6 +34,7 @@
 #include <linux/module.h>
 #include <linux/mman.h>
 #include <linux/pagemap.h>
+#include <linux/shmem_fs.h>
 #include "drmP.h"
 
 /** @file drm_gem.c


* [PATCH 7/14] drm/i915: use shmem_truncate_range
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (5 preceding siblings ...)
  2011-06-06  4:31 ` [PATCH 6/14] drm/i915: " Hugh Dickins
@ 2011-06-06  4:32 ` Hugh Dickins
  2011-06-06  4:34 ` [PATCH 8/14] drm/i915: more struct_mutex locking Hugh Dickins
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:32 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, Chris Wilson, Keith Packard, Dave Airlie,
	linux-kernel, linux-mm

The interface to ->truncate_range is changing very slightly:
once "tmpfs: take control of its truncate_range" has been applied,
this can be applied.  For now there is only a slight inefficiency
while this remains unapplied, but it will soon become essential
for managing shmem's use of swap.

Change i915_gem_object_truncate() to use shmem_truncate_range()
directly: which should also spare i915 later change if we switch
from inode_operations->truncate_range to file_operations->fallocate.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/i915/i915_gem.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_gem.c	2011-06-05 18:37:13.589743574 -0700
+++ linux/drivers/gpu/drm/i915/i915_gem.c	2011-06-05 18:44:59.064050179 -0700
@@ -1694,13 +1694,10 @@ i915_gem_object_truncate(struct drm_i915
 	/* Our goal here is to return as much of the memory as
 	 * is possible back to the system as we are called from OOM.
 	 * To do this we must instruct the shmfs to drop all of its
-	 * backing pages, *now*. Here we mirror the actions taken
-	 * when by shmem_delete_inode() to release the backing store.
+	 * backing pages, *now*.
 	 */
 	inode = obj->base.filp->f_path.dentry->d_inode;
-	truncate_inode_pages(inode->i_mapping, 0);
-	if (inode->i_op->truncate_range)
-		inode->i_op->truncate_range(inode, 0, (loff_t)-1);
+	shmem_truncate_range(inode, 0, (loff_t)-1);
 
 	obj->madv = __I915_MADV_PURGED;
 }


* [PATCH 8/14] drm/i915: more struct_mutex locking
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (6 preceding siblings ...)
  2011-06-06  4:32 ` [PATCH 7/14] drm/i915: use shmem_truncate_range Hugh Dickins
@ 2011-06-06  4:34 ` Hugh Dickins
  2011-06-06  4:35 ` [PATCH 9/14] mm: cleanup descriptions of filler arg Hugh Dickins
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chris Wilson, Keith Packard, linux-kernel, linux-mm

When auditing the locking in i915_gem.c (for a prospective change which
I then abandoned), I noticed two places where struct_mutex is not held
across GEM object manipulations that would usually require it.  Since
one is in initial setup and the other in driver unload, I'm guessing
the mutex is not required for either; but post a patch in case it is.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
---
 drivers/gpu/drm/i915/i915_dma.c      |    3 +--
 drivers/gpu/drm/i915/intel_overlay.c |    5 +++++
 2 files changed, 6 insertions(+), 2 deletions(-)

--- linux.orig/drivers/gpu/drm/i915/i915_dma.c	2011-05-29 18:42:31.781854594 -0700
+++ linux/drivers/gpu/drm/i915/i915_dma.c	2011-06-05 18:49:14.757317293 -0700
@@ -2182,9 +2182,8 @@ int i915_driver_unload(struct drm_device
 		/* Flush any outstanding unpin_work. */
 		flush_workqueue(dev_priv->wq);
 
-		i915_gem_free_all_phys_object(dev);
-
 		mutex_lock(&dev->struct_mutex);
+		i915_gem_free_all_phys_object(dev);
 		i915_gem_cleanup_ringbuffer(dev);
 		mutex_unlock(&dev->struct_mutex);
 		if (I915_HAS_FBC(dev) && i915_powersave)
--- linux.orig/drivers/gpu/drm/i915/intel_overlay.c	2011-05-18 21:06:34.000000000 -0700
+++ linux/drivers/gpu/drm/i915/intel_overlay.c	2011-06-05 18:49:14.761317509 -0700
@@ -1416,6 +1416,8 @@ void intel_setup_overlay(struct drm_devi
 		goto out_free;
 	overlay->reg_bo = reg_bo;
 
+	mutex_lock(&dev->struct_mutex);
+
 	if (OVERLAY_NEEDS_PHYSICAL(dev)) {
 		ret = i915_gem_attach_phys_object(dev, reg_bo,
 						  I915_GEM_PHYS_OVERLAY_REGS,
@@ -1440,6 +1442,8 @@ void intel_setup_overlay(struct drm_devi
                 }
 	}
 
+	mutex_unlock(&dev->struct_mutex);
+
 	/* init all values */
 	overlay->color_key = 0x0101fe;
 	overlay->brightness = -19;
@@ -1463,6 +1467,7 @@ void intel_setup_overlay(struct drm_devi
 out_unpin_bo:
 	i915_gem_object_unpin(reg_bo);
 out_free_bo:
+	mutex_unlock(&dev->struct_mutex);
 	drm_gem_object_unreference(&reg_bo->base);
 out_free:
 	kfree(overlay);


* [PATCH 9/14] mm: cleanup descriptions of filler arg
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (7 preceding siblings ...)
  2011-06-06  4:34 ` [PATCH 8/14] drm/i915: more struct_mutex locking Hugh Dickins
@ 2011-06-06  4:35 ` Hugh Dickins
  2011-06-06  4:36 ` [PATCH 10/14] mm: truncate functions are in truncate.c Hugh Dickins
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

The often-NULL data arg to read_cache_page() and read_mapping_page()
functions is misdescribed as "destination for read data": no, it's the
first arg to the filler function, often struct file * to ->readpage().

Satisfy checkpatch.pl on those filler prototypes, and tidy up the
declarations in linux/pagemap.h.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/pagemap.h |   12 +++++-------
 mm/filemap.c            |   12 ++++++------
 2 files changed, 11 insertions(+), 13 deletions(-)

--- linux.orig/mm/filemap.c	2011-05-29 18:42:37.421882556 -0700
+++ linux/mm/filemap.c	2011-06-05 18:50:34.473713659 -0700
@@ -1795,7 +1795,7 @@ EXPORT_SYMBOL(generic_file_readonly_mmap
 
 static struct page *__read_cache_page(struct address_space *mapping,
 				pgoff_t index,
-				int (*filler)(void *,struct page*),
+				int (*filler)(void *, struct page *),
 				void *data,
 				gfp_t gfp)
 {
@@ -1826,7 +1826,7 @@ repeat:
 
 static struct page *do_read_cache_page(struct address_space *mapping,
 				pgoff_t index,
-				int (*filler)(void *,struct page*),
+				int (*filler)(void *, struct page *),
 				void *data,
 				gfp_t gfp)
 
@@ -1866,7 +1866,7 @@ out:
  * @mapping:	the page's address_space
  * @index:	the page index
  * @filler:	function to perform the read
- * @data:	destination for read data
+ * @data:	first arg to filler(data, page) function, often left as NULL
  *
  * Same as read_cache_page, but don't wait for page to become unlocked
  * after submitting it to the filler.
@@ -1878,7 +1878,7 @@ out:
  */
 struct page *read_cache_page_async(struct address_space *mapping,
 				pgoff_t index,
-				int (*filler)(void *,struct page*),
+				int (*filler)(void *, struct page *),
 				void *data)
 {
 	return do_read_cache_page(mapping, index, filler, data, mapping_gfp_mask(mapping));
@@ -1926,7 +1926,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
  * @mapping:	the page's address_space
  * @index:	the page index
  * @filler:	function to perform the read
- * @data:	destination for read data
+ * @data:	first arg to filler(data, page) function, often left as NULL
  *
  * Read into the page cache. If a page already exists, and PageUptodate() is
  * not set, try to fill the page then wait for it to become unlocked.
@@ -1935,7 +1935,7 @@ EXPORT_SYMBOL(read_cache_page_gfp);
  */
 struct page *read_cache_page(struct address_space *mapping,
 				pgoff_t index,
-				int (*filler)(void *,struct page*),
+				int (*filler)(void *, struct page *),
 				void *data)
 {
 	return wait_on_page_read(read_cache_page_async(mapping, index, filler, data));
--- linux.orig/include/linux/pagemap.h	2011-06-05 17:56:03.000000000 -0700
+++ linux/include/linux/pagemap.h	2011-06-05 18:53:54.758707823 -0700
@@ -255,26 +255,24 @@ static inline struct page *grab_cache_pa
 extern struct page * grab_cache_page_nowait(struct address_space *mapping,
 				pgoff_t index);
 extern struct page * read_cache_page_async(struct address_space *mapping,
-				pgoff_t index, filler_t *filler,
-				void *data);
+				pgoff_t index, filler_t *filler, void *data);
 extern struct page * read_cache_page(struct address_space *mapping,
-				pgoff_t index, filler_t *filler,
-				void *data);
+				pgoff_t index, filler_t *filler, void *data);
 extern struct page * read_cache_page_gfp(struct address_space *mapping,
 				pgoff_t index, gfp_t gfp_mask);
 extern int read_cache_pages(struct address_space *mapping,
 		struct list_head *pages, filler_t *filler, void *data);
 
 static inline struct page *read_mapping_page_async(
-						struct address_space *mapping,
-						     pgoff_t index, void *data)
+				struct address_space *mapping,
+				pgoff_t index, void *data)
 {
 	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
 	return read_cache_page_async(mapping, index, filler, data);
 }
 
 static inline struct page *read_mapping_page(struct address_space *mapping,
-					     pgoff_t index, void *data)
+				pgoff_t index, void *data)
 {
 	filler_t *filler = (filler_t *)mapping->a_ops->readpage;
 	return read_cache_page(mapping, index, filler, data);


* [PATCH 10/14] mm: truncate functions are in truncate.c
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (8 preceding siblings ...)
  2011-06-06  4:35 ` [PATCH 9/14] mm: cleanup descriptions of filler arg Hugh Dickins
@ 2011-06-06  4:36 ` Hugh Dickins
  2011-06-06  4:38 ` [PATCH 11/14] mm: tidy vmtruncate_range and related functions Hugh Dickins
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Correct comment on truncate_inode_pages*() in linux/mm.h; and remove
declaration of page_unuse(), it didn't exist even in 2.2.26 or 2.4.0!

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- linux.orig/include/linux/mm.h	2011-06-05 17:16:33.313740660 -0700
+++ linux/include/linux/mm.h	2011-06-05 18:57:56.399905055 -0700
@@ -1445,8 +1445,7 @@ extern int do_munmap(struct mm_struct *,
 
 extern unsigned long do_brk(unsigned long, unsigned long);
 
-/* filemap.c */
-extern unsigned long page_unuse(struct page *);
+/* truncate.c */
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
 				       loff_t lstart, loff_t lend);


* [PATCH 11/14] mm: tidy vmtruncate_range and related functions
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (9 preceding siblings ...)
  2011-06-06  4:36 ` [PATCH 10/14] mm: truncate functions are in truncate.c Hugh Dickins
@ 2011-06-06  4:38 ` Hugh Dickins
  2011-06-06  4:39 ` [PATCH 12/14] mm: consistent truncate and invalidate loops Hugh Dickins
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Use consistent variable names in truncate_pagecache(), truncate_setsize(),
vmtruncate() and vmtruncate_range().

unmap_mapping_range() and vmtruncate_range() have mismatched interfaces:
don't change either, but make the vmtruncates more precise about what
they expect unmap_mapping_range() to do.

vmtruncate_range() is currently called only with page-aligned start and
end+1: can handle unaligned start, but unaligned end+1 would hit BUG_ON
in truncate_inode_pages_range() (lacks partial clearing of the end page).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/truncate.c |   31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

--- linux.orig/mm/truncate.c	2011-06-05 17:42:31.341466507 -0700
+++ linux/mm/truncate.c	2011-06-05 19:19:52.254430023 -0700
@@ -526,8 +526,8 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
 /**
  * truncate_pagecache - unmap and remove pagecache that has been truncated
  * @inode: inode
- * @old: old file offset
- * @new: new file offset
+ * @oldsize: old file size
+ * @newsize: new file size
  *
  * inode's new i_size must already be written before truncate_pagecache
  * is called.
@@ -539,9 +539,10 @@ EXPORT_SYMBOL_GPL(invalidate_inode_pages
  * situations such as writepage being called for a page that has already
  * had its underlying blocks deallocated.
  */
-void truncate_pagecache(struct inode *inode, loff_t old, loff_t new)
+void truncate_pagecache(struct inode *inode, loff_t oldsize, loff_t newsize)
 {
 	struct address_space *mapping = inode->i_mapping;
+	loff_t holebegin = round_up(newsize, PAGE_SIZE);
 
 	/*
 	 * unmap_mapping_range is called twice, first simply for
@@ -552,9 +553,9 @@ void truncate_pagecache(struct inode *in
 	 * truncate_inode_pages finishes, hence the second
 	 * unmap_mapping_range call must be made for correctness.
 	 */
-	unmap_mapping_range(mapping, new + PAGE_SIZE - 1, 0, 1);
-	truncate_inode_pages(mapping, new);
-	unmap_mapping_range(mapping, new + PAGE_SIZE - 1, 0, 1);
+	unmap_mapping_range(mapping, holebegin, 0, 1);
+	truncate_inode_pages(mapping, newsize);
+	unmap_mapping_range(mapping, holebegin, 0, 1);
 }
 EXPORT_SYMBOL(truncate_pagecache);
 
@@ -584,29 +585,31 @@ EXPORT_SYMBOL(truncate_setsize);
 /**
  * vmtruncate - unmap mappings "freed" by truncate() syscall
  * @inode: inode of the file used
- * @offset: file offset to start truncating
+ * @newsize: file offset to start truncating
  *
  * This function is deprecated and truncate_setsize or truncate_pagecache
  * should be used instead, together with filesystem specific block truncation.
  */
-int vmtruncate(struct inode *inode, loff_t offset)
+int vmtruncate(struct inode *inode, loff_t newsize)
 {
 	int error;
 
-	error = inode_newsize_ok(inode, offset);
+	error = inode_newsize_ok(inode, newsize);
 	if (error)
 		return error;
 
-	truncate_setsize(inode, offset);
+	truncate_setsize(inode, newsize);
 	if (inode->i_op->truncate)
 		inode->i_op->truncate(inode);
 	return 0;
 }
 EXPORT_SYMBOL(vmtruncate);
 
-int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end)
+int vmtruncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 {
 	struct address_space *mapping = inode->i_mapping;
+	loff_t holebegin = round_up(lstart, PAGE_SIZE);
+	loff_t holelen = 1 + lend - holebegin;
 
 	/*
 	 * If the underlying filesystem is not going to provide
@@ -618,10 +621,10 @@ int vmtruncate_range(struct inode *inode
 
 	mutex_lock(&inode->i_mutex);
 	down_write(&inode->i_alloc_sem);
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
-	inode->i_op->truncate_range(inode, offset, end);
+	unmap_mapping_range(mapping, holebegin, holelen, 1);
+	inode->i_op->truncate_range(inode, lstart, lend);
 	/* unmap again to remove racily COWed private pages */
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
+	unmap_mapping_range(mapping, holebegin, holelen, 1);
 	up_write(&inode->i_alloc_sem);
 	mutex_unlock(&inode->i_mutex);
 


* [PATCH 12/14] mm: consistent truncate and invalidate loops
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (10 preceding siblings ...)
  2011-06-06  4:38 ` [PATCH 11/14] mm: tidy vmtruncate_range and related functions Hugh Dickins
@ 2011-06-06  4:39 ` Hugh Dickins
  2011-06-06  4:40 ` [PATCH 13/14] mm: pincer in truncate_inode_pages_range Hugh Dickins
  2011-06-06  4:42 ` [PATCH 14/14] tmpfs: no need to use i_lock Hugh Dickins
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Make the pagevec_lookup loops in truncate_inode_pages_range(),
invalidate_mapping_pages() and invalidate_inode_pages2_range()
more consistent with each other.

They were relying upon page->index of an unlocked page, but apologizing
for it: accept it, embrace it, add comments and WARN_ONs, and simplify
the index handling.

invalidate_inode_pages2_range() had special handling for a wrapped
page->index + 1 = 0 case; but MAX_LFS_FILESIZE doesn't let us anywhere
near there, and a corrupt page->index in the radix_tree could cause
more trouble than that would catch.  Remove that wrapped handling.

invalidate_inode_pages2_range() uses min() to limit the pagevec_lookup
when near the end of the range: copy that into the other two, although
it's less useful than you might think (it limits the use of the buffer,
rather than the indices looked up).

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/filemap.c  |    2 
 mm/truncate.c |  110 ++++++++++++++++++++----------------------------
 2 files changed, 49 insertions(+), 63 deletions(-)

--- linux.orig/mm/filemap.c	2011-06-05 18:50:34.000000000 -0700
+++ linux/mm/filemap.c	2011-06-05 19:23:38.191550388 -0700
@@ -131,6 +131,7 @@ void __delete_from_page_cache(struct pag
 
 	radix_tree_delete(&mapping->page_tree, page->index);
 	page->mapping = NULL;
+	/* Leave page->index set: truncation lookup relies upon it */
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
 	if (PageSwapBacked(page))
@@ -486,6 +487,7 @@ int add_to_page_cache_locked(struct page
 			spin_unlock_irq(&mapping->tree_lock);
 		} else {
 			page->mapping = NULL;
+			/* Leave page->index set: truncation relies upon it */
 			spin_unlock_irq(&mapping->tree_lock);
 			mem_cgroup_uncharge_cache_page(page);
 			page_cache_release(page);
--- linux.orig/mm/truncate.c	2011-06-05 19:19:52.000000000 -0700
+++ linux/mm/truncate.c	2011-06-05 19:25:13.112013371 -0700
@@ -199,9 +199,6 @@ int invalidate_inode_page(struct page *p
  * The first pass will remove most pages, so the search cost of the second pass
  * is low.
  *
- * When looking at page->index outside the page lock we need to be careful to
- * copy it into a local to avoid races (it could change at any time).
- *
  * We pass down the cache-hot hint to the page freeing code.  Even if the
  * mapping is large, it is probably the case that the final pages are the most
  * recently touched, and freeing happens in ascending file offset order.
@@ -210,10 +207,10 @@ void truncate_inode_pages_range(struct a
 				loff_t lstart, loff_t lend)
 {
 	const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-	pgoff_t end;
 	const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
 	struct pagevec pvec;
-	pgoff_t next;
+	pgoff_t index;
+	pgoff_t end;
 	int i;
 
 	cleancache_flush_inode(mapping);
@@ -224,24 +221,21 @@ void truncate_inode_pages_range(struct a
 	end = (lend >> PAGE_CACHE_SHIFT);
 
 	pagevec_init(&pvec, 0);
-	next = start;
-	while (next <= end &&
-	       pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+	index = start;
+	while (index <= end && pagevec_lookup(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
-			pgoff_t page_index = page->index;
 
-			if (page_index > end) {
-				next = page_index;
+			/* We rely upon deletion not changing page->index */
+			index = page->index;
+			if (index > end)
 				break;
-			}
 
-			if (page_index > next)
-				next = page_index;
-			next++;
 			if (!trylock_page(page))
 				continue;
+			WARN_ON(page->index != index);
 			if (PageWriteback(page)) {
 				unlock_page(page);
 				continue;
@@ -252,6 +246,7 @@ void truncate_inode_pages_range(struct a
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
+		index++;
 	}
 
 	if (partial) {
@@ -264,13 +259,14 @@ void truncate_inode_pages_range(struct a
 		}
 	}
 
-	next = start;
+	index = start;
 	for ( ; ; ) {
 		cond_resched();
-		if (!pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
-			if (next == start)
+		if (!pagevec_lookup(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+			if (index == start)
 				break;
-			next = start;
+			index = start;
 			continue;
 		}
 		if (pvec.pages[0]->index > end) {
@@ -281,18 +277,20 @@ void truncate_inode_pages_range(struct a
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
-			if (page->index > end)
+			/* We rely upon deletion not changing page->index */
+			index = page->index;
+			if (index > end)
 				break;
+
 			lock_page(page);
+			WARN_ON(page->index != index);
 			wait_on_page_writeback(page);
 			truncate_inode_page(mapping, page);
-			if (page->index > next)
-				next = page->index;
-			next++;
 			unlock_page(page);
 		}
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
+		index++;
 	}
 	cleancache_flush_inode(mapping);
 }
@@ -328,35 +326,26 @@ unsigned long invalidate_mapping_pages(s
 		pgoff_t start, pgoff_t end)
 {
 	struct pagevec pvec;
-	pgoff_t next = start;
+	pgoff_t index = start;
 	unsigned long ret;
 	unsigned long count = 0;
 	int i;
 
 	pagevec_init(&pvec, 0);
-	while (next <= end &&
-			pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
+	while (index <= end && pagevec_lookup(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
-			pgoff_t index;
-			int lock_failed;
-
-			lock_failed = !trylock_page(page);
 
-			/*
-			 * We really shouldn't be looking at the ->index of an
-			 * unlocked page.  But we're not allowed to lock these
-			 * pages.  So we rely upon nobody altering the ->index
-			 * of this (pinned-by-us) page.
-			 */
+			/* We rely upon deletion not changing page->index */
 			index = page->index;
-			if (index > next)
-				next = index;
-			next++;
-			if (lock_failed)
-				continue;
+			if (index > end)
+				break;
 
+			if (!trylock_page(page))
+				continue;
+			WARN_ON(page->index != index);
 			ret = invalidate_inode_page(page);
 			unlock_page(page);
 			/*
@@ -366,12 +355,11 @@ unsigned long invalidate_mapping_pages(s
 			if (!ret)
 				deactivate_page(page);
 			count += ret;
-			if (next > end)
-				break;
 		}
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
+		index++;
 	}
 	return count;
 }
@@ -437,37 +425,32 @@ int invalidate_inode_pages2_range(struct
 				  pgoff_t start, pgoff_t end)
 {
 	struct pagevec pvec;
-	pgoff_t next;
+	pgoff_t index;
 	int i;
 	int ret = 0;
 	int ret2 = 0;
 	int did_range_unmap = 0;
-	int wrapped = 0;
 
 	cleancache_flush_inode(mapping);
 	pagevec_init(&pvec, 0);
-	next = start;
-	while (next <= end && !wrapped &&
-		pagevec_lookup(&pvec, mapping, next,
-			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
+	index = start;
+	while (index <= end && pagevec_lookup(&pvec, mapping, index,
+			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
 		mem_cgroup_uncharge_start();
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
-			pgoff_t page_index;
+
+			/* We rely upon deletion not changing page->index */
+			index = page->index;
+			if (index > end)
+				break;
 
 			lock_page(page);
+			WARN_ON(page->index != index);
 			if (page->mapping != mapping) {
 				unlock_page(page);
 				continue;
 			}
-			page_index = page->index;
-			next = page_index + 1;
-			if (next == 0)
-				wrapped = 1;
-			if (page_index > end) {
-				unlock_page(page);
-				break;
-			}
 			wait_on_page_writeback(page);
 			if (page_mapped(page)) {
 				if (!did_range_unmap) {
@@ -475,9 +458,9 @@ int invalidate_inode_pages2_range(struct
 					 * Zap the rest of the file in one hit.
 					 */
 					unmap_mapping_range(mapping,
-					   (loff_t)page_index<<PAGE_CACHE_SHIFT,
-					   (loff_t)(end - page_index + 1)
-							<< PAGE_CACHE_SHIFT,
+					   (loff_t)index << PAGE_CACHE_SHIFT,
+					   (loff_t)(1 + end - index)
+							 << PAGE_CACHE_SHIFT,
 					    0);
 					did_range_unmap = 1;
 				} else {
@@ -485,8 +468,8 @@ int invalidate_inode_pages2_range(struct
 					 * Just zap this page
 					 */
 					unmap_mapping_range(mapping,
-					  (loff_t)page_index<<PAGE_CACHE_SHIFT,
-					  PAGE_CACHE_SIZE, 0);
+					   (loff_t)index << PAGE_CACHE_SHIFT,
+					   PAGE_CACHE_SIZE, 0);
 				}
 			}
 			BUG_ON(page_mapped(page));
@@ -502,6 +485,7 @@ int invalidate_inode_pages2_range(struct
 		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
+		index++;
 	}
 	cleancache_flush_inode(mapping);
 	return ret;


* [PATCH 13/14] mm: pincer in truncate_inode_pages_range
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (11 preceding siblings ...)
  2011-06-06  4:39 ` [PATCH 12/14] mm: consistent truncate and invalidate loops Hugh Dickins
@ 2011-06-06  4:40 ` Hugh Dickins
  2011-06-06  4:42 ` [PATCH 14/14] tmpfs: no need to use i_lock Hugh Dickins
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

truncate_inode_pages_range()'s final loop has a nice pincer property,
bringing start and end together, squeezing out the last pages.  But
the range handling missed out on that, just sliding up the range,
perhaps letting pages come in behind it.  Add one more test to give
it the same pincer effect.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/truncate.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux.orig/mm/truncate.c	2011-06-05 19:25:13.112013371 -0700
+++ linux/mm/truncate.c	2011-06-05 19:27:12.244611885 -0700
@@ -269,7 +269,7 @@ void truncate_inode_pages_range(struct a
 			index = start;
 			continue;
 		}
-		if (pvec.pages[0]->index > end) {
+		if (index == start && pvec.pages[0]->index > end) {
 			pagevec_release(&pvec);
 			break;
 		}


* [PATCH 14/14] tmpfs: no need to use i_lock
  2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
                   ` (12 preceding siblings ...)
  2011-06-06  4:40 ` [PATCH 13/14] mm: pincer in truncate_inode_pages_range Hugh Dickins
@ 2011-06-06  4:42 ` Hugh Dickins
  13 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  4:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tim Chen, linux-kernel, linux-mm

2.6.36's 7e496299d4d2 to make tmpfs scalable with percpu_counter used
inode->i_lock in place of sbinfo->stat_lock around i_blocks updates;
but that was adverse to scalability, and unnecessary, since info->lock
is already held there in the fast paths.

Remove those uses of i_lock, and add info->lock in the three error
paths where it's then needed across shmem_free_blocks().  It's not
actually needed across shmem_unacct_blocks(), but they're so often
paired that it looks wrong to split them apart.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 mm/shmem.c |   14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

--- linux.orig/mm/shmem.c	2011-06-05 17:53:42.948796800 -0700
+++ linux/mm/shmem.c	2011-06-05 19:27:33.248715783 -0700
@@ -241,9 +241,7 @@ static void shmem_free_blocks(struct ino
 	struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
 	if (sbinfo->max_blocks) {
 		percpu_counter_add(&sbinfo->used_blocks, -pages);
-		spin_lock(&inode->i_lock);
 		inode->i_blocks -= pages*BLOCKS_PER_PAGE;
-		spin_unlock(&inode->i_lock);
 	}
 }
 
@@ -432,9 +430,7 @@ static swp_entry_t *shmem_swp_alloc(stru
 						sbinfo->max_blocks - 1) >= 0)
 				return ERR_PTR(-ENOSPC);
 			percpu_counter_inc(&sbinfo->used_blocks);
-			spin_lock(&inode->i_lock);
 			inode->i_blocks += BLOCKS_PER_PAGE;
-			spin_unlock(&inode->i_lock);
 		}
 
 		spin_unlock(&info->lock);
@@ -1421,9 +1417,7 @@ repeat:
 			    shmem_acct_block(info->flags))
 				goto nospace;
 			percpu_counter_inc(&sbinfo->used_blocks);
-			spin_lock(&inode->i_lock);
 			inode->i_blocks += BLOCKS_PER_PAGE;
-			spin_unlock(&inode->i_lock);
 		} else if (shmem_acct_block(info->flags))
 			goto nospace;
 
@@ -1434,8 +1428,10 @@ repeat:
 				spin_unlock(&info->lock);
 				filepage = shmem_alloc_page(gfp, info, idx);
 				if (!filepage) {
+					spin_lock(&info->lock);
 					shmem_unacct_blocks(info->flags, 1);
 					shmem_free_blocks(inode, 1);
+					spin_unlock(&info->lock);
 					error = -ENOMEM;
 					goto failed;
 				}
@@ -1449,8 +1445,10 @@ repeat:
 					current->mm, GFP_KERNEL);
 				if (error) {
 					page_cache_release(filepage);
+					spin_lock(&info->lock);
 					shmem_unacct_blocks(info->flags, 1);
 					shmem_free_blocks(inode, 1);
+					spin_unlock(&info->lock);
 					filepage = NULL;
 					goto failed;
 				}
@@ -1480,10 +1478,10 @@ repeat:
 			 * be done automatically.
 			 */
 			if (ret) {
-				spin_unlock(&info->lock);
-				page_cache_release(filepage);
 				shmem_unacct_blocks(info->flags, 1);
 				shmem_free_blocks(inode, 1);
+				spin_unlock(&info->lock);
+				page_cache_release(filepage);
 				filepage = NULL;
 				if (error)
 					goto failed;


* Re: [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-06-03  5:16       ` Christoph Hellwig
@ 2011-06-06  5:37         ` Hugh Dickins
  0 siblings, 0 replies; 20+ messages in thread
From: Hugh Dickins @ 2011-06-06  5:37 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, linux-mm

On Fri, 3 Jun 2011, Christoph Hellwig wrote:
> On Wed, Jun 01, 2011 at 09:58:18AM -0700, Hugh Dickins wrote:
> 
> > Fine, I'll add tmpfs PUNCH_HOLE later on.  And wire up madvise MADV_REMOVE
> > to fallocate PUNCH_HOLE, yes?
> 
> Yeah.  One thing I've noticed is that the hole punching doesn't seem
> to do the unmap_mapping_range.  It might be worth auditing that from the
> VM point of view.

I'd noticed that recently too.

At first I was alarmed, but it's actually an inefficiency rather than
a danger: because at some stage a safety unmap_mapping_range() call has
been added into truncate_inode_page().  I don't know what case that was
originally for, but it will cover fallocate() for now.

This is a call to unmap_mapping_range() with 0 for the even_cows arg
i.e. it will not remove COWed copies of the file page from private
mappings.  I think that's good semantics for hole punching (and it's
difficult to enforce the alternative, because we've neither i_size nor
page lock to prevent races); but it does differ from the (odd) POSIX
truncation behaviour, to unmap even the private COWs.

What do you think?  If you think we should unmap COWs, then it ought
to be corrected sooner.  Otherwise I was inclined not to rush (I'm
also wondering about cleancache in truncation: that should be another
mail thread, but might call for passing down a flag useful here too).

You might notice that the alternate hole-punching's vmtruncate_range()
is passing even_cows 1: doesn't matter in practice, since that one has
been restricted to operating on shared writable mappings.  Oh, I
suppose there could also be a parallel private writable mapping,
whose COWs would get unmapped.  Hmm, worth worrying about if we
choose the opposite with fallocate hole punching?

> 
> > Would you like me to remove the ->truncate_range method from
> > inode_operations completely?
> 
> Doing that would be nice.  Do we always have the required file struct
> for ->fallocate in the callers?

Good point, but yes, no problem.

I'm carrying on using ->truncate_range for the moment, partly because
I don't want to get diverted by testing the ->fallocate alternative yet,
but also because removing ->truncate_range now would force an immediate
change on drm/i915: better use shmem_truncate_range() for the transition.

Hugh


* Re: [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-06-01 16:58     ` Hugh Dickins
@ 2011-06-03  5:16       ` Christoph Hellwig
  2011-06-06  5:37         ` Hugh Dickins
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2011-06-03  5:16 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Christoph Hellwig, Andrew Morton, linux-kernel, linux-mm

On Wed, Jun 01, 2011 at 09:58:18AM -0700, Hugh Dickins wrote:
> (i915 isn't really doing hole-punching there, I think it just found it
> a useful interface to remove the page-and-swapcache without touching
> i_size.  Parentheses because it makes no difference to your point.)

Keeping i_size while removing pages on tmpfs fits the definition of hole
punching for me.  Not that it matters anyway.

> When I say "shmem", I am including the !SHMEM-was-TINY_SHMEM case too,
> which goes to ramfs.  Currently i915 has been configured to disable that
> possibility, though we insisted on it originally: there may or may not be
> good reason for disabling it - may just be a side-effect of the rather
> twisted unintuitive SHMEM/TMPFS dependencies.

Hmm, the two different implementations make everything harder.  Also
because we don't even implement the hole punching in !SHMEM tmpfs.

> Fine, I'll add tmpfs PUNCH_HOLE later on.  And wire up madvise MADV_REMOVE
> to fallocate PUNCH_HOLE, yes?

Yeah.  One thing I've noticed is that the hole punching doesn't seem
to do the unmap_mapping_range.  It might be worth auditing that from the
VM point of view.

> Would you like me to remove the ->truncate_range method from
> inode_operations completely?

Doing that would be nice.  Do we always have the required file struct
for ->fallocate in the callers?



* Re: [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-06-01  0:39   ` Christoph Hellwig
@ 2011-06-01 16:58     ` Hugh Dickins
  2011-06-03  5:16       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Hugh Dickins @ 2011-06-01 16:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, linux-mm

Thanks a lot for looking at these.

On Tue, 31 May 2011, Christoph Hellwig wrote:
> > Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
> > calls the tmpfs ->truncate_range directly: update that in a separate
> > patch later, for now just let it duplicate the truncate_inode_pages().
> > Because i915 handles unmap_mapping_range() itself at a different stage,
> > we have chosen not to bundle that into ->truncate_range.
> 
> In your next series that makes it call the readpage replacement directly
> it might be nice to also call directly into shmem for hole punching.

(i915 isn't really doing hole-punching there, I think it just found it
a useful interface to remove the page-and-swapcache without touching
i_size.  Parentheses because it makes no difference to your point.)

Okay, I'd better do a v2 (probably not before the weekend), and change
that around to go explicitly to shmem there as well: I'd rather settle
the interfaces to other subsystems in this series, than mix it with the
implementation in the next series.

When I say "shmem", I am including the !SHMEM-was-TINY_SHMEM case too,
which goes to ramfs.  Currently i915 has been configured to disable that
possibility, though we insisted on it originally: there may or may not be
good reason for disabling it - may just be a side-effect of the rather
twisted unintuitive SHMEM/TMPFS dependencies.

> 
> > I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
> > FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
> > tmpfs should support fallocate?  But worry about that another time...
> 
> No, truncate_range and the madvise interface are pretty sad hacks that
> should never have been added in the first place.  Adding
> FALLOC_FL_PUNCH_HOLE support for shmem on the other hand might make
> some sense.

Fine, I'll add tmpfs PUNCH_HOLE later on.  And wire up madvise MADV_REMOVE
to fallocate PUNCH_HOLE, yes?

Would you like me to remove the ->truncate_range method from
inode_operations completely?  I can do that now, hack directly to tmpfs
in the interim, in the same way as for i915.

Hugh


* Re: [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-05-31  0:39 ` [PATCH 3/14] tmpfs: take control of its truncate_range Hugh Dickins
@ 2011-06-01  0:39   ` Christoph Hellwig
  2011-06-01 16:58     ` Hugh Dickins
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2011-06-01  0:39 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Christoph Hellwig, linux-kernel, linux-mm

> Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
> calls the tmpfs ->truncate_range directly: update that in a separate
> patch later, for now just let it duplicate the truncate_inode_pages().
> Because i915 handles unmap_mapping_range() itself at a different stage,
> we have chosen not to bundle that into ->truncate_range.

In your next series that makes it call the readpage replacement directly
it might be nice to also call directly into shmem for hole punching.

> I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
> FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
> tmpfs should support fallocate?  But worry about that another time...

No, truncate_range and the madvise interface are pretty sad hacks that
should never have been added in the first place.  Adding
FALLOC_FL_PUNCH_HOLE support for shmem on the other hand might make
some sense.



* [PATCH 3/14] tmpfs: take control of its truncate_range
  2011-05-31  0:33 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
@ 2011-05-31  0:39 ` Hugh Dickins
  2011-06-01  0:39   ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Hugh Dickins @ 2011-05-31  0:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Hellwig, linux-kernel, linux-mm

2.6.35's new truncate convention gave tmpfs the opportunity to control
its file truncation, no longer enforced from outside by vmtruncate().
We shall want to build upon that, to handle pagecache and swap together.

Slightly redefine the ->truncate_range interface, so far implemented
only by tmpfs to support madvise(,,MADV_REMOVE).  Let it now be called
between the unmap_mapping_range()s, with the filesystem responsible for
doing the truncate_inode_pages_range() from it - just as the filesystem
is nowadays responsible for doing that from its ->setattr.

Let's rename shmem_notify_change() to shmem_setattr().  Instead of
calling the generic truncate_setsize(), bring that code in so we can
call shmem_truncate_range() - which will later be updated to perform
its own variant of truncate_inode_pages_range().

Remove the punch_hole unmap_mapping_range() from shmem_truncate_range():
now that the COW's unmap_mapping_range() comes after ->truncate_range,
there's no need to call it a third time.

Note that drivers/gpu/drm/i915/i915_gem.c i915_gem_object_truncate()
calls the tmpfs ->truncate_range directly: update that in a separate
patch later, for now just let it duplicate the truncate_inode_pages().
Because i915 handles unmap_mapping_range() itself at a different stage,
we have chosen not to bundle that into ->truncate_range.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
---
I notice that ext4 is now joining ocfs2 and xfs in supporting fallocate
FALLOC_FL_PUNCH_HOLE: perhaps they should support truncate_range, and
tmpfs should support fallocate?  But worry about that another time...

 mm/shmem.c    |   42 +++++++++++++++++++++---------------------
 mm/truncate.c |    4 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

--- linux.orig/mm/shmem.c	2011-05-30 13:56:10.000000000 -0700
+++ linux/mm/shmem.c	2011-05-30 14:13:03.569821995 -0700
@@ -562,6 +562,8 @@ static void shmem_truncate_range(struct
 	spinlock_t *punch_lock;
 	unsigned long upper_limit;
 
+	truncate_inode_pages_range(inode->i_mapping, start, end);
+
 	inode->i_ctime = inode->i_mtime = CURRENT_TIME;
 	idx = (start + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	if (idx >= info->next_index)
@@ -738,16 +740,8 @@ done2:
 		 * lowered next_index.  Also, though shmem_getpage checks
 		 * i_size before adding to cache, no recheck after: so fix the
 		 * narrow window there too.
-		 *
-		 * Recalling truncate_inode_pages_range and unmap_mapping_range
-		 * every time for punch_hole (which never got a chance to clear
-		 * SHMEM_PAGEIN at the start of vmtruncate_range) is expensive,
-		 * yet hardly ever necessary: try to optimize them out later.
 		 */
 		truncate_inode_pages_range(inode->i_mapping, start, end);
-		if (punch_hole)
-			unmap_mapping_range(inode->i_mapping, start,
-							end - start, 1);
 	}
 
 	spin_lock(&info->lock);
@@ -767,21 +761,21 @@ done2:
 	}
 }
 
-static int shmem_notify_change(struct dentry *dentry, struct iattr *attr)
+static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	struct inode *inode = dentry->d_inode;
-	loff_t newsize = attr->ia_size;
 	int error;
 
 	error = inode_change_ok(inode, attr);
 	if (error)
 		return error;
 
-	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)
-					&& newsize != inode->i_size) {
+	if (S_ISREG(inode->i_mode) && (attr->ia_valid & ATTR_SIZE)) {
+		loff_t oldsize = inode->i_size;
+		loff_t newsize = attr->ia_size;
 		struct page *page = NULL;
 
-		if (newsize < inode->i_size) {
+		if (newsize < oldsize) {
 			/*
 			 * If truncating down to a partial page, then
 			 * if that page is already allocated, hold it
@@ -810,12 +804,19 @@ static int shmem_notify_change(struct de
 				spin_unlock(&info->lock);
 			}
 		}
-
-		/* XXX(truncate): truncate_setsize should be called last */
-		truncate_setsize(inode, newsize);
+		if (newsize != oldsize) {
+			i_size_write(inode, newsize);
+			inode->i_ctime = inode->i_mtime = CURRENT_TIME;
+		}
+		if (newsize < oldsize) {
+			loff_t holebegin = round_up(newsize, PAGE_SIZE);
+			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+			shmem_truncate_range(inode, newsize, (loff_t)-1);
+			/* unmap again to remove racily COWed private pages */
+			unmap_mapping_range(inode->i_mapping, holebegin, 0, 1);
+		}
 		if (page)
 			page_cache_release(page);
-		shmem_truncate_range(inode, newsize, (loff_t)-1);
 	}
 
 	setattr_copy(inode, attr);
@@ -832,7 +833,6 @@ static void shmem_evict_inode(struct ino
 	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
-		truncate_inode_pages(inode->i_mapping, 0);
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
 		shmem_truncate_range(inode, 0, (loff_t)-1);
@@ -2706,7 +2706,7 @@ static const struct file_operations shme
 };
 
 static const struct inode_operations shmem_inode_operations = {
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.truncate_range	= shmem_truncate_range,
 #ifdef CONFIG_TMPFS_XATTR
 	.setxattr	= shmem_setxattr,
@@ -2739,7 +2739,7 @@ static const struct inode_operations shm
 	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.check_acl	= generic_check_acl,
 #endif
 };
@@ -2752,7 +2752,7 @@ static const struct inode_operations shm
 	.removexattr	= shmem_removexattr,
 #endif
 #ifdef CONFIG_TMPFS_POSIX_ACL
-	.setattr	= shmem_notify_change,
+	.setattr	= shmem_setattr,
 	.check_acl	= generic_check_acl,
 #endif
 };
--- linux.orig/mm/truncate.c	2011-05-30 14:09:52.000000000 -0700
+++ linux/mm/truncate.c	2011-05-30 14:15:29.814546645 -0700
@@ -621,9 +621,9 @@ int vmtruncate_range(struct inode *inode
 	mutex_lock(&inode->i_mutex);
 	down_write(&inode->i_alloc_sem);
 	unmap_mapping_range(mapping, offset, (end - offset), 1);
-	truncate_inode_pages_range(mapping, offset, end);
-	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	inode->i_op->truncate_range(inode, offset, end);
+	/* unmap again to remove racily COWed private pages */
+	unmap_mapping_range(mapping, offset, (end - offset), 1);
 	up_write(&inode->i_alloc_sem);
 	mutex_unlock(&inode->i_mutex);
 


end of thread, other threads:[~2011-06-06  5:37 UTC | newest]

Thread overview: 20+ messages
2011-06-06  4:21 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
2011-06-06  4:23 ` [PATCH 1/14] mm: move vmtruncate_range to truncate.c Hugh Dickins
2011-06-06  4:24 ` [PATCH 2/14] mm: move shmem prototypes to shmem_fs.h Hugh Dickins
2011-06-06  4:26 ` [PATCH 3/14] tmpfs: take control of its truncate_range Hugh Dickins
2011-06-06  4:27 ` [PATCH 4/14] tmpfs: add shmem_read_mapping_page_gfp Hugh Dickins
2011-06-06  4:29 ` [PATCH 5/14] drm/ttm: use shmem_read_mapping_page Hugh Dickins
2011-06-06  4:31 ` [PATCH 6/14] drm/i915: " Hugh Dickins
2011-06-06  4:32 ` [PATCH 7/14] drm/i915: use shmem_truncate_range Hugh Dickins
2011-06-06  4:34 ` [PATCH 8/14] drm/i915: more struct_mutex locking Hugh Dickins
2011-06-06  4:35 ` [PATCH 9/14] mm: cleanup descriptions of filler arg Hugh Dickins
2011-06-06  4:36 ` [PATCH 10/14] mm: truncate functions are in truncate.c Hugh Dickins
2011-06-06  4:38 ` [PATCH 11/14] mm: tidy vmtruncate_range and related functions Hugh Dickins
2011-06-06  4:39 ` [PATCH 12/14] mm: consistent truncate and invalidate loops Hugh Dickins
2011-06-06  4:40 ` [PATCH 13/14] mm: pincer in truncate_inode_pages_range Hugh Dickins
2011-06-06  4:42 ` [PATCH 14/14] tmpfs: no need to use i_lock Hugh Dickins
  -- strict thread matches above, loose matches on Subject: below --
2011-05-31  0:33 [PATCH 0/14] mm: tmpfs and trunc changes, affecting drm Hugh Dickins
2011-05-31  0:39 ` [PATCH 3/14] tmpfs: take control of its truncate_range Hugh Dickins
2011-06-01  0:39   ` Christoph Hellwig
2011-06-01 16:58     ` Hugh Dickins
2011-06-03  5:16       ` Christoph Hellwig
2011-06-06  5:37         ` Hugh Dickins
