* [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

This patchset cleans up page cache handling by replacing open coded
shifts and adds with calls to inline functions.

The ultimate goal is to replace all uses of PAGE_CACHE_xxx in the
kernel with these functions. All the functions take a mapping
parameter, in anticipation of support for higher order pages in the
page cache (as demonstrated by the Large Blocksize patchset).
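
As an illustration (not itself part of the patchset), a typical
conversion replaces the open coded arithmetic with the equivalent
helper calls:

	/* Before: open coded shifts and masks */
	pgoff_t index = pos >> PAGE_CACHE_SHIFT;
	unsigned offset = pos & (PAGE_CACHE_SIZE - 1);

	/* After: helpers that take the mapping */
	pgoff_t index = page_cache_index(mapping, pos);
	unsigned offset = page_cache_offset(mapping, pos);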

It will take some time to get through all of the kernel source code.
The patches here convert only the core VM. We can likely do much
of the rest against Andrew's tree shortly before the merge window
opens for 2.6.23.

This patchset should have no functional effect. The PAGE_CACHE_xxx
macros and the page_cache_xxx functions can coexist while the
conversion is in progress. As long as filesystems / device drivers
only use PAGE_SIZE pages they can stay as they are, even if some
filesystems and devices start to support higher order pages.

Patchset against 2.6.22-rc4-mm2

More cleanups against filesystems will follow this patchset.
I have patches for three filesystems so far.

-- 


* [patch 01/14] Define functions for page cache handling
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_page_cache_functions --]
[-- Type: text/plain, Size: 3342 bytes --]

We use the macros PAGE_CACHE_SIZE, PAGE_CACHE_SHIFT, PAGE_CACHE_MASK
and PAGE_CACHE_ALIGN in various places in the kernel. Often, common
operations like calculating the offset or the index are open coded
using shifts and adds. This patch provides inline functions that
accomplish these calculations in a consistent way.

All functions take an address_space pointer. The address space
pointer will eventually be used to support a variable size page
cache: information reachable via the mapping may then determine the
page size.
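
To illustrate where this is headed (a sketch only, not part of this
patch), the helpers could one day derive everything from a per-mapping
page order:

	/* Hypothetical future variants once mapping_order() can be > 0 */
	static inline int page_cache_shift(struct address_space *a)
	{
		return PAGE_SHIFT + mapping_order(a);
	}

	static inline unsigned int page_cache_size(struct address_space *a)
	{
		return PAGE_SIZE << mapping_order(a);
	}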

New function			Related base page constant or purpose
------------------------------------------------------------------
page_cache_shift(a)		PAGE_CACHE_SHIFT
page_cache_size(a)		PAGE_CACHE_SIZE
page_cache_mask(a)		PAGE_CACHE_MASK
page_cache_index(a, pos)	Calculate page number from position
page_cache_next(a, pos)		Page number of the next page
page_cache_offset(a, pos)	Calculate offset into a page
page_cache_pos(a, index, offset)
				Form position based on page number
				and an offset.
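
As an example of how these compose (illustrative only), a file
position can be split into a page index and an offset and then
reassembled without loss:

	/* given some loff_t pos and struct address_space *mapping */
	pgoff_t index = page_cache_index(mapping, pos);
	unsigned offset = page_cache_offset(mapping, pos);

	/* The decomposition is exact */
	BUG_ON(page_cache_pos(mapping, index, offset) != pos);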

This provides a basis that would allow the conversion of all page cache
handling in the kernel and ultimately allow the removal of the PAGE_CACHE_*
constants.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 include/linux/pagemap.h |   54 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

Index: vps/include/linux/pagemap.h
===================================================================
--- vps.orig/include/linux/pagemap.h	2007-06-08 10:57:49.000000000 -0700
+++ vps/include/linux/pagemap.h	2007-06-08 11:01:37.000000000 -0700
@@ -52,12 +52,66 @@ static inline void mapping_set_gfp_mask(
  * space in smaller chunks for same flexibility).
  *
  * Or rather, it _will_ be done in larger chunks.
+ *
+ * The following constants can be used if a filesystem only supports a single
+ * page size.
  */
 #define PAGE_CACHE_SHIFT	PAGE_SHIFT
 #define PAGE_CACHE_SIZE		PAGE_SIZE
 #define PAGE_CACHE_MASK		PAGE_MASK
 #define PAGE_CACHE_ALIGN(addr)	(((addr)+PAGE_CACHE_SIZE-1)&PAGE_CACHE_MASK)
 
+/*
+ * Functions that are currently set up for a fixed PAGE_SIZE page cache.
+ * The use of these will allow a variable page size page cache in the future.
+ */
+static inline int mapping_order(struct address_space *a)
+{
+	return 0;
+}
+
+static inline int page_cache_shift(struct address_space *a)
+{
+	return PAGE_SHIFT;
+}
+
+static inline unsigned int page_cache_size(struct address_space *a)
+{
+	return PAGE_SIZE;
+}
+
+static inline loff_t page_cache_mask(struct address_space *a)
+{
+	return (loff_t)PAGE_MASK;
+}
+
+static inline unsigned int page_cache_offset(struct address_space *a,
+		loff_t pos)
+{
+	return pos & ~PAGE_MASK;
+}
+
+static inline pgoff_t page_cache_index(struct address_space *a,
+		loff_t pos)
+{
+	return pos >> page_cache_shift(a);
+}
+
+/*
+ * Index of the page starting on or after the given position.
+ */
+static inline pgoff_t page_cache_next(struct address_space *a,
+		loff_t pos)
+{
+	return page_cache_index(a, pos + page_cache_size(a) - 1);
+}
+
+static inline loff_t page_cache_pos(struct address_space *a,
+		pgoff_t index, unsigned long offset)
+{
+	return ((loff_t)index << page_cache_shift(a)) + offset;
+}
+
 #define page_cache_get(page)		get_page(page)
 #define page_cache_release(page)	put_page(page)
 void release_pages(struct page **pages, int nr, int cold);

-- 


* [patch 02/14] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: zero_user_segments --]
[-- Type: text/plain, Size: 27828 bytes --]

Simplify page cache zeroing of segments of pages through three functions:

zero_user_segments(page, start1, end1, start2, end2)

	Zeros two segments of the page. It takes the positions where
	the zeroing starts and ends, which avoids length calculations.

zero_user_segment(page, start, end)

	Same for a single segment.

zero_user(page, start, length)

	Variant for the case where the length rather than the end
	position is known.
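
For example, zeroing from an offset to the end of a page, which
previously required a length calculation, becomes (as in the
conversions below):

	/* Before */
	zero_user_page(page, offset, PAGE_CACHE_SIZE - offset, KM_USER0);

	/* After */
	zero_user_segment(page, offset, PAGE_CACHE_SIZE);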



We remove the zero_user_page macro. Issues:

1. It's a macro. Inline functions are preferable.

2. The KM_USER0 constant is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 means a lot of code no longer has to deal with
the special casing for HIGHMEM. Dealing with kmap is only necessary
for HIGHMEM configurations. In those configurations we use KM_USER0
like we do for a series of other functions defined in highmem.h.

Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
could not be an inline function. The zero_user_* functions introduced
here can be inline functions because the constant is no longer passed
in by their callers.

Also, the flushing of the caches is moved outside of the kmap.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/affs/file.c                           |    2 -
 fs/buffer.c                              |   47 +++++++++--------------------
 fs/direct-io.c                           |    4 +-
 fs/ecryptfs/mmap.c                       |    5 +--
 fs/ext3/inode.c                          |    4 +-
 fs/gfs2/bmap.c                           |    2 -
 fs/libfs.c                               |   19 +++---------
 fs/mpage.c                               |    7 +---
 fs/nfs/read.c                            |   10 +++---
 fs/nfs/write.c                           |    2 -
 fs/ntfs/aops.c                           |   18 ++++++-----
 fs/ntfs/file.c                           |   32 +++++++++-----------
 fs/ocfs2/aops.c                          |    2 -
 fs/reiser4/plugin/file/cryptcompress.c   |    8 +----
 fs/reiser4/plugin/file/file.c            |    2 -
 fs/reiser4/plugin/item/ctail.c           |    2 -
 fs/reiser4/plugin/item/extent_file_ops.c |    4 +-
 fs/reiser4/plugin/item/tail.c            |    3 -
 fs/reiserfs/inode.c                      |    4 +-
 fs/xfs/linux-2.6/xfs_lrw.c               |    2 -
 include/linux/highmem.h                  |   49 +++++++++++++++++++------------
 mm/filemap_xip.c                         |    2 -
 mm/truncate.c                            |    2 -
 23 files changed, 107 insertions(+), 125 deletions(-)

Index: vps/include/linux/highmem.h
===================================================================
--- vps.orig/include/linux/highmem.h	2007-06-11 22:33:01.000000000 -0700
+++ vps/include/linux/highmem.h	2007-06-11 22:33:07.000000000 -0700
@@ -124,28 +124,41 @@ static inline void clear_highpage(struct
 	kunmap_atomic(kaddr, KM_USER0);
 }
 
-/*
- * Same but also flushes aliased cache contents to RAM.
- *
- * This must be a macro because KM_USER0 and friends aren't defined if
- * !CONFIG_HIGHMEM
- */
-#define zero_user_page(page, offset, size, km_type)		\
-	do {							\
-		void *kaddr;					\
-								\
-		BUG_ON((offset) + (size) > PAGE_SIZE);		\
-								\
-		kaddr = kmap_atomic(page, km_type);		\
-		memset((char *)kaddr + (offset), 0, (size));	\
-		flush_dcache_page(page);			\
-		kunmap_atomic(kaddr, (km_type));		\
-	} while (0)
+static inline void zero_user_segments(struct page *page,
+	unsigned start1, unsigned end1,
+	unsigned start2, unsigned end2)
+{
+	void *kaddr = kmap_atomic(page, KM_USER0);
+
+	BUG_ON(end1 > PAGE_SIZE ||
+		end2 > PAGE_SIZE);
+
+	if (end1 > start1)
+		memset(kaddr + start1, 0, end1 - start1);
+
+	if (end2 > start2)
+		memset(kaddr + start2, 0, end2 - start2);
+
+	kunmap_atomic(kaddr, KM_USER0);
+	flush_dcache_page(page);
+}
+
+static inline void zero_user_segment(struct page *page,
+	unsigned start, unsigned end)
+{
+	zero_user_segments(page, start, end, 0, 0);
+}
+
+static inline void zero_user(struct page *page,
+	unsigned start, unsigned size)
+{
+	zero_user_segments(page, start, start + size, 0, 0);
+}
 
 static inline void __deprecated memclear_highpage_flush(struct page *page,
 			unsigned int offset, unsigned int size)
 {
-	zero_user_page(page, offset, size, KM_USER0);
+	zero_user(page, offset, size);
 }
 
 #ifndef __HAVE_ARCH_COPY_USER_HIGHPAGE
Index: vps/fs/buffer.c
===================================================================
--- vps.orig/fs/buffer.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/buffer.c	2007-06-11 22:49:08.000000000 -0700
@@ -1792,7 +1792,7 @@ void page_zero_new_buffers(struct page *
 					start = max(from, block_start);
 					size = min(to, block_end) - start;
 
-					zero_user_page(page, start, size, KM_USER0);
+					zero_user(page, start, size);
 					set_buffer_uptodate(bh);
 				}
 
@@ -1855,19 +1855,10 @@ static int __block_prepare_write(struct 
 					mark_buffer_dirty(bh);
 					continue;
 				}
-				if (block_end > to || block_start < from) {
-					void *kaddr;
-
-					kaddr = kmap_atomic(page, KM_USER0);
-					if (block_end > to)
-						memset(kaddr+to, 0,
-							block_end-to);
-					if (block_start < from)
-						memset(kaddr+block_start,
-							0, from-block_start);
-					flush_dcache_page(page);
-					kunmap_atomic(kaddr, KM_USER0);
-				}
+				if (block_end > to || block_start < from)
+					zero_user_segments(page,
+							to, block_end,
+							block_start, from);
 				continue;
 			}
 		}
@@ -2095,8 +2086,7 @@ int block_read_full_page(struct page *pa
 					SetPageError(page);
 			}
 			if (!buffer_mapped(bh)) {
-				zero_user_page(page, i * blocksize, blocksize,
-						KM_USER0);
+				zero_user(page, i * blocksize, blocksize);
 				if (!err)
 					set_buffer_uptodate(bh);
 				continue;
@@ -2209,7 +2199,7 @@ int cont_expand_zero(struct file *file, 
 						&page, &fsdata);
 		if (err)
 			goto out;
-		zero_user_page(page, zerofrom, len, KM_USER0);
+		zero_user(page, zerofrom, len);
 		err = pagecache_write_end(file, mapping, curpos, len, len,
 						page, fsdata);
 		if (err < 0)
@@ -2236,7 +2226,7 @@ int cont_expand_zero(struct file *file, 
 						&page, &fsdata);
 		if (err)
 			goto out;
-		zero_user_page(page, zerofrom, len, KM_USER0);
+		zero_user(page, zerofrom, len);
 		err = pagecache_write_end(file, mapping, curpos, len, len,
 						page, fsdata);
 		if (err < 0)
@@ -2350,7 +2340,6 @@ int nobh_prepare_write(struct page *page
 	unsigned block_in_page;
 	unsigned block_start;
 	sector_t block_in_file;
-	char *kaddr;
 	int nr_reads = 0;
 	int i;
 	int ret = 0;
@@ -2390,13 +2379,8 @@ int nobh_prepare_write(struct page *page
 		if (PageUptodate(page))
 			continue;
 		if (buffer_new(&map_bh) || !buffer_mapped(&map_bh)) {
-			kaddr = kmap_atomic(page, KM_USER0);
-			if (block_start < from)
-				memset(kaddr+block_start, 0, from-block_start);
-			if (block_end > to)
-				memset(kaddr + to, 0, block_end - to);
-			flush_dcache_page(page);
-			kunmap_atomic(kaddr, KM_USER0);
+			zero_user_segments(page, block_start, from,
+						to, block_end);
 			continue;
 		}
 		if (buffer_uptodate(&map_bh))
@@ -2462,7 +2446,7 @@ failed:
 	 * Error recovery is pretty slack.  Clear the page and mark it dirty
 	 * so we'll later zero out any blocks which _were_ allocated.
 	 */
-	zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+	zero_user(page, 0, PAGE_CACHE_SIZE);
 	SetPageUptodate(page);
 	set_page_dirty(page);
 	return ret;
@@ -2531,7 +2515,7 @@ int nobh_writepage(struct page *page, ge
 	 * the  page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	zero_user_page(page, offset, PAGE_CACHE_SIZE - offset, KM_USER0);
+	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 out:
 	ret = mpage_writepage(page, get_block, wbc);
 	if (ret == -EAGAIN)
@@ -2565,8 +2549,7 @@ int nobh_truncate_page(struct address_sp
 	to = (offset + blocksize) & ~(blocksize - 1);
 	ret = a_ops->prepare_write(NULL, page, offset, to);
 	if (ret == 0) {
-		zero_user_page(page, offset, PAGE_CACHE_SIZE - offset,
-				KM_USER0);
+		zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 		/*
 		 * It would be more correct to call aops->commit_write()
 		 * here, but this is more efficient.
@@ -2645,7 +2628,7 @@ int block_truncate_page(struct address_s
 			goto unlock;
 	}
 
-	zero_user_page(page, offset, length, KM_USER0);
+	zero_user(page, offset, length);
 	mark_buffer_dirty(bh);
 	err = 0;
 
@@ -2691,7 +2674,7 @@ int block_write_full_page(struct page *p
 	 * the  page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	zero_user_page(page, offset, PAGE_CACHE_SIZE - offset, KM_USER0);
+	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 	return __block_write_full_page(inode, page, get_block, wbc);
 }
 
Index: vps/fs/libfs.c
===================================================================
--- vps.orig/fs/libfs.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/libfs.c	2007-06-11 22:49:09.000000000 -0700
@@ -340,13 +340,10 @@ int simple_prepare_write(struct file *fi
 			unsigned from, unsigned to)
 {
 	if (!PageUptodate(page)) {
-		if (to - from != PAGE_CACHE_SIZE) {
-			void *kaddr = kmap_atomic(page, KM_USER0);
-			memset(kaddr, 0, from);
-			memset(kaddr + to, 0, PAGE_CACHE_SIZE - to);
-			flush_dcache_page(page);
-			kunmap_atomic(kaddr, KM_USER0);
-		}
+		if (to - from != PAGE_CACHE_SIZE)
+			zero_user_segments(page,
+				0, from,
+				to, PAGE_CACHE_SIZE);
 	}
 	return 0;
 }
@@ -396,12 +393,8 @@ int simple_write_end(struct file *file, 
 	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
 
 	/* zero the stale part of the page if we did a short copy */
-	if (copied < len) {
-		void *kaddr = kmap_atomic(page, KM_USER0);
-		memset(kaddr + from + copied, 0, len - copied);
-		flush_dcache_page(page);
-		kunmap_atomic(kaddr, KM_USER0);
-	}
+	if (copied < len)
+		zero_user(page, from + copied, len);
 
 	simple_commit_write(file, page, from, from+copied);
 
Index: vps/fs/affs/file.c
===================================================================
--- vps.orig/fs/affs/file.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/affs/file.c	2007-06-11 22:33:07.000000000 -0700
@@ -628,7 +628,7 @@ static int affs_prepare_write_ofs(struct
 			return err;
 	}
 	if (to < PAGE_CACHE_SIZE) {
-		zero_user_page(page, to, PAGE_CACHE_SIZE - to, KM_USER0);
+		zero_user_segment(page, to, PAGE_CACHE_SIZE);
 		if (size > offset + to) {
 			if (size < offset + PAGE_CACHE_SIZE)
 				tmp = size & ~PAGE_CACHE_MASK;
Index: vps/fs/mpage.c
===================================================================
--- vps.orig/fs/mpage.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/mpage.c	2007-06-11 22:49:08.000000000 -0700
@@ -284,9 +284,7 @@ do_mpage_readpage(struct bio *bio, struc
 	}
 
 	if (first_hole != blocks_per_page) {
-		zero_user_page(page, first_hole << blkbits,
-				PAGE_CACHE_SIZE - (first_hole << blkbits),
-				KM_USER0);
+		zero_user_segment(page, first_hole << blkbits, PAGE_CACHE_SIZE);
 		if (first_hole == 0) {
 			SetPageUptodate(page);
 			unlock_page(page);
@@ -579,8 +577,7 @@ page_is_mapped:
 
 		if (page->index > end_index || !offset)
 			goto confused;
-		zero_user_page(page, offset, PAGE_CACHE_SIZE - offset,
-				KM_USER0);
+		zero_user_segment(page, offset, PAGE_CACHE_SIZE);
 	}
 
 	/*
Index: vps/fs/ntfs/aops.c
===================================================================
--- vps.orig/fs/ntfs/aops.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/ntfs/aops.c	2007-06-11 22:33:07.000000000 -0700
@@ -87,13 +87,17 @@ static void ntfs_end_buffer_async_read(s
 		/* Check for the current buffer head overflowing. */
 		if (unlikely(file_ofs + bh->b_size > init_size)) {
 			int ofs;
+			void *kaddr;
 
 			ofs = 0;
 			if (file_ofs < init_size)
 				ofs = init_size - file_ofs;
 			local_irq_save(flags);
-			zero_user_page(page, bh_offset(bh) + ofs,
-					 bh->b_size - ofs, KM_BIO_SRC_IRQ);
+			kaddr = kmap_atomic(page, KM_BIO_SRC_IRQ);
+			memset(kaddr + bh_offset(bh) + ofs, 0,
+					bh->b_size - ofs);
+			flush_dcache_page(page);
+			kunmap_atomic(kaddr, KM_BIO_SRC_IRQ);
 			local_irq_restore(flags);
 		}
 	} else {
@@ -334,7 +338,7 @@ handle_hole:
 		bh->b_blocknr = -1UL;
 		clear_buffer_mapped(bh);
 handle_zblock:
-		zero_user_page(page, i * blocksize, blocksize, KM_USER0);
+		zero_user(page, i * blocksize, blocksize);
 		if (likely(!err))
 			set_buffer_uptodate(bh);
 	} while (i++, iblock++, (bh = bh->b_this_page) != head);
@@ -451,7 +455,7 @@ retry_readpage:
 	 * ok to ignore the compressed flag here.
 	 */
 	if (unlikely(page->index > 0)) {
-		zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+		zero_user(page, 0, PAGE_CACHE_SIZE);
 		goto done;
 	}
 	if (!NInoAttr(ni))
@@ -780,8 +784,7 @@ lock_retry_remap:
 		if (err == -ENOENT || lcn == LCN_ENOENT) {
 			bh->b_blocknr = -1;
 			clear_buffer_dirty(bh);
-			zero_user_page(page, bh_offset(bh), blocksize,
-					KM_USER0);
+			zero_user(page, bh_offset(bh), blocksize);
 			set_buffer_uptodate(bh);
 			err = 0;
 			continue;
@@ -1406,8 +1409,7 @@ retry_writepage:
 		if (page->index >= (i_size >> PAGE_CACHE_SHIFT)) {
 			/* The page straddles i_size. */
 			unsigned int ofs = i_size & ~PAGE_CACHE_MASK;
-			zero_user_page(page, ofs, PAGE_CACHE_SIZE - ofs,
-					KM_USER0);
+			zero_user_segment(page, ofs, PAGE_CACHE_SIZE);
 		}
 		/* Handle mst protected attributes. */
 		if (NInoMstProtected(ni))
Index: vps/fs/reiserfs/inode.c
===================================================================
--- vps.orig/fs/reiserfs/inode.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiserfs/inode.c	2007-06-11 22:33:07.000000000 -0700
@@ -2151,7 +2151,7 @@ int reiserfs_truncate_file(struct inode 
 		/* if we are not on a block boundary */
 		if (length) {
 			length = blocksize - length;
-			zero_user_page(page, offset, length, KM_USER0);
+			zero_user(page, offset, length);
 			if (buffer_mapped(bh) && bh->b_blocknr != 0) {
 				mark_buffer_dirty(bh);
 			}
@@ -2375,7 +2375,7 @@ static int reiserfs_write_full_page(stru
 			unlock_page(page);
 			return 0;
 		}
-		zero_user_page(page, last_offset, PAGE_CACHE_SIZE - last_offset, KM_USER0);
+		zero_user_segment(page, last_offset, PAGE_CACHE_SIZE);
 	}
 	bh = head;
 	block = page->index << (PAGE_CACHE_SHIFT - s->s_blocksize_bits);
Index: vps/mm/truncate.c
===================================================================
--- vps.orig/mm/truncate.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/mm/truncate.c	2007-06-11 22:54:27.000000000 -0700
@@ -47,7 +47,7 @@ void do_invalidatepage(struct page *page
 
 static inline void truncate_partial_page(struct page *page, unsigned partial)
 {
-	zero_user_page(page, partial, PAGE_CACHE_SIZE - partial, KM_USER0);
+	zero_user_segment(page, partial, PAGE_CACHE_SIZE);
 	if (PagePrivate(page))
 		do_invalidatepage(page, partial);
 }
Index: vps/fs/direct-io.c
===================================================================
--- vps.orig/fs/direct-io.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/direct-io.c	2007-06-11 22:33:07.000000000 -0700
@@ -887,8 +887,8 @@ do_holes:
 					page_cache_release(page);
 					goto out;
 				}
-				zero_user_page(page, block_in_page << blkbits,
-						1 << blkbits, KM_USER0);
+				zero_user(page, block_in_page << blkbits,
+						1 << blkbits);
 				dio->block_in_file++;
 				block_in_page++;
 				goto next_block;
Index: vps/mm/filemap_xip.c
===================================================================
--- vps.orig/mm/filemap_xip.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/mm/filemap_xip.c	2007-06-11 22:54:27.000000000 -0700
@@ -461,7 +461,7 @@ xip_truncate_page(struct address_space *
 		else
 			return PTR_ERR(page);
 	}
-	zero_user_page(page, offset, length, KM_USER0);
+	zero_user(page, offset, length);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(xip_truncate_page);
Index: vps/fs/ext3/inode.c
===================================================================
--- vps.orig/fs/ext3/inode.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/ext3/inode.c	2007-06-11 22:33:07.000000000 -0700
@@ -1818,7 +1818,7 @@ static int ext3_block_truncate_page(hand
 	 */
 	if (!page_has_buffers(page) && test_opt(inode->i_sb, NOBH) &&
 	     ext3_should_writeback_data(inode) && PageUptodate(page)) {
-		zero_user_page(page, offset, length, KM_USER0);
+		zero_user(page, offset, length);
 		set_page_dirty(page);
 		goto unlock;
 	}
@@ -1871,7 +1871,7 @@ static int ext3_block_truncate_page(hand
 			goto unlock;
 	}
 
-	zero_user_page(page, offset, length, KM_USER0);
+	zero_user(page, offset, length);
 	BUFFER_TRACE(bh, "zeroed end of block");
 
 	err = 0;
Index: vps/fs/ntfs/file.c
===================================================================
--- vps.orig/fs/ntfs/file.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/ntfs/file.c	2007-06-11 22:33:07.000000000 -0700
@@ -607,8 +607,8 @@ do_next_page:
 					ntfs_submit_bh_for_read(bh);
 					*wait_bh++ = bh;
 				} else {
-					zero_user_page(page, bh_offset(bh),
-							blocksize, KM_USER0);
+					zero_user(page, bh_offset(bh),
+							blocksize);
 					set_buffer_uptodate(bh);
 				}
 			}
@@ -683,9 +683,8 @@ map_buffer_cached:
 						ntfs_submit_bh_for_read(bh);
 						*wait_bh++ = bh;
 					} else {
-						zero_user_page(page,
-							bh_offset(bh),
-							blocksize, KM_USER0);
+						zero_user(page, bh_offset(bh),
+								blocksize);
 						set_buffer_uptodate(bh);
 					}
 				}
@@ -703,8 +702,8 @@ map_buffer_cached:
 			 */
 			if (bh_end <= pos || bh_pos >= end) {
 				if (!buffer_uptodate(bh)) {
-					zero_user_page(page, bh_offset(bh),
-							blocksize, KM_USER0);
+					zero_user(page, bh_offset(bh),
+							blocksize);
 					set_buffer_uptodate(bh);
 				}
 				mark_buffer_dirty(bh);
@@ -743,8 +742,7 @@ map_buffer_cached:
 				if (!buffer_uptodate(bh))
 					set_buffer_uptodate(bh);
 			} else if (!buffer_uptodate(bh)) {
-				zero_user_page(page, bh_offset(bh), blocksize,
-						KM_USER0);
+				zero_user(page, bh_offset(bh), blocksize);
 				set_buffer_uptodate(bh);
 			}
 			continue;
@@ -868,8 +866,8 @@ rl_not_mapped_enoent:
 					if (!buffer_uptodate(bh))
 						set_buffer_uptodate(bh);
 				} else if (!buffer_uptodate(bh)) {
-					zero_user_page(page, bh_offset(bh),
-							blocksize, KM_USER0);
+					zero_user(page, bh_offset(bh),
+						blocksize);
 					set_buffer_uptodate(bh);
 				}
 				continue;
@@ -1128,8 +1126,8 @@ rl_not_mapped_enoent:
 
 				if (likely(bh_pos < initialized_size))
 					ofs = initialized_size - bh_pos;
-				zero_user_page(page, bh_offset(bh) + ofs,
-						blocksize - ofs, KM_USER0);
+				zero_user_segment(page, bh_offset(bh) + ofs,
+						blocksize);
 			}
 		} else /* if (unlikely(!buffer_uptodate(bh))) */
 			err = -EIO;
@@ -1269,8 +1267,8 @@ rl_not_mapped_enoent:
 				if (PageUptodate(page))
 					set_buffer_uptodate(bh);
 				else {
-					zero_user_page(page, bh_offset(bh),
-							blocksize, KM_USER0);
+					zero_user(page, bh_offset(bh),
+							blocksize);
 					set_buffer_uptodate(bh);
 				}
 			}
@@ -1330,7 +1328,7 @@ err_out:
 		len = PAGE_CACHE_SIZE;
 		if (len > bytes)
 			len = bytes;
-		zero_user_page(*pages, 0, len, KM_USER0);
+		zero_user(*pages, 0, len);
 	}
 	goto out;
 }
@@ -1451,7 +1449,7 @@ err_out:
 		len = PAGE_CACHE_SIZE;
 		if (len > bytes)
 			len = bytes;
-		zero_user_page(*pages, 0, len, KM_USER0);
+		zero_user(*pages, 0, len);
 	}
 	goto out;
 }
Index: vps/fs/nfs/read.c
===================================================================
--- vps.orig/fs/nfs/read.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/nfs/read.c	2007-06-11 22:33:07.000000000 -0700
@@ -79,7 +79,7 @@ void nfs_readdata_release(void *data)
 static
 int nfs_return_empty_page(struct page *page)
 {
-	zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+	zero_user(page, 0, PAGE_CACHE_SIZE);
 	SetPageUptodate(page);
 	unlock_page(page);
 	return 0;
@@ -103,10 +103,10 @@ static void nfs_readpage_truncate_uninit
 	pglen = PAGE_CACHE_SIZE - base;
 	for (;;) {
 		if (remainder <= pglen) {
-			zero_user_page(*pages, base, remainder, KM_USER0);
+			zero_user(*pages, base, remainder);
 			break;
 		}
-		zero_user_page(*pages, base, pglen, KM_USER0);
+		zero_user(*pages, base, pglen);
 		pages++;
 		remainder -= pglen;
 		pglen = PAGE_CACHE_SIZE;
@@ -130,7 +130,7 @@ static int nfs_readpage_async(struct nfs
 		return PTR_ERR(new);
 	}
 	if (len < PAGE_CACHE_SIZE)
-		zero_user_page(page, len, PAGE_CACHE_SIZE - len, KM_USER0);
+		zero_user_segment(page, len, PAGE_CACHE_SIZE);
 
 	nfs_list_add_request(new, &one_request);
 	if (NFS_SERVER(inode)->rsize < PAGE_CACHE_SIZE)
@@ -538,7 +538,7 @@ readpage_async_filler(void *data, struct
 		goto out_error;
 
 	if (len < PAGE_CACHE_SIZE)
-		zero_user_page(page, len, PAGE_CACHE_SIZE - len, KM_USER0);
+		zero_user_segment(page, len, PAGE_CACHE_SIZE);
 	nfs_pageio_add_request(desc->pgio, new);
 	return 0;
 out_error:
Index: vps/fs/nfs/write.c
===================================================================
--- vps.orig/fs/nfs/write.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/nfs/write.c	2007-06-11 22:33:07.000000000 -0700
@@ -168,7 +168,7 @@ static void nfs_mark_uptodate(struct pag
 	if (count != nfs_page_length(page))
 		return;
 	if (count != PAGE_CACHE_SIZE)
-		zero_user_page(page, count, PAGE_CACHE_SIZE - count, KM_USER0);
+		zero_user_segment(page, count, PAGE_CACHE_SIZE);
 	SetPageUptodate(page);
 }
 
Index: vps/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- vps.orig/fs/xfs/linux-2.6/xfs_lrw.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/xfs/linux-2.6/xfs_lrw.c	2007-06-11 22:33:07.000000000 -0700
@@ -154,7 +154,7 @@ xfs_iozero(
 		if (status)
 			break;
 
-		zero_user_page(page, offset, bytes, KM_USER0);
+		zero_user(page, offset, bytes);
 
 		status = pagecache_write_end(NULL, mapping, pos, bytes, bytes,
 					page, fsdata);
Index: vps/fs/ecryptfs/mmap.c
===================================================================
--- vps.orig/fs/ecryptfs/mmap.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/ecryptfs/mmap.c	2007-06-11 22:33:07.000000000 -0700
@@ -370,8 +370,7 @@ static int fill_zeros_to_end_of_page(str
 	end_byte_in_page = i_size_read(inode) % PAGE_CACHE_SIZE;
 	if (to > end_byte_in_page)
 		end_byte_in_page = to;
-	zero_user_page(page, end_byte_in_page,
-		PAGE_CACHE_SIZE - end_byte_in_page, KM_USER0);
+	zero_user_segment(page, end_byte_in_page, PAGE_CACHE_SIZE);
 out:
 	return 0;
 }
@@ -784,7 +783,7 @@ int write_zeros(struct file *file, pgoff
 		page_cache_release(tmp_page);
 		goto out;
 	}
-	zero_user_page(tmp_page, start, num_zeros, KM_USER0);
+	zero_user(tmp_page, start, num_zeros);
 	rc = ecryptfs_commit_write(file, tmp_page, start, start + num_zeros);
 	if (rc < 0) {
 		ecryptfs_printk(KERN_ERR, "Error attempting to write zero's "
Index: vps/fs/gfs2/bmap.c
===================================================================
--- vps.orig/fs/gfs2/bmap.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/gfs2/bmap.c	2007-06-11 22:33:07.000000000 -0700
@@ -932,7 +932,7 @@ static int gfs2_block_truncate_page(stru
 	if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip))
 		gfs2_trans_add_bh(ip->i_gl, bh, 0);
 
-	zero_user_page(page, offset, length, KM_USER0);
+	zero_user(page, offset, length);
 
 unlock:
 	unlock_page(page);
Index: vps/fs/ocfs2/aops.c
===================================================================
--- vps.orig/fs/ocfs2/aops.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/ocfs2/aops.c	2007-06-11 22:33:07.000000000 -0700
@@ -238,7 +238,7 @@ static int ocfs2_readpage(struct file *f
 	 * XXX sys_readahead() seems to get that wrong?
 	 */
 	if (start >= i_size_read(inode)) {
-		zero_user_page(page, 0, PAGE_SIZE, KM_USER0);
+		zero_user(page, 0, PAGE_SIZE);
 		SetPageUptodate(page);
 		ret = 0;
 		goto out_alloc;
Index: vps/fs/reiser4/plugin/file/cryptcompress.c
===================================================================
--- vps.orig/fs/reiser4/plugin/file/cryptcompress.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiser4/plugin/file/cryptcompress.c	2007-06-11 22:33:07.000000000 -0700
@@ -1933,7 +1933,7 @@ static int write_hole(struct inode *inod
 
 		to_pg = min_count(PAGE_CACHE_SIZE - pg_off, cl_count);
 		lock_page(page);
-		zero_user_page(page, pg_off, to_pg, KM_USER0);
+		zero_user(page, pg_off, to_pg);
 		SetPageUptodate(page);
 		unlock_page(page);
 
@@ -2169,8 +2169,7 @@ static int read_some_cluster_pages(struc
 			off = off_to_pgoff(win->off+win->count+win->delta);
 			if (off) {
 				lock_page(pg);
-				zero_user_page(pg, off, PAGE_CACHE_SIZE - off,
-						KM_USER0);
+				zero_user_segment(pg, off, PAGE_CACHE_SIZE);
 				unlock_page(pg);
 			}
 		}
@@ -2217,8 +2216,7 @@ static int read_some_cluster_pages(struc
 
 			offset =
 			    off_to_pgoff(win->off + win->count + win->delta);
-			zero_user_page(pg, offset, PAGE_CACHE_SIZE - offset,
-					KM_USER0);
+			zero_user_segment(pg, offset, PAGE_CACHE_SIZE);
 			unlock_page(pg);
 			/* still not uptodate */
 			break;
Index: vps/fs/reiser4/plugin/file/file.c
===================================================================
--- vps.orig/fs/reiser4/plugin/file/file.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiser4/plugin/file/file.c	2007-06-11 22:33:07.000000000 -0700
@@ -538,7 +538,7 @@ static int shorten_file(struct inode *in
 
 	lock_page(page);
 	assert("vs-1066", PageLocked(page));
-	zero_user_page(page, padd_from, PAGE_CACHE_SIZE - padd_from, KM_USER0);
+	zero_user_segment(page, padd_from, PAGE_CACHE_SIZE);
 	unlock_page(page);
 	page_cache_release(page);
 	/* the below does up(sbinfo->delete_mutex). Do not get confused */
Index: vps/fs/reiser4/plugin/item/ctail.c
===================================================================
--- vps.orig/fs/reiser4/plugin/item/ctail.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiser4/plugin/item/ctail.c	2007-06-11 22:33:07.000000000 -0700
@@ -627,7 +627,7 @@ int do_readpage_ctail(struct inode * ino
 #endif
 	case FAKE_DISK_CLUSTER:
 		/* fill the page by zeroes */
-		zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+		zero_user(page, 0, PAGE_CACHE_SIZE);
 		SetPageUptodate(page);
 		break;
 	case PREP_DISK_CLUSTER:
Index: vps/fs/reiser4/plugin/item/extent_file_ops.c
===================================================================
--- vps.orig/fs/reiser4/plugin/item/extent_file_ops.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiser4/plugin/item/extent_file_ops.c	2007-06-11 22:33:07.000000000 -0700
@@ -1112,7 +1112,7 @@ int reiser4_do_readpage_extent(reiser4_e
 		 */
 		j = jfind(mapping, index);
 		if (j == NULL) {
-			zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+			zero_user(page, 0, PAGE_CACHE_SIZE);
 			SetPageUptodate(page);
 			unlock_page(page);
 			return 0;
@@ -1127,7 +1127,7 @@ int reiser4_do_readpage_extent(reiser4_e
 		block = *jnode_get_io_block(j);
 		spin_unlock_jnode(j);
 		if (block == 0) {
-			zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0);
+			zero_user(page, 0, PAGE_CACHE_SIZE);
 			SetPageUptodate(page);
 			unlock_page(page);
 			jput(j);
Index: vps/fs/reiser4/plugin/item/tail.c
===================================================================
--- vps.orig/fs/reiser4/plugin/item/tail.c	2007-06-11 22:33:01.000000000 -0700
+++ vps/fs/reiser4/plugin/item/tail.c	2007-06-11 22:33:07.000000000 -0700
@@ -392,8 +392,7 @@ static int do_readpage_tail(uf_coord_t *
 
  done:
 	if (mapped != PAGE_CACHE_SIZE)
-		zero_user_page(page, mapped, PAGE_CACHE_SIZE - mapped,
-				KM_USER0);
+		zero_user_segment(page, mapped, PAGE_CACHE_SIZE);
 	SetPageUptodate(page);
  out_unlock_page:
 	unlock_page(page);

-- 


* [patch 03/14] Use page_cache_xxx functions in mm/filemap.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_filemap --]
[-- Type: text/plain, Size: 8613 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/filemap.c |   76 +++++++++++++++++++++++++++++------------------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

Index: vps/mm/filemap.c
===================================================================
--- vps.orig/mm/filemap.c	2007-06-08 10:57:37.000000000 -0700
+++ vps/mm/filemap.c	2007-06-09 21:15:04.000000000 -0700
@@ -304,8 +304,8 @@ EXPORT_SYMBOL(add_to_page_cache_lru);
 int sync_page_range(struct inode *inode, struct address_space *mapping,
 			loff_t pos, loff_t count)
 {
-	pgoff_t start = pos >> PAGE_CACHE_SHIFT;
-	pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
+	pgoff_t start = page_cache_index(mapping, pos);
+	pgoff_t end = page_cache_index(mapping, pos + count - 1);
 	int ret;
 
 	if (!mapping_cap_writeback_dirty(mapping) || !count)
@@ -336,8 +336,8 @@ EXPORT_SYMBOL(sync_page_range);
 int sync_page_range_nolock(struct inode *inode, struct address_space *mapping,
 			   loff_t pos, loff_t count)
 {
-	pgoff_t start = pos >> PAGE_CACHE_SHIFT;
-	pgoff_t end = (pos + count - 1) >> PAGE_CACHE_SHIFT;
+	pgoff_t start = page_cache_index(mapping, pos);
+	pgoff_t end = page_cache_index(mapping, pos + count - 1);
 	int ret;
 
 	if (!mapping_cap_writeback_dirty(mapping) || !count)
@@ -366,7 +366,7 @@ int filemap_fdatawait(struct address_spa
 		return 0;
 
 	return wait_on_page_writeback_range(mapping, 0,
-				(i_size - 1) >> PAGE_CACHE_SHIFT);
+				page_cache_index(mapping, i_size - 1));
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
@@ -414,8 +414,8 @@ int filemap_write_and_wait_range(struct 
 		/* See comment of filemap_write_and_wait() */
 		if (err != -EIO) {
 			int err2 = wait_on_page_writeback_range(mapping,
-						lstart >> PAGE_CACHE_SHIFT,
-						lend >> PAGE_CACHE_SHIFT);
+					page_cache_index(mapping, lstart),
+					page_cache_index(mapping, lend));
 			if (!err)
 				err = err2;
 		}
@@ -881,28 +881,28 @@ void do_generic_mapping_read(struct addr
 	int error;
 	struct file_ra_state ra = *_ra;
 
-	index = *ppos >> PAGE_CACHE_SHIFT;
+	index = page_cache_index(mapping, *ppos);
 	next_index = index;
 	prev_index = ra.prev_index;
 	prev_offset = ra.prev_offset;
-	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
-	offset = *ppos & ~PAGE_CACHE_MASK;
+	last_index = page_cache_next(mapping, *ppos + desc->count);
+	offset = page_cache_offset(mapping, *ppos);
 
 	isize = i_size_read(inode);
 	if (!isize)
 		goto out;
 
-	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+	end_index = page_cache_index(mapping, isize - 1);
 	for (;;) {
 		struct page *page;
 		unsigned long nr, ret;
 
 		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_CACHE_SIZE;
+		nr = page_cache_size(mapping);
 		if (index >= end_index) {
 			if (index > end_index)
 				goto out;
-			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
+			nr = page_cache_offset(mapping, isize - 1) + 1;
 			if (nr <= offset) {
 				goto out;
 			}
@@ -956,8 +956,8 @@ page_ok:
 		 */
 		ret = actor(desc, page, offset, nr);
 		offset += ret;
-		index += offset >> PAGE_CACHE_SHIFT;
-		offset &= ~PAGE_CACHE_MASK;
+		index += page_cache_index(mapping, offset);
+		offset = page_cache_offset(mapping, offset);
 		prev_offset = offset;
 		ra.prev_offset = offset;
 
@@ -1023,16 +1023,16 @@ readpage:
 		 * another truncate extends the file - this is desired though).
 		 */
 		isize = i_size_read(inode);
-		end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+		end_index = page_cache_index(mapping, isize - 1);
 		if (unlikely(!isize || index > end_index)) {
 			page_cache_release(page);
 			goto out;
 		}
 
 		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_CACHE_SIZE;
+		nr = page_cache_size(mapping);
 		if (index == end_index) {
-			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
+			nr = page_cache_offset(mapping, isize - 1) + 1;
 			if (nr <= offset) {
 				page_cache_release(page);
 				goto out;
@@ -1073,7 +1073,7 @@ out:
 	*_ra = ra;
 	_ra->prev_index = prev_index;
 
-	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
+	*ppos = page_cache_pos(mapping, index, offset);
 	if (filp)
 		file_accessed(filp);
 }
@@ -1291,8 +1291,8 @@ asmlinkage ssize_t sys_readahead(int fd,
 	if (file) {
 		if (file->f_mode & FMODE_READ) {
 			struct address_space *mapping = file->f_mapping;
-			unsigned long start = offset >> PAGE_CACHE_SHIFT;
-			unsigned long end = (offset + count - 1) >> PAGE_CACHE_SHIFT;
+			unsigned long start = page_cache_index(mapping, offset);
+			unsigned long end = page_cache_index(mapping, offset + count - 1);
 			unsigned long len = end - start + 1;
 			ret = do_readahead(mapping, file, start, len);
 		}
@@ -1364,7 +1364,7 @@ struct page *filemap_fault(struct vm_are
 
 	BUG_ON(!(vma->vm_flags & VM_CAN_INVALIDATE));
 
-	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	size = page_cache_next(mapping, i_size_read(inode));
 	if (fdata->pgoff >= size)
 		goto outside_data_content;
 
@@ -1439,7 +1439,7 @@ retry_find:
 		goto page_not_uptodate;
 
 	/* Must recheck i_size under page lock */
-	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	size = page_cache_next(mapping, i_size_read(inode));
 	if (unlikely(fdata->pgoff >= size)) {
 		unlock_page(page);
 		goto outside_data_content;
@@ -1930,8 +1930,8 @@ int pagecache_write_begin(struct file *f
 							pagep, fsdata);
 	} else {
 		int ret;
-		pgoff_t index = pos >> PAGE_CACHE_SHIFT;
-		unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+		pgoff_t index = page_cache_index(mapping, pos);
+		unsigned offset = page_cache_offset(mapping, pos);
 		struct inode *inode = mapping->host;
 		struct page *page;
 again:
@@ -1984,7 +1984,7 @@ int pagecache_write_end(struct file *fil
 		ret = aops->write_end(file, mapping, pos, len, copied,
 							page, fsdata);
 	} else {
-		unsigned offset = pos & (PAGE_CACHE_SIZE - 1);
+		unsigned offset = page_cache_offset(mapping, pos);
 		struct inode *inode = mapping->host;
 
 		flush_dcache_page(page);
@@ -2089,9 +2089,9 @@ static ssize_t generic_perform_write_2co
 		unsigned long bytes;	/* Bytes to write to page */
 		size_t copied;		/* Bytes copied from user */
 
-		offset = (pos & (PAGE_CACHE_SIZE - 1));
-		index = pos >> PAGE_CACHE_SHIFT;
-		bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+		offset = page_cache_offset(mapping, pos);
+		index = page_cache_index(mapping, pos);
+		bytes = min_t(unsigned long, page_cache_size(mapping) - offset,
 						iov_iter_count(i));
 
 		/*
@@ -2267,9 +2267,9 @@ static ssize_t generic_perform_write(str
 		size_t copied;		/* Bytes copied from user */
 		void *fsdata;
 
-		offset = (pos & (PAGE_CACHE_SIZE - 1));
-		index = pos >> PAGE_CACHE_SHIFT;
-		bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+		offset = page_cache_offset(mapping, pos);
+		index = page_cache_index(mapping, pos);
+		bytes = min_t(unsigned long, page_cache_size(mapping) - offset,
 						iov_iter_count(i));
 
 again:
@@ -2316,7 +2316,7 @@ again:
 			 * because not all segments in the iov can be copied at
 			 * once without a pagefault.
 			 */
-			bytes = min_t(unsigned long, PAGE_CACHE_SIZE - offset,
+			bytes = min_t(unsigned long, page_cache_size(mapping) - offset,
 						iov_iter_single_seg_count(i));
 			goto again;
 		}
@@ -2459,8 +2459,8 @@ __generic_file_aio_write_nolock(struct k
 		if (err == 0) {
 			written = written_buffered;
 			invalidate_mapping_pages(mapping,
-						 pos >> PAGE_CACHE_SHIFT,
-						 endbyte >> PAGE_CACHE_SHIFT);
+						 page_cache_index(mapping, pos),
+						 page_cache_index(mapping, endbyte));
 		} else {
 			/*
 			 * We don't know how much we wrote, so just return
@@ -2547,7 +2547,7 @@ generic_file_direct_IO(int rw, struct ki
 	 */
 	if (rw == WRITE) {
 		write_len = iov_length(iov, nr_segs);
-		end = (offset + write_len - 1) >> PAGE_CACHE_SHIFT;
+		end = page_cache_index(mapping, offset + write_len - 1);
 	       	if (mapping_mapped(mapping))
 			unmap_mapping_range(mapping, offset, write_len, 0);
 	}
@@ -2564,7 +2564,7 @@ generic_file_direct_IO(int rw, struct ki
 	 */
 	if (rw == WRITE && mapping->nrpages) {
 		retval = invalidate_inode_pages2_range(mapping,
-					offset >> PAGE_CACHE_SHIFT, end);
+					page_cache_index(mapping, offset), end);
 		if (retval)
 			goto out;
 	}
@@ -2582,7 +2582,7 @@ generic_file_direct_IO(int rw, struct ki
 	 */
 	if (rw == WRITE && mapping->nrpages) {
 		int err = invalidate_inode_pages2_range(mapping,
-					      offset >> PAGE_CACHE_SHIFT, end);
+					      page_cache_index(mapping, offset), end);
 		if (err && retval >= 0)
 			retval = err;
 	}

-- 


* [patch 04/14] Use page_cache_xxx in mm/page-writeback.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_page_writeback --]
[-- Type: text/plain, Size: 1241 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/page-writeback.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: vps/mm/page-writeback.c
===================================================================
--- vps.orig/mm/page-writeback.c	2007-06-07 17:01:04.000000000 -0700
+++ vps/mm/page-writeback.c	2007-06-09 21:34:24.000000000 -0700
@@ -626,8 +626,8 @@ int write_cache_pages(struct address_spa
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
 	} else {
-		index = wbc->range_start >> PAGE_CACHE_SHIFT;
-		end = wbc->range_end >> PAGE_CACHE_SHIFT;
+		index = page_cache_index(mapping, wbc->range_start);
+		end = page_cache_index(mapping, wbc->range_end);
 		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
 			range_whole = 1;
 		scanned = 1;
@@ -829,7 +829,7 @@ int __set_page_dirty_nobuffers(struct pa
 			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
 			if (mapping_cap_account_dirty(mapping)) {
 				__inc_zone_page_state(page, NR_FILE_DIRTY);
-				task_io_account_write(PAGE_CACHE_SIZE);
+				task_io_account_write(page_cache_size(mapping));
 			}
 			radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);

-- 


* [patch 05/14] Use page_cache_xxx in mm/truncate.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_truncate --]
[-- Type: text/plain, Size: 3797 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/truncate.c |   35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

Index: vps/mm/truncate.c
===================================================================
--- vps.orig/mm/truncate.c	2007-06-09 20:35:19.000000000 -0700
+++ vps/mm/truncate.c	2007-06-09 21:39:47.000000000 -0700
@@ -45,9 +45,10 @@ void do_invalidatepage(struct page *page
 		(*invalidatepage)(page, offset);
 }
 
-static inline void truncate_partial_page(struct page *page, unsigned partial)
+static inline void truncate_partial_page(struct address_space *mapping,
+			struct page *page, unsigned partial)
 {
-	zero_user_segment(page, partial, PAGE_CACHE_SIZE);
+	zero_user_segment(page, partial, page_cache_size(mapping));
 	if (PagePrivate(page))
 		do_invalidatepage(page, partial);
 }
@@ -95,7 +96,7 @@ truncate_complete_page(struct address_sp
 	if (page->mapping != mapping)
 		return;
 
-	cancel_dirty_page(page, PAGE_CACHE_SIZE);
+	cancel_dirty_page(page, page_cache_size(mapping));
 
 	if (PagePrivate(page))
 		do_invalidatepage(page, 0);
@@ -157,9 +158,9 @@ invalidate_complete_page(struct address_
 void truncate_inode_pages_range(struct address_space *mapping,
 				loff_t lstart, loff_t lend)
 {
-	const pgoff_t start = (lstart + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
+	const pgoff_t start = page_cache_next(mapping, lstart);
 	pgoff_t end;
-	const unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
+	const unsigned partial = page_cache_offset(mapping, lstart);
 	struct pagevec pvec;
 	pgoff_t next;
 	int i;
@@ -167,8 +168,9 @@ void truncate_inode_pages_range(struct a
 	if (mapping->nrpages == 0)
 		return;
 
-	BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
-	end = (lend >> PAGE_CACHE_SHIFT);
+	BUG_ON(page_cache_offset(mapping, lend) !=
+				page_cache_size(mapping) - 1);
+	end = page_cache_index(mapping, lend);
 
 	pagevec_init(&pvec, 0);
 	next = start;
@@ -194,8 +196,8 @@ void truncate_inode_pages_range(struct a
 			}
 			if (page_mapped(page)) {
 				unmap_mapping_range(mapping,
-				  (loff_t)page_index<<PAGE_CACHE_SHIFT,
-				  PAGE_CACHE_SIZE, 0);
+				  page_cache_pos(mapping, page_index, 0),
+				  page_cache_size(mapping), 0);
 			}
 			truncate_complete_page(mapping, page);
 			unlock_page(page);
@@ -208,7 +210,7 @@ void truncate_inode_pages_range(struct a
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
 			wait_on_page_writeback(page);
-			truncate_partial_page(page, partial);
+			truncate_partial_page(mapping, page, partial);
 			unlock_page(page);
 			page_cache_release(page);
 		}
@@ -236,8 +238,8 @@ void truncate_inode_pages_range(struct a
 			wait_on_page_writeback(page);
 			if (page_mapped(page)) {
 				unmap_mapping_range(mapping,
-				  (loff_t)page->index<<PAGE_CACHE_SHIFT,
-				  PAGE_CACHE_SIZE, 0);
+				  page_cache_pos(mapping, page->index, 0),
+				  page_cache_size(mapping), 0);
 			}
 			if (page->index > next)
 				next = page->index;
@@ -421,9 +423,8 @@ int invalidate_inode_pages2_range(struct
 					 * Zap the rest of the file in one hit.
 					 */
 					unmap_mapping_range(mapping,
-					   (loff_t)page_index<<PAGE_CACHE_SHIFT,
-					   (loff_t)(end - page_index + 1)
-							<< PAGE_CACHE_SHIFT,
+					   page_cache_pos(mapping, page_index, 0),
+					   page_cache_pos(mapping, end - page_index + 1, 0),
 					    0);
 					did_range_unmap = 1;
 				} else {
@@ -431,8 +432,8 @@ int invalidate_inode_pages2_range(struct
 					 * Just zap this page
 					 */
 					unmap_mapping_range(mapping,
-					  (loff_t)page_index<<PAGE_CACHE_SHIFT,
-					  PAGE_CACHE_SIZE, 0);
+					  page_cache_pos(mapping, page_index, 0),
+					  page_cache_size(mapping), 0);
 				}
 			}
 			BUG_ON(page_mapped(page));

-- 


* [patch 06/14] Use page_cache_xxx in mm/rmap.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_rmap --]
[-- Type: text/plain, Size: 1925 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/rmap.c |   13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6.22-rc4-mm2/mm/rmap.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/mm/rmap.c	2007-06-14 10:35:45.000000000 -0700
+++ linux-2.6.22-rc4-mm2/mm/rmap.c	2007-06-14 10:49:29.000000000 -0700
@@ -210,9 +210,14 @@
 static inline unsigned long
 vma_address(struct page *page, struct vm_area_struct *vma)
 {
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff;
 	unsigned long address;
 
+	if (PageAnon(page))
+		pgoff = page->index;
+	else
+		pgoff = page->index << mapping_order(page->mapping);
+
 	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	if (unlikely(address < vma->vm_start || address >= vma->vm_end)) {
 		/* page should be within any vma from prio_tree_next */
@@ -357,7 +362,7 @@
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff = page->index << (page_cache_shift(mapping) - PAGE_SHIFT);
 	struct vm_area_struct *vma;
 	struct prio_tree_iter iter;
 	int referenced = 0;
@@ -469,7 +474,7 @@
 
 static int page_mkclean_file(struct address_space *mapping, struct page *page)
 {
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff = page->index << (page_cache_shift(mapping) - PAGE_SHIFT);
 	struct vm_area_struct *vma;
 	struct prio_tree_iter iter;
 	int ret = 0;
@@ -885,7 +890,7 @@
 static int try_to_unmap_file(struct page *page, int migration)
 {
 	struct address_space *mapping = page->mapping;
-	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff = page->index << (page_cache_shift(mapping) - PAGE_SHIFT);
 	struct vm_area_struct *vma;
 	struct prio_tree_iter iter;
 	int ret = SWAP_AGAIN;

-- 


* [patch 07/14] Use page_cache_xxx in mm/filemap_xip.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_filemap_xip --]
[-- Type: text/plain, Size: 2873 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/filemap_xip.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

Index: vps/mm/filemap_xip.c
===================================================================
--- vps.orig/mm/filemap_xip.c	2007-06-09 21:52:40.000000000 -0700
+++ vps/mm/filemap_xip.c	2007-06-09 21:58:11.000000000 -0700
@@ -60,24 +60,24 @@ do_xip_mapping_read(struct address_space
 
 	BUG_ON(!mapping->a_ops->get_xip_page);
 
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	offset = *ppos & ~PAGE_CACHE_MASK;
+	index = page_cache_index(mapping, *ppos);
+	offset = page_cache_offset(mapping, *ppos);
 
 	isize = i_size_read(inode);
 	if (!isize)
 		goto out;
 
-	end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+	end_index = page_cache_index(mapping, isize - 1);
 	for (;;) {
 		struct page *page;
 		unsigned long nr, ret;
 
 		/* nr is the maximum number of bytes to copy from this page */
-		nr = PAGE_CACHE_SIZE;
+		nr = page_cache_size(mapping);
 		if (index >= end_index) {
 			if (index > end_index)
 				goto out;
-			nr = ((isize - 1) & ~PAGE_CACHE_MASK) + 1;
+			nr = page_cache_offset(mapping, isize - 1) + 1;
 			if (nr <= offset) {
 				goto out;
 			}
@@ -116,8 +116,8 @@ do_xip_mapping_read(struct address_space
 		 */
 		ret = actor(desc, page, offset, nr);
 		offset += ret;
-		index += offset >> PAGE_CACHE_SHIFT;
-		offset &= ~PAGE_CACHE_MASK;
+		index += page_cache_index(mapping, offset);
+		offset = page_cache_offset(mapping, offset);
 
 		if (ret == nr && desc->count)
 			continue;
@@ -130,7 +130,7 @@ no_xip_page:
 	}
 
 out:
-	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
+	*ppos = page_cache_pos(mapping, index, offset);
 	if (filp)
 		file_accessed(filp);
 }
@@ -242,7 +242,7 @@ static struct page *xip_file_fault(struc
 
 	/* XXX: are VM_FAULT_ codes OK? */
 
-	size = (i_size_read(inode) + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	size = page_cache_next(mapping, i_size_read(inode));
 	if (fdata->pgoff >= size) {
 		fdata->type = VM_FAULT_SIGBUS;
 		return NULL;
@@ -320,9 +320,9 @@ __xip_file_write(struct file *filp, cons
 		size_t copied;
 		char *kaddr;
 
-		offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
-		index = pos >> PAGE_CACHE_SHIFT;
-		bytes = PAGE_CACHE_SIZE - offset;
+		offset = page_cache_offset(mapping, pos); /* Within page */
+		index = page_cache_index(mapping, pos);
+		bytes = page_cache_size(mapping) - offset;
 		if (bytes > count)
 			bytes = count;
 
@@ -433,8 +433,8 @@ EXPORT_SYMBOL_GPL(xip_file_write);
 int
 xip_truncate_page(struct address_space *mapping, loff_t from)
 {
-	pgoff_t index = from >> PAGE_CACHE_SHIFT;
-	unsigned offset = from & (PAGE_CACHE_SIZE-1);
+	pgoff_t index = page_cache_index(mapping, from);
+	unsigned offset = page_cache_offset(mapping, from);
 	unsigned blocksize;
 	unsigned length;
 	struct page *page;

-- 


* [patch 08/14] Use page_cache_xxx in mm/migrate.c
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_mm_migrate --]
[-- Type: text/plain, Size: 669 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/migrate.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: vps/mm/migrate.c
===================================================================
--- vps.orig/mm/migrate.c	2007-06-11 15:56:37.000000000 -0700
+++ vps/mm/migrate.c	2007-06-11 22:05:16.000000000 -0700
@@ -196,7 +196,10 @@ static void remove_file_migration_ptes(s
 	struct vm_area_struct *vma;
 	struct address_space *mapping = page_mapping(new);
 	struct prio_tree_iter iter;
-	pgoff_t pgoff = new->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
+	pgoff_t pgoff;
 
 	if (!mapping)
 		return;
+
+	/* mapping may be NULL: dereference it only after the check above */
+	pgoff = new->index << mapping_order(mapping);

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread
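
One assumption worth flagging in the hunk above: mapping_order() is the
one helper in these patches whose definition is assumed rather than
shown. For the substitution to be equivalent to the old shift by
(PAGE_CACHE_SHIFT - PAGE_SHIFT), it has to return the compound-page
order of the mapping's pages, i.e. something like:

	/* Assumed definition, not shown in this series: zero while all
	 * page cache pages are order 0, matching the old constant shift. */
	static inline int mapping_order(struct address_space *a)
	{
		return page_cache_shift(a) - PAGE_SHIFT;
	}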

* [patch 09/14] Use page_cache_xx in fs/libfs.c
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (7 preceding siblings ...)
  2007-06-14 19:38 ` [patch 08/14] Use page_cache_xx in mm/migrate.c clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 19:38 ` [patch 10/14] Use page_cache_xx in fs/sync clameter
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_libfs --]
[-- Type: text/plain, Size: 2134 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/libfs.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Index: vps/fs/libfs.c
===================================================================
--- vps.orig/fs/libfs.c	2007-06-11 21:39:09.000000000 -0700
+++ vps/fs/libfs.c	2007-06-11 22:08:13.000000000 -0700
@@ -16,7 +16,8 @@ int simple_getattr(struct vfsmount *mnt,
 {
 	struct inode *inode = dentry->d_inode;
 	generic_fillattr(inode, stat);
-	stat->blocks = inode->i_mapping->nrpages << (PAGE_CACHE_SHIFT - 9);
+	stat->blocks = inode->i_mapping->nrpages <<
+				(page_cache_shift(inode->i_mapping) - 9);
 	return 0;
 }
 
@@ -340,10 +341,10 @@ int simple_prepare_write(struct file *fi
 			unsigned from, unsigned to)
 {
 	if (!PageUptodate(page)) {
-		if (to - from != PAGE_CACHE_SIZE)
+		if (to - from != page_cache_size(file->f_mapping))
 			zero_user_segments(page,
 				0, from,
-				to, PAGE_CACHE_SIZE);
+				to, page_cache_size(file->f_mapping));
 	}
 	return 0;
 }
@@ -356,8 +357,8 @@ int simple_write_begin(struct file *file
 	pgoff_t index;
 	unsigned from;
 
-	index = pos >> PAGE_CACHE_SHIFT;
-	from = pos & (PAGE_CACHE_SIZE - 1);
+	index = page_cache_index(mapping, pos);
+	from = page_cache_offset(mapping, pos);
 
 	page = __grab_cache_page(mapping, index);
 	if (!page)
@@ -371,8 +372,9 @@ int simple_write_begin(struct file *file
 int simple_commit_write(struct file *file, struct page *page,
 			unsigned from, unsigned to)
 {
-	struct inode *inode = page->mapping->host;
-	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	loff_t pos = page_cache_pos(mapping, page->index, to);
 
 	if (!PageUptodate(page))
 		SetPageUptodate(page);
@@ -390,7 +392,7 @@ int simple_write_end(struct file *file, 
 			loff_t pos, unsigned len, unsigned copied,
 			struct page *page, void *fsdata)
 {
-	unsigned from = pos & (PAGE_CACHE_SIZE - 1);
+	unsigned from = page_cache_offset(mapping, pos);
 
 	/* zero the stale part of the page if we did a short copy */
 	if (copied < len)

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread
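
The libfs conversions are a good place to sanity-check that the helpers
compose the way the old open-coded arithmetic did. A worked example
with the standard 4k base page (shift 12); the numbers are illustrative
only:

	page_cache_index(m, 10000)  == 10000 >> 12       == 2
	page_cache_offset(m, 10000) == 10000 & 4095      == 1808
	page_cache_pos(m, 2, 1808)  == (2 << 12) + 1808  == 10000

	/* simple_getattr(): 512-byte sectors per page */
	page_cache_shift(m) - 9     == 12 - 9, so nrpages << 3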

* [patch 10/14] Use page_cache_xx in fs/sync.
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (8 preceding siblings ...)
  2007-06-14 19:38 ` [patch 09/14] Use page_cache_xx in fs/libfs.c clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 19:38 ` [patch 11/14] Use page_cache_xx in fs/buffer.c clameter
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_sync --]
[-- Type: text/plain, Size: 1025 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/sync.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: vps/fs/sync.c
===================================================================
--- vps.orig/fs/sync.c	2007-06-04 17:57:25.000000000 -0700
+++ vps/fs/sync.c	2007-06-09 21:17:45.000000000 -0700
@@ -252,8 +252,8 @@ int do_sync_mapping_range(struct address
 	ret = 0;
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = wait_on_page_writeback_range(mapping,
-					offset >> PAGE_CACHE_SHIFT,
-					endbyte >> PAGE_CACHE_SHIFT);
+					page_cache_index(mapping, offset),
+					page_cache_index(mapping, endbyte));
 		if (ret < 0)
 			goto out;
 	}
@@ -267,8 +267,8 @@ int do_sync_mapping_range(struct address
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER) {
 		ret = wait_on_page_writeback_range(mapping,
-					offset >> PAGE_CACHE_SHIFT,
-					endbyte >> PAGE_CACHE_SHIFT);
+					page_cache_index(mapping, offset),
+					page_cache_index(mapping, endbyte));
 	}
 out:
 	return ret;

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [patch 11/14] Use page_cache_xx in fs/buffer.c
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (9 preceding siblings ...)
  2007-06-14 19:38 ` [patch 10/14] Use page_cache_xx in fs/sync clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 19:38 ` [patch 12/14] Use page_cache_xxx in mm/mpage.c clameter
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_buffer --]
[-- Type: text/plain, Size: 12577 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/buffer.c |   99 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 56 insertions(+), 43 deletions(-)

Index: vps/fs/buffer.c
===================================================================
--- vps.orig/fs/buffer.c	2007-06-11 22:33:07.000000000 -0700
+++ vps/fs/buffer.c	2007-06-11 22:34:34.000000000 -0700
@@ -265,7 +265,7 @@ __find_get_block_slow(struct block_devic
 	struct page *page;
 	int all_mapped = 1;
 
-	index = block >> (PAGE_CACHE_SHIFT - bd_inode->i_blkbits);
+	index = block >> (page_cache_shift(bd_mapping) - bd_inode->i_blkbits);
 	page = find_get_page(bd_mapping, index);
 	if (!page)
 		goto out;
@@ -705,7 +705,7 @@ static int __set_page_dirty(struct page 
 
 		if (mapping_cap_account_dirty(mapping)) {
 			__inc_zone_page_state(page, NR_FILE_DIRTY);
-			task_io_account_write(PAGE_CACHE_SIZE);
+			task_io_account_write(page_cache_size(mapping));
 		}
 		radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
@@ -899,10 +899,11 @@ struct buffer_head *alloc_page_buffers(s
 {
 	struct buffer_head *bh, *head;
 	long offset;
+	unsigned page_size = page_cache_size(page->mapping);
 
 try_again:
 	head = NULL;
-	offset = PAGE_SIZE;
+	offset = page_size;
 	while ((offset -= size) >= 0) {
 		bh = alloc_buffer_head(GFP_NOFS);
 		if (!bh)
@@ -1434,7 +1435,7 @@ void set_bh_page(struct buffer_head *bh,
 		struct page *page, unsigned long offset)
 {
 	bh->b_page = page;
-	BUG_ON(offset >= PAGE_SIZE);
+	BUG_ON(offset >= page_cache_size(page->mapping));
 	if (PageHighMem(page))
 		/*
 		 * This catches illegal uses and preserves the offset:
@@ -1613,6 +1614,7 @@ static int __block_write_full_page(struc
 	struct buffer_head *bh, *head;
 	const unsigned blocksize = 1 << inode->i_blkbits;
 	int nr_underway = 0;
+	struct address_space *mapping = inode->i_mapping;
 
 	BUG_ON(!PageLocked(page));
 
@@ -1633,7 +1635,8 @@ static int __block_write_full_page(struc
 	 * handle that here by just cleaning them.
 	 */
 
-	block = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
+	block = (sector_t)page->index <<
+		(page_cache_shift(mapping) - inode->i_blkbits);
 	head = page_buffers(page);
 	bh = head;
 
@@ -1750,7 +1753,7 @@ recover:
 	} while ((bh = bh->b_this_page) != head);
 	SetPageError(page);
 	BUG_ON(PageWriteback(page));
-	mapping_set_error(page->mapping, err);
+	mapping_set_error(mapping, err);
 	set_page_writeback(page);
 	do {
 		struct buffer_head *next = bh->b_this_page;
@@ -1817,8 +1820,8 @@ static int __block_prepare_write(struct 
 	struct buffer_head *bh, *head, *wait[2], **wait_bh=wait;
 
 	BUG_ON(!PageLocked(page));
-	BUG_ON(from > PAGE_CACHE_SIZE);
-	BUG_ON(to > PAGE_CACHE_SIZE);
+	BUG_ON(from > page_cache_size(inode->i_mapping));
+	BUG_ON(to > page_cache_size(inode->i_mapping));
 	BUG_ON(from > to);
 
 	blocksize = 1 << inode->i_blkbits;
@@ -1827,7 +1830,8 @@ static int __block_prepare_write(struct 
 	head = page_buffers(page);
 
 	bbits = inode->i_blkbits;
-	block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);
+	block = (sector_t)page->index <<
+		(page_cache_shift(inode->i_mapping) - bbits);
 
 	for(bh = head, block_start = 0; bh != head || !block_start;
 	    block++, block_start=block_end, bh = bh->b_this_page) {
@@ -1942,8 +1946,8 @@ int block_write_begin(struct file *file,
 	unsigned start, end;
 	int ownpage = 0;
 
-	index = pos >> PAGE_CACHE_SHIFT;
-	start = pos & (PAGE_CACHE_SIZE - 1);
+	index = page_cache_index(mapping, pos);
+	start = page_cache_offset(mapping, pos);
 	end = start + len;
 
 	page = *pagep;
@@ -1989,7 +1993,7 @@ int block_write_end(struct file *file, s
 	struct inode *inode = mapping->host;
 	unsigned start;
 
-	start = pos & (PAGE_CACHE_SIZE - 1);
+	start = page_cache_offset(mapping, pos);
 
 	if (unlikely(copied < len)) {
 		/*
@@ -2065,7 +2069,8 @@ int block_read_full_page(struct page *pa
 		create_empty_buffers(page, blocksize, 0);
 	head = page_buffers(page);
 
-	iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
+	iblock = (sector_t)page->index <<
+		(page_cache_shift(page->mapping) - inode->i_blkbits);
 	lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
 	bh = head;
 	nr = 0;
@@ -2183,16 +2188,17 @@ int cont_expand_zero(struct file *file, 
 	unsigned zerofrom, offset, len;
 	int err = 0;
 
-	index = pos >> PAGE_CACHE_SHIFT;
-	offset = pos & ~PAGE_CACHE_MASK;
+	index = page_cache_index(mapping, pos);
+	offset = page_cache_offset(mapping, pos);
 
-	while (index > (curidx = (curpos = *bytes)>>PAGE_CACHE_SHIFT)) {
-		zerofrom = curpos & ~PAGE_CACHE_MASK;
+	while (index > (curidx = page_cache_index(mapping,
+					(curpos = *bytes)))) {
+		zerofrom = page_cache_offset(mapping, curpos);
 		if (zerofrom & (blocksize-1)) {
 			*bytes |= (blocksize-1);
 			(*bytes)++;
 		}
-		len = PAGE_CACHE_SIZE - zerofrom;
+		len = page_cache_size(mapping) - zerofrom;
 
 		err = pagecache_write_begin(file, mapping, curpos, len,
 						AOP_FLAG_UNINTERRUPTIBLE,
@@ -2210,7 +2216,7 @@ int cont_expand_zero(struct file *file, 
 
 	/* page covers the boundary, find the boundary offset */
 	if (index == curidx) {
-		zerofrom = curpos & ~PAGE_CACHE_MASK;
+		zerofrom = page_cache_offset(mapping, curpos);
 		/* if we will expand the thing last block will be filled */
 		if (offset <= zerofrom) {
 			goto out;
@@ -2256,7 +2262,7 @@ int cont_write_begin(struct file *file, 
 	if (err)
 		goto out;
 
-	zerofrom = *bytes & ~PAGE_CACHE_MASK;
+	zerofrom = page_cache_offset(mapping, *bytes);
 	if (pos+len > *bytes && zerofrom & (blocksize-1)) {
 		*bytes |= (blocksize-1);
 		(*bytes)++;
@@ -2289,8 +2295,9 @@ int block_commit_write(struct page *page
 int generic_commit_write(struct file *file, struct page *page,
 		unsigned from, unsigned to)
 {
-	struct inode *inode = page->mapping->host;
-	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	loff_t pos = page_cache_pos(mapping, page->index, to);
 	__block_commit_write(inode,page,from,to);
 	/*
 	 * No need to use i_size_read() here, the i_size
@@ -2332,6 +2339,7 @@ static void end_buffer_read_nobh(struct 
 int nobh_prepare_write(struct page *page, unsigned from, unsigned to,
 			get_block_t *get_block)
 {
+	struct address_space *mapping = page->mapping;
 	struct inode *inode = page->mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
 	const unsigned blocksize = 1 << blkbits;
@@ -2339,6 +2347,7 @@ int nobh_prepare_write(struct page *page
 	struct buffer_head *read_bh[MAX_BUF_PER_PAGE];
 	unsigned block_in_page;
 	unsigned block_start;
+	unsigned page_size = page_cache_size(mapping);
 	sector_t block_in_file;
 	int nr_reads = 0;
 	int i;
@@ -2348,7 +2357,8 @@ int nobh_prepare_write(struct page *page
 	if (PageMappedToDisk(page))
 		return 0;
 
-	block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits);
+	block_in_file = (sector_t)page->index <<
+			(page_cache_shift(mapping) - blkbits);
 	map_bh.b_page = page;
 
 	/*
@@ -2357,7 +2367,7 @@ int nobh_prepare_write(struct page *page
 	 * page is fully mapped-to-disk.
 	 */
 	for (block_start = 0, block_in_page = 0;
-		  block_start < PAGE_CACHE_SIZE;
+		  block_start < page_size;
 		  block_in_page++, block_start += blocksize) {
 		unsigned block_end = block_start + blocksize;
 		int create;
@@ -2446,7 +2456,7 @@ failed:
 	 * Error recovery is pretty slack.  Clear the page and mark it dirty
 	 * so we'll later zero out any blocks which _were_ allocated.
 	 */
-	zero_user(page, 0, PAGE_CACHE_SIZE);
+	zero_user(page, 0, page_size);
 	SetPageUptodate(page);
 	set_page_dirty(page);
 	return ret;
@@ -2460,8 +2470,9 @@ EXPORT_SYMBOL(nobh_prepare_write);
 int nobh_commit_write(struct file *file, struct page *page,
 		unsigned from, unsigned to)
 {
-	struct inode *inode = page->mapping->host;
-	loff_t pos = ((loff_t)page->index << PAGE_CACHE_SHIFT) + to;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
+	loff_t pos = page_cache_pos(mapping, page->index, to);
 
 	SetPageUptodate(page);
 	set_page_dirty(page);
@@ -2481,9 +2492,10 @@ EXPORT_SYMBOL(nobh_commit_write);
 int nobh_writepage(struct page *page, get_block_t *get_block,
 			struct writeback_control *wbc)
 {
-	struct inode * const inode = page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode * const inode = mapping->host;
 	loff_t i_size = i_size_read(inode);
-	const pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
+	const pgoff_t end_index = page_cache_index(mapping, i_size);
 	unsigned offset;
 	int ret;
 
@@ -2492,7 +2504,7 @@ int nobh_writepage(struct page *page, ge
 		goto out;
 
 	/* Is the page fully outside i_size? (truncate in progress) */
-	offset = i_size & (PAGE_CACHE_SIZE-1);
+	offset = page_cache_offset(mapping, i_size);
 	if (page->index >= end_index+1 || !offset) {
 		/*
 		 * The page may have dirty, unmapped buffers.  For example,
@@ -2515,7 +2527,7 @@ int nobh_writepage(struct page *page, ge
 	 * the  page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+	zero_user_segment(page, offset, page_cache_size(mapping));
 out:
 	ret = mpage_writepage(page, get_block, wbc);
 	if (ret == -EAGAIN)
@@ -2531,8 +2543,8 @@ int nobh_truncate_page(struct address_sp
 {
 	struct inode *inode = mapping->host;
 	unsigned blocksize = 1 << inode->i_blkbits;
-	pgoff_t index = from >> PAGE_CACHE_SHIFT;
-	unsigned offset = from & (PAGE_CACHE_SIZE-1);
+	pgoff_t index = page_cache_index(mapping, from);
+	unsigned offset = page_cache_offset(mapping, from);
 	unsigned to;
 	struct page *page;
 	const struct address_space_operations *a_ops = mapping->a_ops;
@@ -2549,7 +2561,7 @@ int nobh_truncate_page(struct address_sp
 	to = (offset + blocksize) & ~(blocksize - 1);
 	ret = a_ops->prepare_write(NULL, page, offset, to);
 	if (ret == 0) {
-		zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+		zero_user_segment(page, offset, page_cache_size(mapping));
 		/*
 		 * It would be more correct to call aops->commit_write()
 		 * here, but this is more efficient.
@@ -2567,8 +2579,8 @@ EXPORT_SYMBOL(nobh_truncate_page);
 int block_truncate_page(struct address_space *mapping,
 			loff_t from, get_block_t *get_block)
 {
-	pgoff_t index = from >> PAGE_CACHE_SHIFT;
-	unsigned offset = from & (PAGE_CACHE_SIZE-1);
+	pgoff_t index = page_cache_index(mapping, from);
+	unsigned offset = page_cache_offset(mapping, from);
 	unsigned blocksize;
 	sector_t iblock;
 	unsigned length, pos;
@@ -2585,8 +2597,8 @@ int block_truncate_page(struct address_s
 		return 0;
 
 	length = blocksize - length;
-	iblock = (sector_t)index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
-	
+	iblock = (sector_t)index <<
+			(page_cache_shift(mapping) - inode->i_blkbits);
 	page = grab_cache_page(mapping, index);
 	err = -ENOMEM;
 	if (!page)
@@ -2645,9 +2657,10 @@ out:
 int block_write_full_page(struct page *page, get_block_t *get_block,
 			struct writeback_control *wbc)
 {
-	struct inode * const inode = page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode * const inode = mapping->host;
 	loff_t i_size = i_size_read(inode);
-	const pgoff_t end_index = i_size >> PAGE_CACHE_SHIFT;
+	const pgoff_t end_index = page_cache_index(mapping, i_size);
 	unsigned offset;
 
 	/* Is the page fully inside i_size? */
@@ -2655,7 +2668,7 @@ int block_write_full_page(struct page *p
 		return __block_write_full_page(inode, page, get_block, wbc);
 
 	/* Is the page fully outside i_size? (truncate in progress) */
-	offset = i_size & (PAGE_CACHE_SIZE-1);
+	offset = page_cache_offset(mapping, i_size);
 	if (page->index >= end_index+1 || !offset) {
 		/*
 		 * The page may have dirty, unmapped buffers.  For example,
@@ -2674,7 +2687,7 @@ int block_write_full_page(struct page *p
 	 * the  page size, the remaining memory is zeroed when mapped, and
 	 * writes to that region are not written out to the file."
 	 */
-	zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+	zero_user_segment(page, offset, page_cache_size(mapping));
 	return __block_write_full_page(inode, page, get_block, wbc);
 }
 
@@ -2928,7 +2941,7 @@ int try_to_free_buffers(struct page *pag
 	 * dirty bit from being lost.
 	 */
 	if (ret)
-		cancel_dirty_page(page, PAGE_CACHE_SIZE);
+		cancel_dirty_page(page, page_cache_size(mapping));
 	spin_unlock(&mapping->private_lock);
 out:
 	if (buffers_to_free) {

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread
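
The recurring pattern in fs/buffer.c is conversion between filesystem
blocks and page cache pages. A worked example of the shifts above,
assuming (purely for illustration) a 64k page cache page (shift 16)
over a 4k-blocksize filesystem (i_blkbits == 12):

	blocks per page:  1 << (16 - 12)               == 16
	page->index 3  -> first block:  3 << (16 - 12) == 48
	block 50       -> page index:  50 >> (16 - 12) == 3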

* [patch 12/14] Use page_cache_xxx in mm/mpage.c
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (10 preceding siblings ...)
  2007-06-14 19:38 ` [patch 11/14] Use page_cache_xx in fs/buffer.c clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 19:38 ` [patch 13/14] Use page_cache_xxx in mm/fadvise.c clameter
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_mpage --]
[-- Type: text/plain, Size: 4096 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/mpage.c |   28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

Index: vps/fs/mpage.c
===================================================================
--- vps.orig/fs/mpage.c	2007-06-11 22:33:07.000000000 -0700
+++ vps/fs/mpage.c	2007-06-11 22:37:24.000000000 -0700
@@ -133,7 +133,8 @@ mpage_alloc(struct block_device *bdev,
 static void 
 map_buffer_to_page(struct page *page, struct buffer_head *bh, int page_block) 
 {
-	struct inode *inode = page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
 	struct buffer_head *page_bh, *head;
 	int block = 0;
 
@@ -142,9 +143,9 @@ map_buffer_to_page(struct page *page, st
 		 * don't make any buffers if there is only one buffer on
 		 * the page and the page just needs to be set up to date
 		 */
-		if (inode->i_blkbits == PAGE_CACHE_SHIFT && 
+		if (inode->i_blkbits == page_cache_shift(mapping) &&
 		    buffer_uptodate(bh)) {
-			SetPageUptodate(page);    
+			SetPageUptodate(page);
 			return;
 		}
 		create_empty_buffers(page, 1 << inode->i_blkbits, 0);
@@ -177,9 +178,10 @@ do_mpage_readpage(struct bio *bio, struc
 		sector_t *last_block_in_bio, struct buffer_head *map_bh,
 		unsigned long *first_logical_block, get_block_t get_block)
 {
-	struct inode *inode = page->mapping->host;
+	struct address_space *mapping = page->mapping;
+	struct inode *inode = mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
-	const unsigned blocks_per_page = PAGE_CACHE_SIZE >> blkbits;
+	const unsigned blocks_per_page = page_cache_size(mapping) >> blkbits;
 	const unsigned blocksize = 1 << blkbits;
 	sector_t block_in_file;
 	sector_t last_block;
@@ -196,7 +198,7 @@ do_mpage_readpage(struct bio *bio, struc
 	if (page_has_buffers(page))
 		goto confused;
 
-	block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits);
+	block_in_file = (sector_t)page->index << (page_cache_shift(mapping) - blkbits);
 	last_block = block_in_file + nr_pages * blocks_per_page;
 	last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
 	if (last_block > last_block_in_file)
@@ -284,7 +286,8 @@ do_mpage_readpage(struct bio *bio, struc
 	}
 
 	if (first_hole != blocks_per_page) {
-		zero_user_segment(page, first_hole << blkbits, PAGE_CACHE_SIZE);
+		zero_user_segment(page, first_hole << blkbits,
+					page_cache_size(mapping));
 		if (first_hole == 0) {
 			SetPageUptodate(page);
 			unlock_page(page);
@@ -462,7 +465,7 @@ static int __mpage_writepage(struct page
 	struct inode *inode = page->mapping->host;
 	const unsigned blkbits = inode->i_blkbits;
 	unsigned long end_index;
-	const unsigned blocks_per_page = PAGE_CACHE_SIZE >> blkbits;
+	const unsigned blocks_per_page = page_cache_size(mapping) >> blkbits;
 	sector_t last_block;
 	sector_t block_in_file;
 	sector_t blocks[MAX_BUF_PER_PAGE];
@@ -531,7 +534,8 @@ static int __mpage_writepage(struct page
 	 * The page has no buffers: map it to disk
 	 */
 	BUG_ON(!PageUptodate(page));
-	block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits);
+	block_in_file = (sector_t)page->index <<
+			(page_cache_shift(mapping) - blkbits);
 	last_block = (i_size - 1) >> blkbits;
 	map_bh.b_page = page;
 	for (page_block = 0; page_block < blocks_per_page; ) {
@@ -563,7 +567,7 @@ static int __mpage_writepage(struct page
 	first_unmapped = page_block;
 
 page_is_mapped:
-	end_index = i_size >> PAGE_CACHE_SHIFT;
+	end_index = page_cache_index(mapping, i_size);
 	if (page->index >= end_index) {
 		/*
 		 * The page straddles i_size.  It must be zeroed out on each
@@ -573,11 +577,11 @@ page_is_mapped:
 		 * is zeroed when mapped, and writes to that region are not
 		 * written out to the file."
 		 */
-		unsigned offset = i_size & (PAGE_CACHE_SIZE - 1);
+		unsigned offset = page_cache_offset(mapping, i_size);
 
 		if (page->index > end_index || !offset)
 			goto confused;
-		zero_user_segment(page, offset, PAGE_CACHE_SIZE);
+		zero_user_segment(page, offset, page_cache_size(mapping));
 	}
 
 	/*

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread
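
The straddles-i_size logic above is easier to see with numbers.
Assuming (for illustration) a 64k page (shift 16) and i_size == 70000:

	end_index = page_cache_index(m, 70000)  == 1
	offset    = page_cache_offset(m, 70000) == 4464

	page 0:  fully inside i_size, written out normally
	page 1:  index == end_index and offset != 0, so bytes
	         4464..65535 are zeroed before writeout
	page 2+: fully outside i_size (the truncate-in-progress case)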

* [patch 13/14] Use page_cache_xxx in mm/fadvise.c
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (11 preceding siblings ...)
  2007-06-14 19:38 ` [patch 12/14] Use page_cache_xxx in mm/mpage.c clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 19:38 ` [patch 14/14] Use page_cache_xx in fs/splice.c clameter
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_fadvise --]
[-- Type: text/plain, Size: 1164 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 mm/fadvise.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Index: vps/mm/fadvise.c
===================================================================
--- vps.orig/mm/fadvise.c	2007-06-04 17:57:25.000000000 -0700
+++ vps/mm/fadvise.c	2007-06-09 21:32:46.000000000 -0700
@@ -79,8 +79,8 @@ asmlinkage long sys_fadvise64_64(int fd,
 		}
 
 		/* First and last PARTIAL page! */
-		start_index = offset >> PAGE_CACHE_SHIFT;
-		end_index = endbyte >> PAGE_CACHE_SHIFT;
+		start_index = page_cache_index(mapping, offset);
+		end_index = page_cache_index(mapping, endbyte);
 
 		/* Careful about overflow on the "+1" */
 		nrpages = end_index - start_index + 1;
@@ -100,8 +100,8 @@ asmlinkage long sys_fadvise64_64(int fd,
 			filemap_flush(mapping);
 
 		/* First and last FULL page! */
-		start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
-		end_index = (endbyte >> PAGE_CACHE_SHIFT);
+		start_index = page_cache_next(mapping, offset);
+		end_index = page_cache_index(mapping, endbyte);
 
 		if (end_index >= start_index)
 			invalidate_mapping_pages(mapping, start_index,

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread
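
The two hunks above depend on the different rounding of
page_cache_index() and page_cache_next(). With a 4k page and
offset == 6000 (illustrative numbers):

	page_cache_index(m, 6000) == 6000 >> 12          == 1
		/* page containing byte 6000: the first PARTIAL page */
	page_cache_next(m, 6000)  == (6000 + 4095) >> 12 == 2
		/* first page starting at or after 6000: the first FULL page */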

* [patch 14/14] Use page_cache_xx in fs/splice.c
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (12 preceding siblings ...)
  2007-06-14 19:38 ` [patch 13/14] Use page_cache_xxx in mm/fadvise.c clameter
@ 2007-06-14 19:38 ` clameter
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
  14 siblings, 0 replies; 44+ messages in thread
From: clameter @ 2007-06-14 19:38 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, Christoph Hellwig

[-- Attachment #1: vps_fs_splice --]
[-- Type: text/plain, Size: 2822 bytes --]

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/splice.c |   23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Index: vps/fs/splice.c
===================================================================
--- vps.orig/fs/splice.c	2007-06-09 22:18:02.000000000 -0700
+++ vps/fs/splice.c	2007-06-09 22:22:08.000000000 -0700
@@ -282,9 +282,9 @@ __generic_file_splice_read(struct file *
 		.ops = &page_cache_pipe_buf_ops,
 	};
 
-	index = *ppos >> PAGE_CACHE_SHIFT;
-	loff = *ppos & ~PAGE_CACHE_MASK;
-	nr_pages = (len + loff + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+	index = page_cache_index(mapping, *ppos);
+	loff = page_cache_offset(mapping, *ppos);
+	nr_pages = page_cache_next(mapping, len + loff);
 
 	if (nr_pages > PIPE_BUFFERS)
 		nr_pages = PIPE_BUFFERS;
@@ -345,7 +345,7 @@ __generic_file_splice_read(struct file *
 	 * Now loop over the map and see if we need to start IO on any
 	 * pages, fill in the partial map, etc.
 	 */
-	index = *ppos >> PAGE_CACHE_SHIFT;
+	index = page_cache_index(mapping, *ppos);
 	nr_pages = spd.nr_pages;
 	spd.nr_pages = 0;
 	for (page_nr = 0; page_nr < nr_pages; page_nr++) {
@@ -357,7 +357,8 @@ __generic_file_splice_read(struct file *
 		/*
 		 * this_len is the max we'll use from this page
 		 */
-		this_len = min_t(unsigned long, len, PAGE_CACHE_SIZE - loff);
+		this_len = min_t(unsigned long, len,
+					page_cache_size(mapping) - loff);
 		page = pages[page_nr];
 
 		if (PageReadahead(page))
@@ -416,7 +417,7 @@ __generic_file_splice_read(struct file *
 			 * i_size must be checked after ->readpage().
 			 */
 			isize = i_size_read(mapping->host);
-			end_index = (isize - 1) >> PAGE_CACHE_SHIFT;
+			end_index = page_cache_index(mapping, isize - 1);
 			if (unlikely(!isize || index > end_index))
 				break;
 
@@ -425,7 +426,8 @@ __generic_file_splice_read(struct file *
 			 * the length and stop
 			 */
 			if (end_index == index) {
-				loff = PAGE_CACHE_SIZE - (isize & ~PAGE_CACHE_MASK);
+				loff = page_cache_size(mapping)
+					- page_cache_offset(mapping, isize);
 				if (total_len + loff > isize)
 					break;
 				/*
@@ -557,6 +559,7 @@ static int pipe_to_file(struct pipe_inod
 	struct page *page;
 	void *fsdata;
 	int ret;
+	int pagesize = page_cache_size(mapping);
 
 	/*
 	 * make sure the data in this buffer is uptodate
@@ -565,11 +568,11 @@ static int pipe_to_file(struct pipe_inod
 	if (unlikely(ret))
 		return ret;
 
-	offset = sd->pos & ~PAGE_CACHE_MASK;
+	offset = page_cache_offset(mapping, sd->pos);
 
 	this_len = sd->len;
-	if (this_len + offset > PAGE_CACHE_SIZE)
-		this_len = PAGE_CACHE_SIZE - offset;
+	if (this_len + offset > pagesize)
+		this_len = pagesize - offset;
 
 	ret = pagecache_write_begin(file, mapping, sd->pos, sd->len,
 				AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);

-- 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 01/14] Define functions for page cache handling
  2007-06-14 19:38 ` [patch 01/14] Define functions for page cache handling clameter
@ 2007-06-14 19:56   ` Sam Ravnborg
  2007-06-14 19:58     ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Sam Ravnborg @ 2007-06-14 19:56 UTC (permalink / raw)
  To: clameter; +Cc: akpm, linux-kernel, Christoph Hellwig

On Thu, Jun 14, 2007 at 12:38:40PM -0700, clameter@sgi.com wrote:
> We use the macros PAGE_CACHE_SIZE PAGE_CACHE_SHIFT PAGE_CACHE_MASK
> and PAGE_CACHE_ALIGN in various places in the kernel. Many times
> common operations like calculating the offset or the index are coded
> using shifts and adds. This patch provides inline function to
> get the calculations accomplished in a consistent way.
> 
> All functions take an address_space pointer. The address space pointer
> will be used in the future to eventually support a variable size
> page cache. Information reachable via the mapping may then determine
> page size.
> 
> New function			Related base page constant
> ---------------------------------------------------
> page_cache_shift(a)		PAGE_CACHE_SHIFT
> page_cache_size(a)		PAGE_CACHE_SIZE
> page_cache_mask(a)		PAGE_CACHE_MASK
> page_cache_index(a, pos)	Calculate page number from position
> page_cache_next(addr, pos)	Page number of next page
> page_cache_offset(a, pos)	Calculate offset into a page
> page_cache_pos(a, index, offset)
> 				Form position based on page number
> 				and an offset.
> 
> This provides a basis that would allow the conversion of all page cache
> handling in the kernel and ultimately allow the removal of the PAGE_CACHE_*
> constants.

We need access to PAGE_SIZE in vmlinux.lds.h.
What is your plan with that usage?

	Sam

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 01/14] Define functions for page cache handling
  2007-06-14 19:56   ` Sam Ravnborg
@ 2007-06-14 19:58     ` Christoph Lameter
  2007-06-14 20:07       ` Sam Ravnborg
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-14 19:58 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: akpm, linux-kernel, Christoph Hellwig

On Thu, 14 Jun 2007, Sam Ravnborg wrote:

> We need access to PAGE_SIZE in vmlinux.lds.h.
> What is your plan with that usage?

This is about PAGE_CACHE_xxx. No changes to PAGE_SIZE are planned.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
                   ` (13 preceding siblings ...)
  2007-06-14 19:38 ` [patch 14/14] Use page_cache_xx in fs/splice.c clameter
@ 2007-06-14 20:06 ` Andrew Morton
  2007-06-14 21:07   ` Christoph Hellwig
                     ` (3 more replies)
  14 siblings, 4 replies; 44+ messages in thread
From: Andrew Morton @ 2007-06-14 20:06 UTC (permalink / raw)
  To: clameter; +Cc: linux-kernel, Christoph Hellwig

On Thu, 14 Jun 2007 12:38:39 -0700
clameter@sgi.com wrote:

> This patchset cleans up the page cache handling by replacing
> open coded shifts and adds through inline function calls.

If we never inflict variable PAGE_CACHE_SIZE upon the kernel, these changes
become pointless obfuscation.

Let's put our horses ahead of our carts.  We had a lengthy discussion about
variable PAGE_CACHE_SIZE in which I pointed out that the performance
benefits could be replicated in a manner which doesn't add complexity to
core VFS and which provides immediate benefit to all filesystems without
any need to alter them: populate contiguous pagecache pages with physically
contiguous pages.

I think the best way to proceed would be to investigate that _general_
optimisation and then, based upon the results of that work, decide whether
further _specialised_ changes such as variable PAGE_CACHE_SIZE are needed,
and if so, what they should be.


^ permalink raw reply	[flat|nested] 44+ messages in thread
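
For readers trying to picture the alternative proposed here: a purely
hypothetical sketch of what populating contiguous pagecache slots from
physically contiguous memory could look like against the 2.6.22 APIs
(alloc_pages, split_page, add_to_page_cache_lru). This is not code from
any posted patch; a real version would also have to issue the reads and
unlock the pages afterwards:

	/* Hypothetical sketch: back four consecutive pagecache slots
	 * with one physically contiguous order-2 allocation.  The
	 * pagecache still sees ordinary order-0 pages, so filesystems
	 * need no changes. */
	static int populate_contig(struct address_space *mapping, pgoff_t index)
	{
		struct page *page = alloc_pages(GFP_KERNEL, 2);
		int i;

		if (!page)
			return -ENOMEM;	/* fall back to single pages */
		split_page(page, 2);	/* four independent order-0 pages */
		for (i = 0; i < 4; i++)
			if (add_to_page_cache_lru(page + i, mapping,
						index + i, GFP_KERNEL))
				__free_page(page + i);	/* slot already taken */
		return 0;
	}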

* Re: [patch 01/14] Define functions for page cache handling
  2007-06-14 19:58     ` Christoph Lameter
@ 2007-06-14 20:07       ` Sam Ravnborg
  0 siblings, 0 replies; 44+ messages in thread
From: Sam Ravnborg @ 2007-06-14 20:07 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, Christoph Hellwig

On Thu, Jun 14, 2007 at 12:58:21PM -0700, Christoph Lameter wrote:
> On Thu, 14 Jun 2007, Sam Ravnborg wrote:
> 
> > We need access to PAGE_SIZE in vmlinux.lds.h.
> > What is your plan with that usage?
> 
> This is about PAGE_CACHE_xxx. No changes to PAGE_SIZE are planned.

Obviously - thanks for the quick response.

	Sam

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
@ 2007-06-14 21:07   ` Christoph Hellwig
  2007-06-14 21:25     ` Dave McCracken
  2007-06-14 21:20   ` Christoph Lameter
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2007-06-14 21:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-kernel, Christoph Hellwig

On Thu, Jun 14, 2007 at 01:06:45PM -0700, Andrew Morton wrote:
> On Thu, 14 Jun 2007 12:38:39 -0700
> clameter@sgi.com wrote:
> 
> > This patchset cleans up the page cache handling by replacing
> > open coded shifts and adds through inline function calls.
> 
> If we never inflict variable PAGE_CACHE_SIZE upon the kernel, these changes
> become pointless obfuscation.
> 
> Let's put our horses ahead of our carts.  We had a lengthy discussion about
> variable PAGE_CACHE_SIZE in which I pointed out that the performance
> benefits could be replicated in a manner which doesn't add complexity to
> core VFS and which provides immediate benefit to all filesystems without
> any need to alter them: populate contiguous pagecache pages with physically
> contiguous pages.
> 
> I think the best way to proceed would be to investigate that _general_
> optimisation and then, based upon the results of that work, decide whether
> further _specialised_ changes such as variable PAGE_CACHE_SIZE are needed,
> and if so, what they should be.

Christoph's patches are an extremely useful cleanup and can stand on their
own.  Right now PAGE_CACHE_SIZE and friends are in there and no one can
keep them distinct because their usage is not clear at all.  By making
the macros per-mapping at least the usage is clear.

That being said, we should do a full conversion so that PAGE_CACHE_SIZE
just goes away; otherwise the whole exercise is rather pointless.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
  2007-06-14 21:07   ` Christoph Hellwig
@ 2007-06-14 21:20   ` Christoph Lameter
  2007-06-14 21:32     ` Andrew Morton
  2007-06-14 23:54   ` David Chinner
  2007-07-02 18:16   ` Badari Pulavarty
  3 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-14 21:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Christoph Hellwig

On Thu, 14 Jun 2007, Andrew Morton wrote:

> If we never inflict variable PAGE_CACHE_SIZE upon the kernel, these changes
> become pointless obfuscation.

But there is no such reasonable scenario that I am aware of unless we 
continue to add workarounds to the VM for the issues covered here.

And it was pointed out to you that such an approach can never stand in 
place of the different uses of having a larger page cache.

> I think the best way to proceed would be to investigate that _general_
> optimisation and then, based upon the results of that work, decide whether
> further _specialised_ changes such as variable PAGE_CACHE_SIZE are needed,
> and if so, what they should be.

As has been pointed out, performance is only one benefit of having a
larger page cache. It is doubtful in principle that the proposed 
alternative can work, given that the locking and management overhead
in the VM is not minimized but made more complex by your envisioned 
solution.

The solution here significantly cleans up the page cache even if we never 
go to a variable page cache. If we do get there then numerous 
workarounds that we have in the tree because of not supporting larger I/O 
go away, cleaning up the VM further. Large disk sizes can be handled in 
a reasonable way (e.g. fsck times would decrease) since we can handle
large contiguous chunks of memory. This is a necessary strategic move for 
the Linux kernel. It would also pave the way for managing large chunks
of contiguous memory for other uses and has the potential to get rid
of such sore spots as the hugetlb filesystem.




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 21:07   ` Christoph Hellwig
@ 2007-06-14 21:25     ` Dave McCracken
  0 siblings, 0 replies; 44+ messages in thread
From: Dave McCracken @ 2007-06-14 21:25 UTC (permalink / raw)
  To: linux-kernel

On Thursday 14 June 2007, Christoph Hellwig wrote:
> Christoph's patches are an extremely useful cleanup and can stand on their
> own.  Right now PAGE_CACHE_SIZE and friends are in there and no one can
> keep them distinct because their usage is not clear at all.  By making
> the macros per-mapping at least the usage is clear.
>
> That being said, we should do a full conversion so that PAGE_CACHE_SIZE
> just goes away; otherwise the whole exercise is rather pointless.

I agree with Christoph and Christoph here.  The page_cache_xxx() macros are 
cleaner than PAGE_CACHE_SIZE.  Too many places have gotten it wrong too many 
times.  Let's go ahead with them even if we never implement variable cache 
page size.

Dave McCracken

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 21:20   ` Christoph Lameter
@ 2007-06-14 21:32     ` Andrew Morton
  2007-06-14 21:37       ` Christoph Lameter
  2007-06-17  1:25       ` Arjan van de Ven
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Morton @ 2007-06-14 21:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, hch

> On Thu, 14 Jun 2007 14:20:04 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> > I think the best way to proceed would be to investigate that _general_
> > optimisation and then, based upon the results of that work, decide whether
> > further _specialised_ changes such as variable PAGE_CACHE_SIZE are needed,
> > and if so, what they should be.
> 
> As has been pointed out, performance is only one benefit of having a
> larger page cache. It is doubtful in principle that the proposed 
> alternative can work, given that the locking and management overhead
> in the VM is not minimized but made more complex by your envisioned 
> solution.

Why do we have to replay all of this?

You: conceptually-new add-on which benefits 0.25% of the user base, provided
they select the right config options and filesystem.

Me: simpler enhancement which benefits 100% of the user base (ie: includes
4k blocksize, 4k pagesize) and which also fixes your performance problem
with that HBA.


We want the 100% case.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 21:32     ` Andrew Morton
@ 2007-06-14 21:37       ` Christoph Lameter
  2007-06-14 22:04         ` Andrew Morton
  2007-06-17  1:25       ` Arjan van de Ven
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-14 21:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007, Andrew Morton wrote:

> We want the 100% case.

Yes that is what we intend to do. Universal support for larger blocksize. 
I.e. your desktop filesystem will use 64k page size and server platforms 
likely much larger. fsck times etc etc are becoming an issue for desktop 
systems given the capacities and locking becomes an issue the more 
multicore your desktops become.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 21:37       ` Christoph Lameter
@ 2007-06-14 22:04         ` Andrew Morton
  2007-06-14 22:22           ` Christoph Lameter
                             ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Andrew Morton @ 2007-06-14 22:04 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, hch

> On Thu, 14 Jun 2007 14:37:33 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Thu, 14 Jun 2007, Andrew Morton wrote:
> 
> > We want the 100% case.
> 
> Yes that is what we intend to do. Universal support for larger blocksize. 
> I.e. your desktop filesystem will use 64k page size and server platforms 
> likely much larger.

With 64k pagesize the amount of memory required to hold a kernel tree (say)
will go from 270MB to 1400MB.   This is not an optimisation.
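
To make the arithmetic behind those two figures concrete (illustrative
reasoning, not a measurement): the pagecache cost of a cached file is
its size rounded up to a whole page,

	footprint ~= sum over files of roundup(file size, page size)

so a 100-byte file costs 4k with 4k pages but 64k with 64k pages (16x),
while multi-megabyte files cost nearly the same either way. A source
tree dominated by small files lands in between, which is how 270MB can
become roughly 1400MB (about 5x) rather than the full 16x.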

Several 64k pagesize people have already spent time looking at various
tail-packing schemes to get around this serious problem.  And that's on
_server_ class machines.  Large ones.  I don't think
laptop/desktop/small-server machines would want to go anywhere near this.

> fsck times etc etc are becoming an issue for desktop 
> systems

I don't see what fsck has to do with it.

fsck is single-threaded (hence no locking issues) and operates against the
blockdev pagecache and does a _lot_ of small reads (indirect blocks,
especially).  If the memory consumption for each 4k read jumps to 64k, fsck
is likely to slow down due to performing a lot more additional IO and due
to entering page reclaim much earlier.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 22:04         ` Andrew Morton
@ 2007-06-14 22:22           ` Christoph Lameter
  2007-06-14 22:49             ` Andrew Morton
  2007-06-14 23:30           ` David Chinner
  2007-06-15 15:05           ` Dave Kleikamp
  2 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-14 22:22 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007, Andrew Morton wrote:

> With 64k pagesize the amount of memory required to hold a kernel tree (say)
> will go from 270MB to 1400MB.   This is not an optimisation.

I do not think that the 100% users will do kernel compiles all day like 
we do. We likely would prefer 4k page size for our small text files.

> Several 64k pagesize people have already spent time looking at various
> tail-packing schemes to get around this serious problem.  And that's on
> _server_ class machines.  Large ones.  I don't think
> laptop/desktop/samll-server machines would want to go anywhere near this.

I never understood the point of that exercise. If you have variable page 
size then the 64k page size can be used specifically for files that benefit 
from it. Typical usage scenarios are video/audio streaming I/O, large 
picture files, large documents with embedded images. These are the major
usage scenarios today and we suck at them. Our DVD/CD subsystems are 
currently not capable of directly reading from these devices into the page 
cache since they do not do I/O in 4k chunks.

> > fsck times etc etc are becoming an issue for desktop 
> > systems
> 
> I don't see what fsck has to do with it.
> 
> fsck is single-threaded (hence no locking issues) and operates against the
> blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> especially).  If the memory consumption for each 4k read jumps to 64k, fsck
> is likely to slow down due to performing a lot more additional IO and due
> to entering page reclaim much earlier.

Every 64k block contains more information and the number of pages managed
is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads, 
more CPU cache and CPU cache prefetch friendly behavior.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 22:22           ` Christoph Lameter
@ 2007-06-14 22:49             ` Andrew Morton
  2007-06-15  0:45               ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-06-14 22:49 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, hch

> On Thu, 14 Jun 2007 15:22:46 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> On Thu, 14 Jun 2007, Andrew Morton wrote:
> 
> > With 64k pagesize the amount of memory required to hold a kernel tree (say)
> > will go from 270MB to 1400MB.   This is not an optimisation.
> 
> I do not think that the 100% users will do kernel compiles all day like 
> we do. We likely would prefer 4k page size for our small text files.

There are many, many applications which use small files.

> > Several 64k pagesize people have already spent time looking at various
> > tail-packing schemes to get around this serious problem.  And that's on
> > _server_ class machines.  Large ones.  I don't think
> > laptop/desktop/samll-server machines would want to go anywhere near this.
> 
> I never understood the point of that exercise. If you have variable page 
> size then the 64k page size can be used specifically for files that benefit 
> from it. Typical usage scenarios are video/audio streaming I/O, large 
> picture files, large documents with embedded images. These are the major
> usage scenarios today and we suck at them. Our DVD/CD subsystems are 
> currently not capable of directly reading from these devices into the page 
> cache since they do not do I/O in 4k chunks.

So with sufficient magical kernel heuristics or operator intervention, some
people will gain some benefit from 64k pagesize.  Most people with most
workloads will remain where they are: shoving zillions of physically
discontiguous pages into fixed-size sg lists.

Whereas with contig-pagecache, all users on all machines with all workloads
will benefit from the improved merging.

> > > fsck times etc etc are becoming an issue for desktop 
> > > systems
> > 
> > I don't see what fsck has to do with it.
> > 
> > fsck is single-threaded (hence no locking issues) and operates against the
> > blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> > especially).  If the memory consumption for each 4k read jumps to 64k, fsck
> > is likely to slow down due to performing a lot more additional IO and due
> > to entering page reclaim much earlier.
> 
> Every 64k block contains more information and the number of pages managed
> is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads, 
> more CPU cache and CPU cache prefetch friendly behavior.

argh.  Everything you say is just wrong.  A fsck involves zillions of
discontiguous small reads.  It is largely seek-bound, so there is no
benefit to be had here.  Your proposed change will introduce regressions by
causing larger amounts of physical reading and large amounts of memory
consumption.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 22:04         ` Andrew Morton
  2007-06-14 22:22           ` Christoph Lameter
@ 2007-06-14 23:30           ` David Chinner
  2007-06-14 23:41             ` Andrew Morton
  2007-06-15 15:05           ` Dave Kleikamp
  2 siblings, 1 reply; 44+ messages in thread
From: David Chinner @ 2007-06-14 23:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel, hch

On Thu, Jun 14, 2007 at 03:04:17PM -0700, Andrew Morton wrote:
> fsck is single-threaded (hence no locking issues) and operates against the
> blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> especially).

Commenting purely about the above statement (and not on large pages
or block sizes), xfs-repair has had multithreaded capability for some
time now. E.g. from the xfs_repair man page:

       -M    Disable  multi-threaded  mode. Normally, xfs_repair runs with
	     twice the number of threads as processors.


We have the second generation multithreading code out for review
right now. e.g:

http://oss.sgi.com/archives/xfs/2007-06/msg00069.html

xfs_repair also uses direct I/O and does its own userspace block
caching and so avoids the problems involved with low memory, context
unaware cache reclaim and blockdev cache thrashing.

And to top it all off, some of the prefetch smarts we added result
in reading multiple sparse metadata blocks in a single, larger I/O,
so repair is now often bandwidth bound rather than seek bound...

All I'm trying to say here is that you shouldn't assume that the
problems a particular filesystem fsck has are common to all the
rest....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 23:30           ` David Chinner
@ 2007-06-14 23:41             ` Andrew Morton
  2007-06-15  0:29               ` David Chinner
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-06-14 23:41 UTC (permalink / raw)
  To: David Chinner; +Cc: clameter, linux-kernel, hch

> On Fri, 15 Jun 2007 09:30:02 +1000 David Chinner <dgc@sgi.com> wrote:
> On Thu, Jun 14, 2007 at 03:04:17PM -0700, Andrew Morton wrote:
> > fsck is single-threaded (hence no locking issues) and operates against the
> > blockdev pagecache and does a _lot_ of small reads (indirect blocks,
> > especially).
> 
> Commenting purely about the above statement (and not on large pages
> or block sizes), xfs-repair has had multithreaded capability for some
> time now. E.g. from the xfs_repair man page:
> 
>        -M    Disable  multi-threaded  mode. Normally, xfs_repair runs with
> 	     twice the number of threads as processors.
> 
> 
> We have the second generation multithreading code out for review
> right now. e.g:
> 
> http://oss.sgi.com/archives/xfs/2007-06/msg00069.html
> 
> xfs_repair also uses direct I/O and does its own userspace block
> caching and so avoids the problems involved with low memory, context
> unaware cache reclaim and blockdev cache thrashing.

umm, that sounds like a mistake to me.  fscks tend to get run when there's
no swap online.  A small system with a large disk risks going oom and can
no longer be booted.  Whereas if the fsck relies upon kernel caching it'll
run slower but will complete.

> And to top it all off, some of the prefetch smarts we added result
> in reading multiple sparse metadata blocks in a single, larger I/O,
> so repair is now often bandwidth bound rather than seek bound...
> 
> All I'm trying to say here is that you shouldn't assume that the
> problems a particular filesystem fsck has is common to all the
> rest....

Yup.  I was of course referring to fsck.extN.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
  2007-06-14 21:07   ` Christoph Hellwig
  2007-06-14 21:20   ` Christoph Lameter
@ 2007-06-14 23:54   ` David Chinner
  2007-07-02 18:16   ` Badari Pulavarty
  3 siblings, 0 replies; 44+ messages in thread
From: David Chinner @ 2007-06-14 23:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, linux-kernel, Christoph Hellwig

On Thu, Jun 14, 2007 at 01:06:45PM -0700, Andrew Morton wrote:
> On Thu, 14 Jun 2007 12:38:39 -0700
> clameter@sgi.com wrote:
> 
> > This patchset cleans up the page cache handling by replacing
> > open coded shifts and adds through inline function calls.
> 
> If we never inflict variable PAGE_CACHE_SIZE upon the kernel, these changes
> become pointless obfuscation.

The open coding of shifts, masks, and other associated cruft is a real
problem. It leads to ugly and hard to understand code when you have to do
anything complex. That means when you come back to that code 6 months later,
you've got to take the time to understand exactly what all that logic is
doing again.

IMO, xfs_page_state_convert() is a great example of where open coding
of PAGE_CACHE_SIZE manipulations leads to eye-bleeding code. This
patch set would go a long way to help clean up that mess.

IOWs, like hch, I think this patch set stands on its own merit
regardless of concerns over variable page cache page sizes....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 23:41             ` Andrew Morton
@ 2007-06-15  0:29               ` David Chinner
  0 siblings, 0 replies; 44+ messages in thread
From: David Chinner @ 2007-06-15  0:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Chinner, clameter, linux-kernel, hch

On Thu, Jun 14, 2007 at 04:41:18PM -0700, Andrew Morton wrote:
> > On Fri, 15 Jun 2007 09:30:02 +1000 David Chinner <dgc@sgi.com> wrote:
> > xfs_repair also uses direct I/O and does its own userspace block
> > caching and so avoids the problems involved with low memory, context
> > unaware cache reclaim and blockdev cache thrashing.
> 
> umm, that sounds like a mistake to me.  fscks tend to get run when there's
> no swap online.  A small system with a large disk risks going oom and can
> no longer be booted. 

xfs_repair is never run at boot time - we don't force periodic
boot time checks like ext3/4 do, so this isn't a problem.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 22:49             ` Andrew Morton
@ 2007-06-15  0:45               ` Christoph Lameter
  2007-06-15  1:40                 ` Andrew Morton
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-15  0:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007, Andrew Morton wrote:

> > I do not think that the 100% users will do kernel compiles all day like 
> > we do. We likely would prefer 4k page size for our small text files.
> 
> There are many, many applications which use small files.

There is no problem with them using 4k page size concurrently with a 
higher page size for other files.

> > I never understood the point of that exercise. If you have variable page 
> > size then the 64k page size can be used specifically for files that benefit 
> > from it. Typical usage scenarios are video/audio streaming I/O, large 
> > picture files, large documents with embedded images. These are the major
> > usage scenarios today and we suck at them. Our DVD/CD subsystems are 
> > currently not capable of directly reading from these devices into the page 
> > cache since they do not do I/O in 4k chunks.
> 
> So with sufficient magical kernel heuristics or operator intervention, some
> people will gain some benefit from 64k pagesize.  Most people with most
> workloads will remain where they are: shoving zillions of physically
> discontiguous pages into fixed-size sg lists.

Magical? There is nothing magical about doing transfers in the size that 
is supported by a device. That is good sense.

> > Every 64k block contains more information and the number of pages managed
> > is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads, 
> > better use of the CPU cache and CPU-cache-prefetch-friendly behavior.
> 
> argh.  Everything you say is just wrong.  A fsck involves zillions of
> discontiguous small reads.  It is largely seek-bound, so there is no
> benefit to be had here.  Your proposed change will introduce regressions by
> causing larger amounts of physical reading and large amounts of memory
> consumption.

Of course there is. The seeks are reduced since there are a factor 
of 16 fewer metadata blocks. fsck does not read files. It just reads 
metadata structures. And the larger the contiguous areas, the faster it goes.
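
(Concretely: the data of a 1GB file occupies 2^18 = 262,144 blocks at 4k
but only 2^14 = 16,384 blocks at 64k, so any metadata that enumerates
blocks shrinks by that same factor of 16.)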


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-15  0:45               ` Christoph Lameter
@ 2007-06-15  1:40                 ` Andrew Morton
  2007-06-15  2:04                   ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Andrew Morton @ 2007-06-15  1:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007 17:45:43 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> On Thu, 14 Jun 2007, Andrew Morton wrote:
> 
> > > I do not think that the 100% users will do kernel compiles all day like 
> > > we do. We likely would prefer 4k page size for our small text files.
> > 
> > There are many, many applications which use small files.
> 
> There is no problem with them using a 4k page size concurrently with a 
> higher page size for other files.

There will be files which should use 64k but which instead end up using 4k.

There will be files which should use 4k but which instead end up using 64k.

Because determining which size to use requires either operator intervention
or kernel heuristics, both of which will be highly unreliable.

It's better to just make 4k pages go faster.

> > > I never understood the point of that exercise. If you have variable page 
> > > size then the 64k page size can be used specifically for files that benefit 
> > > from it. Typical usage scenarios are video/audio streaming I/O, large 
> > > picture files, and large documents with embedded images. These are the major
> > > usage scenarios today and we suck at them. Our DVD/CD subsystems are 
> > > currently not capable of directly reading from these devices into the page 
> > > cache since they do not do I/O in 4k chunks.
> > 
> > So with sufficient magical kernel heuristics or operator intervention, some
> > people will gain some benefit from 64k pagesize.  Most people with most
> > workloads will remain where they are: shoving zillions of physically
> > discontiguous pages into fixed-size sg lists.
> 
> Magical? There is nothing magical about doing transfers in the size that 
> is supported by a device. That is good sense.

By magical heuristics I'm referring to the (required) tricks and guesses
which the kernel will need to deploy to be able to guess which page-size it
should use for each file.

Because without such heuristics, none of this new stuff which you're
proposing would ever get used by 90% of apps on 90% of machines.

> > > Every 64k block contains more information and the number of pages managed
> > > is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads, 
> > > better use of the CPU cache and CPU-cache-prefetch-friendly behavior.
> > 
> > argh.  Everything you say is just wrong.  A fsck involves zillions of
> > discontiguous small reads.  It is largely seek-bound, so there is no
> > benefit to be had here.  Your proposed change will introduce regressions by
> > causing larger amounts of physical reading and large amounts of memory
> > consumption.
> 
> Of course there is. The seeks are reduced since there are a factor 
> of 16 fewer metadata blocks. fsck does not read files. It just reads 
> metadata structures. And the larger the contiguous areas, the faster it goes.

Some metadata is contiguous: inode tables, some directories (if they got
lucky), bitmap tables.  But fsck surely reads them in a single swoop
anyway, so there's no gain there.

Other metadata (indirect blocks) are 100% discontiguous, and reading those
with a 64k IO into 64k of memory is completely dumb.

And yes, I'm referring to the 90% case again.  The one we want to
optimise for.




^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-15  1:40                 ` Andrew Morton
@ 2007-06-15  2:04                   ` Christoph Lameter
  2007-06-15  2:23                     ` Andrew Morton
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Lameter @ 2007-06-15  2:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007, Andrew Morton wrote:

> There will be files which should use 64k but which instead end up using 4k.
> 
> There will be files which should use 4k but which instead end up using 64k.
> 
> Because determining which size to use requires either operator intervention
> or kernel heuristics, both of which will be highly unreliable.
> 
> It's better to just make 4k pages go faster.

Initially it's quite easy to have a filesystem for your 4k files (basically 
the distro you are running) and an archive for video/audio etc. files 
that uses a 64k size for data. In the future filesystems may support sizes 
set per directory. Basically, if things get too slow you can pull the lever.

> > Magical? There is nothing magical about doing transfers in the size that 
> > is supported by a device. That is good sense.
> 
> By magical heuristics I'm referring to the (required) tricks and guesses
> which the kernel will need to deploy to be able to guess which page-size it
> should use for each file.
> 
> Because without such heuristics, none of this new stuff which you're
> proposing would ever get used by 90% of apps on 90% of machines.

In the V3 patchset one simply formats a volume by specifying the 
desired blocksize, for example. If one gets into trouble with fsck and other 
slowdowns associated with large file I/O then one is going to be quite quick 
to format a partition with a larger blocksize. It's a known technology in 
many Unixes.

The approach essentially gives one the freedom to choose a page size. This is 
a tradeoff between desired speed, expected file sizes, filesystem behavior 
and acceptable fragmentation overhead. If we take this approach then I think 
we will see the mkfs.XXX tools automatically making intelligent choices
about which page size to use. They are all stuck at 4k at the moment.
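
For instance (illustrative invocations, not from the thread; mkfs.xfs
does accept -b size=, but sizes above the base page size only become
usable once the VM side exists):

	mkfs.xfs -b size=65536 /dev/sdb1   # 64k blocks for a media archive
	mkfs.xfs -b size=4096 /dev/sda1    # stay at 4k for the root fs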

> > Of course there is. The seeks are reduced since there are a factor 
> > of 16 fewer metadata blocks. fsck does not read files. It just reads 
> > metadata structures. And the larger the contiguous areas, the faster it goes.
> 
> Some metadata is contiguous: inode tables, some directories (if they got
> lucky), bitmap tables.  But fsck surely reads them in a single swoop
> anyway, so there's no gain there.

The metadata needs to refer to only 1/16th as many pages as before. 
Metadata shrinks significantly.
 
> Other metadata (indirect blocks) are 100% discontiguous, and reading those
> with a 64k IO into 64k of memory is completely dumb.

The effect of a larger page size is that the filesystem will 
place more metadata into a single page instead of spreading it out. 
Reading a mass of metadata with a 64k read is an intelligent choice to 
make, in particular if there is a long series of such reads.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-15  2:04                   ` Christoph Lameter
@ 2007-06-15  2:23                     ` Andrew Morton
  2007-06-15  2:37                       ` Christoph Lameter
  2007-06-15  9:03                       ` David Chinner
  0 siblings, 2 replies; 44+ messages in thread
From: Andrew Morton @ 2007-06-15  2:23 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007 19:04:27 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:

> > > Of course there is. The seeks are reduced since there are a factor 
> > > of 16 fewer metadata blocks. fsck does not read files. It just reads 
> > > metadata structures. And the larger the contiguous areas, the faster it goes.
> > 
> > Some metadata is contiguous: inode tables, some directories (if they got
> > lucky), bitmap tables.  But fsck surely reads them in a single swoop
> > anyway, so there's no gain there.
> 
> The metadata needs to refer to only 1/16th as many pages as before. 
> Metadata shrinks significantly.

Only if the filesystems are altered to use larger blocksizes and if the
operator then chooses to use that feature.  Then they suck for small-sized
(and even medium-sized) files.

So you're still talking about corner cases: specialised applications which
require careful setup and administrator intervention.

What can we do to optimise the common case?

> > Other metadata (indirect blocks) are 100% discontiguous, and reading those
> > with a 64k IO into 64k of memory is completely dumb.
> 
> The effect of a larger page size is that the filesystem will 
> place more metadata into a single page instead of spreading it out. 
> Reading a mass of metadata with a 64k read is an intelligent choice to 
> make, in particular if there is a long series of such reads.

Again: requires larger blocksize: specialised, uninteresting for what will
remain the common case: 4k blocksize.

The alleged fsck benefit is also unrelated to variable PAGE_CACHE_SIZE. 
It's a feature of a larger (unwieldy?) blocksize, and xfs already has that
working (doesn't it?)

There may be some benefits to some future version of ext4.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-15  2:23                     ` Andrew Morton
@ 2007-06-15  2:37                       ` Christoph Lameter
  2007-06-15  9:03                       ` David Chinner
  1 sibling, 0 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-06-15  2:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hch

On Thu, 14 Jun 2007, Andrew Morton wrote:

> > The metadata needs to refer to only 1/16th as many pages as before. 
> > Metadata shrinks significantly.
> 
> Only if the filesystems are altered to use larger blocksizes and if the
> operator then chooses to use that feature.  Then they suck for small-sized
> (and even medium-sized) files.

Nope. File systems already support that. The changes to XFS and ext2 are 
basically just doing the cleanups that we are discussing here plus some 
changes to set_blocksize.
 
> So you're still talking about corner cases: specialised applications which
> require careful setup and administrator intervention.
> 
> What can we do to optimise the common case?

The common filesystems will be able to support large block sizes easily. 
Most filesystems already run on 16k and 64k page size platforms and do 
just fine. All the work is already done. The VM just needs to give them 
support for larger page sizes on smaller page size platforms.

This is optimizing the common case.

> The alleged fsck benefit is also unrelated to variable PAGE_CACHE_SIZE. 
> It's a feature of a larger (unwieldy?) blocksize, and xfs already has that
> working (doesn't it?)

It has a hack with severe limitations, like we have done in many other 
components of the kernel. These hacks can be removed if the large 
blocksize support is merged. XFS still has the problem that the block 
layer, without page cache support for higher-order pages, cannot easily 
deal with large contiguous pages.

> There may be some benefits to some future version of ext4.

I have already run ext4 with a 64k blocksize on x86_64 with 4k pages. It has 
been done for years with a 64k page size on IA64 and powerpc, and there is no 
fs issue with running it on 4k platforms with the large blocksize patchset.
The filesystems work reliably. The core Linux code is the issue that we 
need to solve, and this is the beginning of doing so.
 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-15  2:23                     ` Andrew Morton
  2007-06-15  2:37                       ` Christoph Lameter
@ 2007-06-15  9:03                       ` David Chinner
  1 sibling, 0 replies; 44+ messages in thread
From: David Chinner @ 2007-06-15  9:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel, hch

On Thu, Jun 14, 2007 at 07:23:40PM -0700, Andrew Morton wrote:
> On Thu, 14 Jun 2007 19:04:27 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> > > > Of course there is. The seeks are reduced since there are a factor 
> > > > of 16 fewer metadata blocks. fsck does not read files. It just reads 
> > > > metadata structures. And the larger the contiguous areas, the faster it goes.
> > > 
> > > Some metadata is contiguous: inode tables, some directories (if they got
> > > lucky), bitmap tables.  But fsck surely reads them in a single swoop
> > > anyway, so there's no gain there.
> > 
> > The metadata needs to refer to only 1/16th as many pages as before. 
> > Metadata shrinks significantly.
> 
> Only if the filesystems are altered to use larger blocksizes and if the
> operator then chooses to use that feature.  Then they suck for small-sized
> (and even medium-sized) files.

Devil's Advocate:

In that case, we should remove support for block sizes smaller than
a page because they suck for large-sized (and even medium-sized)
files and we shouldn't allow people to use them.

> So you're still talking about corner cases: specialised applications which
> require careful setup and administrator intervention.

Yes, like 512-byte-blocksize filesystems using large directory
block sizes for dedicated mail servers, i.e. optimised for large
numbers of small files in each directory.
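
(With XFS that is something like "mkfs.xfs -b size=512 -n size=65536
/dev/sdX" - an illustrative invocation, not taken from the thread.)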

> What can we do to optimise the common case?

The common case is pretty good already for common case workloads.

What we need to do is provide options for workloads where tuning the
common case config is simply not sufficient. We already provide the
option to optimise for small file sizes, but we have no option to
optimise for large file sizes....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 22:04         ` Andrew Morton
  2007-06-14 22:22           ` Christoph Lameter
  2007-06-14 23:30           ` David Chinner
@ 2007-06-15 15:05           ` Dave Kleikamp
  2 siblings, 0 replies; 44+ messages in thread
From: Dave Kleikamp @ 2007-06-15 15:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel, hch

On Thu, 2007-06-14 at 15:04 -0700, Andrew Morton wrote:
> > On Thu, 14 Jun 2007 14:37:33 -0700 (PDT) Christoph Lameter <clameter@sgi.com> wrote:
> > On Thu, 14 Jun 2007, Andrew Morton wrote:
> > 
> > > We want the 100% case.
> > 
> > Yes that is what we intend to do. Universal support for larger blocksize. 
> > I.e. your desktop filesystem will use 64k page size and server platforms 
> > likely much larger.
> 
> With 64k pagesize the amount of memory required to hold a kernel tree (say)
> will go from 270MB to 1400MB.   This is not an optimisation.
> 
> Several 64k pagesize people have already spent time looking at various
> tail-packing schemes to get around this serious problem.  And that's on
> _server_ class machines.  Large ones.  I don't think
> laptop/desktop/small-server machines would want to go anywhere near this.

I'm one of the ones investigating 64 KB pagesize tail-packing schemes,
and I believe Christoph's cleanups will reduce the intrusiveness and
improve the readability of a tail-packing solution.  I'll add my vote in
support of these patches.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 21:32     ` Andrew Morton
  2007-06-14 21:37       ` Christoph Lameter
@ 2007-06-17  1:25       ` Arjan van de Ven
  2007-06-17  5:02         ` Matt Mackall
  1 sibling, 1 reply; 44+ messages in thread
From: Arjan van de Ven @ 2007-06-17  1:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, linux-kernel, hch


> You: conceptually-new add-on which benefits 0.25% of the user base, provided
> they select the right config options and filesystem.
> 
> Me: simpler enhancement which benefits 100% of the user base (ie: includes
> 4k blocksize, 4k pagesize) and which also fixes your performance problem
> with that HBA.

Note that 2.6 at least is doing this "sort of", better than 2.4
anyway (a 30% hit rate or something like that).

In addition, on systems with an IOMMU (many really large systems have one,
as well as several x86 ones, with more to come in the future), this is a
moot point since the IOMMU will just linearize the pages for the device.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-17  1:25       ` Arjan van de Ven
@ 2007-06-17  5:02         ` Matt Mackall
  2007-06-18  2:08           ` Christoph Lameter
  0 siblings, 1 reply; 44+ messages in thread
From: Matt Mackall @ 2007-06-17  5:02 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Andrew Morton, Christoph Lameter, linux-kernel, hch

On Sat, Jun 16, 2007 at 06:25:00PM -0700, Arjan van de Ven wrote:
> 
> > You: conceptually-new add-on which benefits 0.25% of the user base, provided
> > they select the right config options and filesystem.
> > 
> > Me: simpler enhancement which benefits 100% of the user base (ie: includes
> > 4k blocksize, 4k pagesize) and which also fixes your performance problem
> > with that HBA.
> 
> note that at least 2.6 is doing this "sort of", better than 2.4 at
> least. (30% hitrate or something like that).

Is it? Last I looked it had reverted to handing out reverse-contiguous
pages.

You can see this by running /proc/pid/pagemap through hexdump.
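
(Concretely, something like "hexdump -C /proc/$(pidof <app>)/pagemap | less"
and checking whether successive page-frame entries count downwards; treat
the exact interface as an assumption, since pagemap was not in mainline yet.)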

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-17  5:02         ` Matt Mackall
@ 2007-06-18  2:08           ` Christoph Lameter
  2007-06-18  3:00             ` Arjan van de Ven
  2007-06-18  4:50             ` William Lee Irwin III
  0 siblings, 2 replies; 44+ messages in thread
From: Christoph Lameter @ 2007-06-18  2:08 UTC (permalink / raw)
  To: Matt Mackall; +Cc: Arjan van de Ven, Andrew Morton, linux-kernel, hch

On Sun, 17 Jun 2007, Matt Mackall wrote:

> Is it? Last I looked it had reverted to handing out reverse-contiguous
> pages.

I thought that was fixed? Bill Irwin was working on it.

But the contiguous pages usually only work shortly after boot. After 
a while memory gets sufficiently scrambled that the coalescing in the I/O 
layer becomes ineffective.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-18  2:08           ` Christoph Lameter
@ 2007-06-18  3:00             ` Arjan van de Ven
  2007-06-18  4:50             ` William Lee Irwin III
  1 sibling, 0 replies; 44+ messages in thread
From: Arjan van de Ven @ 2007-06-18  3:00 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Matt Mackall, Andrew Morton, linux-kernel, hch

On Sun, 2007-06-17 at 19:08 -0700, Christoph Lameter wrote:
> On Sun, 17 Jun 2007, Matt Mackall wrote:
> 
> > Is it? Last I looked it had reverted to handing out reverse-contiguous
> > pages.
> 
> I thought that was fixed? Bill Irwin was working on it.
> 
> But the contiguous pages usually only work shortly after boot. After 
> a while memory gets sufficiently scrambled that the coalescing in the I/O 
> layer becomes ineffective.

the buddy allocator at least defragments itself somewhat (granted, it's
not perfect and the per-cpu page queues spoil the game too...)

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-18  2:08           ` Christoph Lameter
  2007-06-18  3:00             ` Arjan van de Ven
@ 2007-06-18  4:50             ` William Lee Irwin III
  1 sibling, 0 replies; 44+ messages in thread
From: William Lee Irwin III @ 2007-06-18  4:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matt Mackall, Arjan van de Ven, Andrew Morton, linux-kernel, hch

On Sun, 17 Jun 2007, Matt Mackall wrote:
>> Is it? Last I looked it had reverted to handing out reverse-contiguous
>> pages.

On Sun, Jun 17, 2007 at 07:08:41PM -0700, Christoph Lameter wrote:
> I thought that was fixed? Bill Irwin was working on it.
> But the contiguous pages usually only work shortly after boot. After 
> a while memory gets sufficiently scrambled that the coalescing in the I/O 
> layer becomes ineffective.

It fell off the bottom of my priority queue, sorry.


-- wli

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support
  2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
                     ` (2 preceding siblings ...)
  2007-06-14 23:54   ` David Chinner
@ 2007-07-02 18:16   ` Badari Pulavarty
  3 siblings, 0 replies; 44+ messages in thread
From: Badari Pulavarty @ 2007-07-02 18:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: clameter, lkml, Christoph Hellwig

On Thu, 2007-06-14 at 13:06 -0700, Andrew Morton wrote:
> On Thu, 14 Jun 2007 12:38:39 -0700
> clameter@sgi.com wrote:
> 
> > This patchset cleans up the page cache handling by replacing
> > open coded shifts and adds through inline function calls.
> 

Some of us (crazy) people are trying to support read() for hugetlbfs
in order to get oprofile to work on executables backed by large pages
via libhugetlbfs.

Currently, I can't use any generic support. I have this ugly patch
to get oprofile working. Christoph's cleanups would allow me to set a
per-mapping pagesize and get this to work, without any hacks.
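
As an illustration (a sketch using the helpers from patch 01/14, and
assuming hugetlbfs gets to set its per-mapping page size), the
HPAGE_SHIFT/HPAGE_MASK arithmetic in the loop below would collapse into:

	unsigned long index = page_cache_index(mapping, *ppos);
	unsigned long offset = page_cache_offset(mapping, *ppos);
	/* bytes of the file in its last page */
	nr = page_cache_offset(mapping, isize - 1) + 1;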

Thanks,
Badari

 fs/hugetlbfs/inode.c |  117 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

Index: linux/fs/hugetlbfs/inode.c
===================================================================
--- linux.orig/fs/hugetlbfs/inode.c	2007-05-18 04:16:27.000000000 -0700
+++ linux/fs/hugetlbfs/inode.c	2007-06-22 10:46:09.000000000 -0700
@@ -160,6 +160,122 @@ full_search:
 #endif
 
 /*
+ * Support for read()
+ */
+static int
+hugetlbfs_read_actor(struct page *page, unsigned long offset,
+			char __user *buf, unsigned long count,
+			unsigned long size)
+{
+	char *kaddr;
+	unsigned long to_copy;
+	int i, chunksize;
+
+	if (size > count)
+		size = count;
+
+	/* Find which 4k chunk this is and the offset within that chunk */
+	i = offset >> PAGE_CACHE_SHIFT;
+	offset = offset & ~PAGE_CACHE_MASK;
+	to_copy = size;
+
+	while (to_copy) {
+		chunksize = PAGE_CACHE_SIZE;
+		if (offset)
+			chunksize -= offset;
+		if (chunksize > to_copy)
+			chunksize = to_copy;
+
+#if 0
+printk("Coping i=%d page: %p offset %d chunk %d\n", i, &page[i], offset, chunksize);
+#endif
+		kaddr = kmap(&page[i]);
+		memcpy(buf, kaddr + offset, chunksize);
+		kunmap(&page[i]);
+		offset = 0;
+		to_copy -= chunksize;
+		buf += chunksize;
+		i++;
+	}
+	return size;
+}
+
+
+ssize_t
+hugetlbfs_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
+{
+	struct address_space *mapping = filp->f_mapping;
+	struct inode *inode = mapping->host;
+	unsigned long index = *ppos >> HPAGE_SHIFT;
+	unsigned long end_index;
+	loff_t isize;
+	unsigned long offset;
+	ssize_t retval = 0;
+
+	/* nothing to do for a zero-length read */
+	if (len == 0)
+		goto out;
+
+	isize = i_size_read(inode);
+	if (!isize)
+		goto out;
+
+	offset = *ppos & ~HPAGE_MASK;
+	end_index = (isize - 1) >> HPAGE_SHIFT;
+	for (;;) {
+		struct page *page;
+		unsigned long nr, ret;
+
+		/* nr is the maximum number of bytes to copy from this page */
+		nr = HPAGE_SIZE;
+		if (index >= end_index) {
+			if (index > end_index)
+				goto out;
+			nr = ((isize - 1) & ~HPAGE_MASK) + 1;
+			if (nr <= offset) {
+				goto out;
+			}
+		}
+		nr = nr - offset;
+
+		/* Find the page */
+		page = find_get_page(mapping, index);
+		if (unlikely(page == NULL)) {
+			/*
+			 * We can't find the page in the cache - bail out
+			 * TODO - should we zero out the user buffer ?
+			 */
+			goto out;
+		}
+#if 0
+printk("Found page %p at index %d offset %d nr %d\n", page, index, offset, nr);
+#endif
+
+		/*
+		 * Ok, we have the page, so now we can copy it to user space...
+		 */
+		ret = hugetlbfs_read_actor(page, offset, buf, len, nr);
+		if (ret < 0) {
+			retval = retval ? : ret;
+			goto out;
+		}
+
+		offset += ret;
+		retval += ret;
+		len -= ret;
+		index += offset >> HPAGE_SHIFT;
+		offset &= ~HPAGE_MASK;
+
+		page_cache_release(page);
+		if (ret == nr && len)
+			continue;
+		goto out;
+	}
+out:
+	return retval;
+}
+
+/*
  * Read a page. Again trivial. If it didn't already exist
  * in the page cache, it is zero-filled.
  */
@@ -565,6 +681,7 @@ static void init_once(void *foo, kmem_ca
 }
 
 struct file_operations hugetlbfs_file_operations = {
+ 	.read			= hugetlbfs_read,
 	.mmap			= hugetlbfs_file_mmap,
 	.fsync			= simple_sync_file,
 	.get_unmapped_area	= hugetlb_get_unmapped_area,
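
(With something like this applied, an ordinary read - e.g. "dd
if=/mnt/huge/some-file of=/dev/null" after mounting hugetlbfs - should
succeed instead of erroring out, which is what oprofile needs; the mount
point and file name are made up for illustration.)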



^ permalink raw reply	[flat|nested] 44+ messages in thread

Thread overview: 44+ messages
2007-06-14 19:38 [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support clameter
2007-06-14 19:38 ` [patch 01/14] Define functions for page cache handling clameter
2007-06-14 19:56   ` Sam Ravnborg
2007-06-14 19:58     ` Christoph Lameter
2007-06-14 20:07       ` Sam Ravnborg
2007-06-14 19:38 ` [patch 02/14] Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user clameter
2007-06-14 19:38 ` [patch 03/14] Use page_cache_xx function in mm/filemap.c clameter
2007-06-14 19:38 ` [patch 04/14] Use page_cache_xxx in mm/page-writeback.c clameter
2007-06-14 19:38 ` [patch 05/14] Use page_cache_xxx in mm/truncate.c clameter
2007-06-14 19:38 ` [patch 06/14] Use page_cache_xxx in mm/rmap.c clameter
2007-06-14 19:38 ` [patch 07/14] Use page_cache_xx in mm/filemap_xip.c clameter
2007-06-14 19:38 ` [patch 08/14] Use page_cache_xx in mm/migrate.c clameter
2007-06-14 19:38 ` [patch 09/14] Use page_cache_xx in fs/libfs.c clameter
2007-06-14 19:38 ` [patch 10/14] Use page_cache_xx in fs/sync clameter
2007-06-14 19:38 ` [patch 11/14] Use page_cache_xx in fs/buffer.c clameter
2007-06-14 19:38 ` [patch 12/14] Use page_cache_xxx in mm/mpage.c clameter
2007-06-14 19:38 ` [patch 13/14] Use page_cache_xxx in mm/fadvise.c clameter
2007-06-14 19:38 ` [patch 14/14] Use page_cache_xx in fs/splice.c clameter
2007-06-14 20:06 ` [patch 00/14] Page cache cleanup in anticipation of Large Blocksize support Andrew Morton
2007-06-14 21:07   ` Christoph Hellwig
2007-06-14 21:25     ` Dave McCracken
2007-06-14 21:20   ` Christoph Lameter
2007-06-14 21:32     ` Andrew Morton
2007-06-14 21:37       ` Christoph Lameter
2007-06-14 22:04         ` Andrew Morton
2007-06-14 22:22           ` Christoph Lameter
2007-06-14 22:49             ` Andrew Morton
2007-06-15  0:45               ` Christoph Lameter
2007-06-15  1:40                 ` Andrew Morton
2007-06-15  2:04                   ` Christoph Lameter
2007-06-15  2:23                     ` Andrew Morton
2007-06-15  2:37                       ` Christoph Lameter
2007-06-15  9:03                       ` David Chinner
2007-06-14 23:30           ` David Chinner
2007-06-14 23:41             ` Andrew Morton
2007-06-15  0:29               ` David Chinner
2007-06-15 15:05           ` Dave Kleikamp
2007-06-17  1:25       ` Arjan van de Ven
2007-06-17  5:02         ` Matt Mackall
2007-06-18  2:08           ` Christoph Lameter
2007-06-18  3:00             ` Arjan van de Ven
2007-06-18  4:50             ` William Lee Irwin III
2007-06-14 23:54   ` David Chinner
2007-07-02 18:16   ` Badari Pulavarty
