linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/8] readahead cleanups and interleaved readahead take 2
       [not found] <20070721035733.951838089@mail.ustc.edu.cn>
@ 2007-07-21  3:57 ` Fengguang Wu
       [not found] ` <20070721035850.977231489@mail.ustc.edu.cn>
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

Linus,

To save you from some merge conflicts, I rebased this readahead patchset
to 2.6.22-git15.

The following patches are based on yesterday's discussions, compiled and
tested OK.

smaller file_ra_state:
	[PATCH 1/8] compacting file_ra_state                                          
	[PATCH 2/8] mmap read-around simplification                                   
	[PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos        

code cleanups:
	[PATCH 4/8] trivial filemap.c cleanups                                        
	[PATCH 5/8] remove several readahead macros                                   
	[PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb       

support of interleaved reads:
	[PATCH 7/8] introduce radix_tree_scan_hole()                                  
	[PATCH 8/8] basic support of interleaved reads                                


The diffstat is

 block/ll_rw_blk.c          |    9 -----
 fs/ext3/dir.c              |    2 -
 fs/ext4/dir.c              |    2 -
 fs/splice.c                |    2 -
 include/linux/fs.h         |   14 +++-----
 include/linux/mm.h         |    2 -
 include/linux/radix-tree.h |    2 +
 lib/radix-tree.c           |   34 ++++++++++++++++++++
 mm/filemap.c               |   31 +++++++++---------
 mm/readahead.c             |   58 +++++++++++++++++++----------------
 10 files changed, 92 insertions(+), 64 deletions(-)

Regards,
Fengguang Wu
---

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/8] compacting file_ra_state
       [not found] ` <20070721035850.977231489@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
       [not found]   ` <20070721040644.GA9750@mail.ustc.edu.cn>
  1 sibling, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Andi Kleen

[-- Attachment #1: short-rasize.patch --]
[-- Type: text/plain, Size: 1624 bytes --]

Use 'unsigned int' instead of 'unsigned long' for readahead sizes.

This helps reduce memory consumption on 64-bit CPUs when
a lot of files are open.
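
As a rough illustration of the saving (a user-space sketch, not kernel code):
the two structs below mirror only the four fields touched here, with pgoff_t
assumed to be unsigned long. The names ra_wide, ra_compact and ra_bytes_saved
are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgoff_t;	/* assumption: same as the kernel typedef */

/* Hypothetical mirror of the four fields before the change. */
struct ra_wide {
	pgoff_t start;
	unsigned long size;
	unsigned long async_size;
	unsigned long ra_pages;
};

/* Hypothetical mirror of the same fields after the change. */
struct ra_compact {
	pgoff_t start;
	unsigned int size;
	unsigned int async_size;
	unsigned int ra_pages;
};

/* Bytes saved per instance; 0 where int and long have the same width. */
static size_t ra_bytes_saved(void)
{
	assert(sizeof(struct ra_compact) <= sizeof(struct ra_wide));
	return sizeof(struct ra_wide) - sizeof(struct ra_compact);
}
```

Where long is 8 bytes and int is 4, the three narrowed fields shrink from 24
to 12 bytes, and padding to the alignment of pgoff_t leaves a net saving of
8 bytes per struct for these fields alone.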

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/fs.h |    8 ++++----
 mm/readahead.c     |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -697,12 +697,12 @@ struct fown_struct {
  * Track a single file's readahead state
  */
 struct file_ra_state {
-	pgoff_t start;                  /* where readahead started */
-	unsigned long size;             /* # of readahead pages */
-	unsigned long async_size;       /* do asynchronous readahead when
+	pgoff_t start;			/* where readahead started */
+	unsigned int size;		/* # of readahead pages */
+	unsigned int async_size;	/* do asynchronous readahead when
 					   there are only # of pages ahead */
 
-	unsigned long ra_pages;		/* Maximum readahead window */
+	unsigned int ra_pages;		/* Maximum readahead window */
 	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
 	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -350,7 +350,7 @@ ondemand_readahead(struct address_space 
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	unsigned long max;	/* max readahead pages */
+	int max;	/* max readahead pages */
 	int sequential;
 
 	max = ra->ra_pages;

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/8] mmap read-around simplification
       [not found] ` <20070721035851.185553787@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: remove-mmap-hit.patch --]
[-- Type: text/plain, Size: 1352 bytes --]

Fold file_ra_state.mmap_hit into file_ra_state.mmap_miss
and make it an int.
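
A small user-space sketch of why one signed counter is enough, assuming
misses are still counted with an increment elsewhere in the fault path and
that MMAP_LOTSAMISS is 100 (both are assumptions, as are all the names
below). It feeds the same events to the old and new schemes and checks that
the single counter always equals miss minus hit.

```c
#include <assert.h>

#define MMAP_LOTSAMISS 100	/* assumption: value used in mm/filemap.c */

/* Old scheme: two counters, compared against each other. */
struct two_counters { unsigned long hit, miss; };

/* New scheme: one signed counter, a hit undoes a miss. */
struct one_counter { int miss; };

static int old_gives_up(const struct two_counters *c)
{
	return c->miss > c->hit + MMAP_LOTSAMISS;
}

static int new_gives_up(const struct one_counter *c)
{
	return c->miss > MMAP_LOTSAMISS;
}

/* Feed the same event to both schemes; is_hit != 0 means a cache hit. */
static void record(struct two_counters *o, struct one_counter *n, int is_hit)
{
	if (is_hit) {
		o->hit++;
		n->miss--;
	} else {
		o->miss++;
		n->miss++;
	}
	/* Invariant: the single counter equals miss minus hit. */
	assert((long)n->miss == (long)o->miss - (long)o->hit);
}
```

Since "miss > hit + N" and "miss - hit > N" are the same test, the two
schemes decide to stop read-around at the same moment.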

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/fs.h |    3 +--
 mm/filemap.c       |    4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -703,8 +703,7 @@ struct file_ra_state {
 					   there are only # of pages ahead */
 
 	unsigned int ra_pages;		/* Maximum readahead window */
-	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
-	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
+	int mmap_miss;			/* Cache miss stat for mmap accesses */
 	unsigned long prev_index;	/* Cache last read() position */
 	unsigned int prev_offset;	/* Offset where last read() ended in a page */
 };
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -1369,7 +1369,7 @@ retry_find:
 		 * Do we miss much more than hit in this file? If so,
 		 * stop bothering with read-ahead. It will only hurt.
 		 */
-		if (ra->mmap_miss > ra->mmap_hit + MMAP_LOTSAMISS)
+		if (ra->mmap_miss > MMAP_LOTSAMISS)
 			goto no_cached_page;
 
 		/*
@@ -1395,7 +1395,7 @@ retry_find:
 	}
 
 	if (!did_readaround)
-		ra->mmap_hit++;
+		ra->mmap_miss--;
 
 	/*
 	 * We have a locked page in the page cache, now we need to check

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos
       [not found] ` <20070721035851.321030363@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, linux-kernel, Peter Zijlstra, Christoph Lameter

[-- Attachment #1: merge-start-prev_index.patch --]
[-- Type: text/plain, Size: 4994 bytes --]

Combine the file_ra_state members
				unsigned long prev_index
				unsigned int prev_offset
into
				loff_t prev_pos

It is more consistent and better supports huge files.
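
The packing can be sketched in user space as follows. This is only an
illustration: PAGE_SHIFT of 12 (4 KiB pages) is an assumption, int64_t
stands in for loff_t, and pos_pack/pos_index/pos_offset are hypothetical
helper names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Pack a page index and an in-page offset into one byte position. */
static int64_t pos_pack(unsigned long index, unsigned int offset)
{
	assert(offset < PAGE_SIZE);
	/* Widen before shifting so a 32-bit 'unsigned long' cannot overflow. */
	return ((int64_t)index << PAGE_SHIFT) | offset;
}

/* Recover the page index from a byte position. */
static unsigned long pos_index(int64_t pos)
{
	return (unsigned long)(pos >> PAGE_SHIFT);
}

/* Recover the offset within the page from a byte position. */
static unsigned int pos_offset(int64_t pos)
{
	return (unsigned int)(pos & (PAGE_SIZE - 1));
}
```

Note the widening cast in pos_pack(): shifting a page index that is still
32 bits wide would drop the high bits for files beyond 4 GiB.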

Thanks to Peter for the nice proposal!

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 fs/ext3/dir.c      |    2 +-
 fs/ext4/dir.c      |    2 +-
 fs/splice.c        |    2 +-
 include/linux/fs.h |    3 +--
 mm/filemap.c       |   11 ++++++-----
 mm/readahead.c     |   15 ++++++++-------
 6 files changed, 18 insertions(+), 17 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/fs.h
+++ linux-2.6.22-git15/include/linux/fs.h
@@ -704,8 +704,7 @@ struct file_ra_state {
 
 	unsigned int ra_pages;		/* Maximum readahead window */
 	int mmap_miss;			/* Cache miss stat for mmap accesses */
-	unsigned long prev_index;	/* Cache last read() position */
-	unsigned int prev_offset;	/* Offset where last read() ended in a page */
+	loff_t prev_pos;		/* Cache last read() position */
 };
 
 /*
--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -879,8 +879,8 @@ void do_generic_mapping_read(struct addr
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
-	prev_index = ra.prev_index;
-	prev_offset = ra.prev_offset;
+	prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
+	prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -966,7 +966,6 @@ page_ok:
 		index += offset >> PAGE_CACHE_SHIFT;
 		offset &= ~PAGE_CACHE_MASK;
 		prev_offset = offset;
-		ra.prev_offset = offset;
 
 		page_cache_release(page);
 		if (ret == nr && desc->count)
@@ -1056,7 +1055,9 @@ no_cached_page:
 
 out:
 	*_ra = ra;
-	_ra->prev_index = prev_index;
+	_ra->prev_pos = prev_index;
+	_ra->prev_pos <<= PAGE_CACHE_SHIFT;
+	_ra->prev_pos |= prev_offset;
 
 	*ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset;
 	if (cached_page)
@@ -1415,7 +1416,7 @@ retry_find:
 	 * Found the page and have a reference on it.
 	 */
 	mark_page_accessed(page);
-	ra->prev_index = page->index;
+	ra->prev_pos = (loff_t)page->index << PAGE_CACHE_SHIFT;
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -45,7 +45,7 @@ void
 file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping)
 {
 	ra->ra_pages = mapping->backing_dev_info->ra_pages;
-	ra->prev_index = -1;
+	ra->prev_pos = -1;
 }
 EXPORT_SYMBOL_GPL(file_ra_state_init);
 
@@ -326,7 +326,7 @@ static unsigned long get_next_ra_size(st
  * indicator. The flag won't be set on already cached pages, to avoid the
  * readahead-for-nothing fuss, saving pointless page cache lookups.
  *
- * prev_index tracks the last visited page in the _previous_ read request.
+ * prev_pos tracks the last visited byte in the _previous_ read request.
  * It should be maintained by the caller, and will be used for detecting
  * small random reads. Note that the readahead algorithm checks loosely
  * for sequential patterns. Hence interleaved reads might be served as
@@ -350,11 +350,9 @@ ondemand_readahead(struct address_space 
 		   bool hit_readahead_marker, pgoff_t offset,
 		   unsigned long req_size)
 {
-	int max;	/* max readahead pages */
-	int sequential;
-
-	max = ra->ra_pages;
-	sequential = (offset - ra->prev_index <= 1UL) || (req_size > max);
+	int	max = ra->ra_pages;	/* max readahead pages */
+	pgoff_t prev_offset;
+	int	sequential;
 
 	/*
 	 * It's the expected callback offset, assume sequential access.
@@ -368,6 +366,9 @@ ondemand_readahead(struct address_space 
 		goto readit;
 	}
 
+	prev_offset = ra->prev_pos >> PAGE_CACHE_SHIFT;
+	sequential = offset - prev_offset <= 1UL || req_size > max;
+
 	/*
 	 * Standalone, small read.
 	 * Read as is, and do not pollute the readahead state.
--- linux-2.6.22-git15.orig/fs/ext3/dir.c
+++ linux-2.6.22-git15/fs/ext3/dir.c
@@ -143,7 +143,7 @@ static int ext3_readdir(struct file * fi
 					sb->s_bdev->bd_inode->i_mapping,
 					&filp->f_ra, filp,
 					index, 1);
-			filp->f_ra.prev_index = index;
+			filp->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
 			bh = ext3_bread(NULL, inode, blk, 0, &err);
 		}
 
--- linux-2.6.22-git15.orig/fs/ext4/dir.c
+++ linux-2.6.22-git15/fs/ext4/dir.c
@@ -142,7 +142,7 @@ static int ext4_readdir(struct file * fi
 					sb->s_bdev->bd_inode->i_mapping,
 					&filp->f_ra, filp,
 					index, 1);
-			filp->f_ra.prev_index = index;
+			filp->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
 			bh = ext4_bread(NULL, inode, blk, 0, &err);
 		}
 
--- linux-2.6.22-git15.orig/fs/splice.c
+++ linux-2.6.22-git15/fs/splice.c
@@ -447,7 +447,7 @@ fill_it:
 	 */
 	while (page_nr < nr_pages)
 		page_cache_release(pages[page_nr++]);
-	in->f_ra.prev_index = index;
+	in->f_ra.prev_pos = (loff_t)index << PAGE_CACHE_SHIFT;
 
 	if (spd.nr_pages)
 		return splice_to_pipe(pipe, &spd);

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/8] trivial filemap.c cleanups
       [not found] ` <20070721035851.461364420@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: cleanup-filemap.patch --]
[-- Type: text/plain, Size: 2144 bytes --]

- remove unused local next_index in do_generic_mapping_read()
- convert some 'unsigned long' to pgoff_t
- wrap a long line

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/filemap.c |   16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

--- linux-2.6.22-git15.orig/mm/filemap.c
+++ linux-2.6.22-git15/mm/filemap.c
@@ -866,11 +866,10 @@ void do_generic_mapping_read(struct addr
 			     read_actor_t actor)
 {
 	struct inode *inode = mapping->host;
-	unsigned long index;
-	unsigned long offset;
-	unsigned long last_index;
-	unsigned long next_index;
-	unsigned long prev_index;
+	pgoff_t index;
+	pgoff_t offset;
+	pgoff_t last_index;
+	pgoff_t prev_index;
 	unsigned int prev_offset;
 	struct page *cached_page;
 	int error;
@@ -878,7 +877,6 @@ void do_generic_mapping_read(struct addr
 
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
-	next_index = index;
 	prev_index = ra.prev_pos >> PAGE_CACHE_SHIFT;
 	prev_offset = ra.prev_pos & (PAGE_CACHE_SIZE-1);
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
@@ -1219,7 +1217,8 @@ out:
 }
 EXPORT_SYMBOL(generic_file_aio_read);
 
-int file_send_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
+int file_send_actor(read_descriptor_t * desc, struct page *page,
+			unsigned long offset, unsigned long size)
 {
 	ssize_t written;
 	unsigned long count = desc->count;
@@ -1272,7 +1271,6 @@ asmlinkage ssize_t sys_readahead(int fd,
 }
 
 #ifdef CONFIG_MMU
-static int FASTCALL(page_cache_read(struct file * file, unsigned long offset));
 /**
  * page_cache_read - adds requested page to the page cache if not already there
  * @file:	file to read
@@ -1281,7 +1279,7 @@ static int FASTCALL(page_cache_read(stru
  * This adds the requested page to the page cache if it isn't already there,
  * and schedules an I/O to read in its contents from disk.
  */
-static int fastcall page_cache_read(struct file * file, unsigned long offset)
+static int fastcall page_cache_read(struct file * file, pgoff_t offset)
 {
 	struct address_space *mapping = file->f_mapping;
 	struct page *page; 

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/8] remove several readahead macros
       [not found] ` <20070721035851.638623804@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: readahead-macros-cleanup.patch --]
[-- Type: text/plain, Size: 1515 bytes --]

Remove VM_MAX_CACHE_HIT, MAX_RA_PAGES and MIN_RA_PAGES.

Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 include/linux/mm.h |    2 --
 mm/readahead.c     |   10 +---------
 2 files changed, 1 insertion(+), 11 deletions(-)

--- linux-2.6.22-git15.orig/include/linux/mm.h
+++ linux-2.6.22-git15/include/linux/mm.h
@@ -1136,8 +1136,6 @@ int write_one_page(struct page *page, in
 /* readahead.c */
 #define VM_MAX_READAHEAD	128	/* kbytes */
 #define VM_MIN_READAHEAD	16	/* kbytes (includes current page) */
-#define VM_MAX_CACHE_HIT    	256	/* max pages in a row in cache before
-					 * turning readahead off */
 
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			pgoff_t offset, unsigned long nr_to_read);
--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -21,16 +21,8 @@ void default_unplug_io_fn(struct backing
 }
 EXPORT_SYMBOL(default_unplug_io_fn);
 
-/*
- * Convienent macros for min/max read-ahead pages.
- * Note that MAX_RA_PAGES is rounded down, while MIN_RA_PAGES is rounded up.
- * The latter is necessary for systems with large page size(i.e. 64k).
- */
-#define MAX_RA_PAGES	(VM_MAX_READAHEAD*1024 / PAGE_CACHE_SIZE)
-#define MIN_RA_PAGES	DIV_ROUND_UP(VM_MIN_READAHEAD*1024, PAGE_CACHE_SIZE)
-
 struct backing_dev_info default_backing_dev_info = {
-	.ra_pages	= MAX_RA_PAGES,
+	.ra_pages	= VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
 	.state		= 0,
 	.capabilities	= BDI_CAP_MAP_COPY,
 	.unplug_io_fn	= default_unplug_io_fn,

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb
       [not found] ` <20070721035851.791763729@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Jens Axboe

[-- Attachment #1: remove-readahead-size-limit.patch --]
[-- Type: text/plain, Size: 1296 bytes --]

Remove the size limit max_sectors_kb imposed on max_readahead_kb.

The size restriction is unreasonable. Especially when max_sectors_kb cannot
grow larger than max_hw_sectors_kb, which can be rather small for some disk
drives.
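
To make the effect of the removed clamp concrete, here is a stand-alone
sketch of what it computed. It assumes 4 KiB pages (PAGE_SHIFT of 12), and
old_trimmed_ra_pages is a hypothetical name for illustration only.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, so 4 KiB per readahead page */

/*
 * Hypothetical stand-alone version of the removed clamp:
 * returns the readahead window (in pages) after max_sectors_kb is written.
 */
static unsigned long old_trimmed_ra_pages(unsigned long ra_pages,
					  unsigned long max_sectors_kb)
{
	unsigned long ra_kb = ra_pages << (PAGE_SHIFT - 10);

	/* Values below one page are rejected earlier with -EINVAL. */
	assert(max_sectors_kb >= (1UL << (PAGE_SHIFT - 10)));
	if (ra_kb > max_sectors_kb)
		ra_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
	return ra_pages;
}
```

Under these assumptions a 128 KiB window (32 pages) was cut to 16 pages as
soon as max_sectors_kb was set to 64, and it was not restored when
max_sectors_kb was raised again.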

Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
---
 block/ll_rw_blk.c |    9 ---------
 1 file changed, 9 deletions(-)

--- linux-2.6.22-git15.orig/block/ll_rw_blk.c
+++ linux-2.6.22-git15/block/ll_rw_blk.c
@@ -3946,7 +3946,6 @@ queue_max_sectors_store(struct request_q
 			max_hw_sectors_kb = q->max_hw_sectors >> 1,
 			page_kb = 1 << (PAGE_CACHE_SHIFT - 10);
 	ssize_t ret = queue_var_store(&max_sectors_kb, page, count);
-	int ra_kb;
 
 	if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
 		return -EINVAL;
@@ -3955,14 +3954,6 @@ queue_max_sectors_store(struct request_q
 	 * values synchronously:
 	 */
 	spin_lock_irq(q->queue_lock);
-	/*
-	 * Trim readahead window as well, if necessary:
-	 */
-	ra_kb = q->backing_dev_info.ra_pages << (PAGE_CACHE_SHIFT - 10);
-	if (ra_kb > max_sectors_kb)
-		q->backing_dev_info.ra_pages =
-				max_sectors_kb >> (PAGE_CACHE_SHIFT - 10);
-
 	q->max_sectors = max_sectors_kb << 1;
 	spin_unlock_irq(q->queue_lock);
 

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 7/8] introduce radix_tree_scan_hole()
       [not found] ` <20070721035851.946351617@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Nick Piggin

[-- Attachment #1: radixtree-introduce-scan-hole-data-functions.patch --]
[-- Type: text/plain, Size: 2263 bytes --]

Introduce radix_tree_scan_hole(root, index, max_scan) to scan radix tree
for the first hole. It will be used in interleaved readahead.

The implementation is dumb and obviously correct.
It can help debug (and document) a possible smarter one in the future.
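
The scan semantics can be tried out in user space with a callback standing
in for radix_tree_lookup(). This is a sketch only: present_fn, scan_hole and
the two example populations are made-up names, not part of the radix tree
API.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for radix_tree_lookup(): nonzero if index is populated. */
typedef int (*present_fn)(unsigned long index);

/* Return the first hole at or after index, 0 on wrap-around, or the
 * index reached after max_scan populated items. */
static unsigned long scan_hole(present_fn present, unsigned long index,
			       unsigned long max_scan)
{
	unsigned long i;

	assert(present);
	for (i = 0; i < max_scan; i++) {
		if (!present(index))
			break;		/* found a hole */
		if (++index == 0)
			break;		/* wrapped around */
	}
	return index;
}

/* Example populations, made up for illustration. */
static int pages_10_to_14(unsigned long index)
{
	return index >= 10 && index <= 14;
}

static int every_page(unsigned long index)
{
	(void)index;
	return 1;
}
```

With pages 10..14 populated, scanning from 10 with max_scan = 8 stops at the
hole at 15, while max_scan = 3 gives up at 13 without having found one. A
return value of 0 signals the wrap-around case.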

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---

 include/linux/radix-tree.h |    2 ++
 lib/radix-tree.c           |   34 ++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

--- linux-2.6.22-git15.orig/include/linux/radix-tree.h
+++ linux-2.6.22-git15/include/linux/radix-tree.h
@@ -155,6 +155,8 @@ void *radix_tree_delete(struct radix_tre
 unsigned int
 radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 			unsigned long first_index, unsigned int max_items);
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan);
 int radix_tree_preload(gfp_t gfp_mask);
 void radix_tree_init(void);
 void *radix_tree_tag_set(struct radix_tree_root *root,
--- linux-2.6.22-git15.orig/lib/radix-tree.c
+++ linux-2.6.22-git15/lib/radix-tree.c
@@ -599,6 +599,40 @@ int radix_tree_tag_get(struct radix_tree
 EXPORT_SYMBOL(radix_tree_tag_get);
 #endif
 
+static unsigned long
+radix_tree_scan_hole_dumb(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	unsigned long i;
+
+	for (i = 0; i < max_scan; i++) {
+		if (!radix_tree_lookup(root, index))
+			break;
+		if (++index == 0)
+			break;
+	}
+
+	return index;
+}
+
+/**
+ *	radix_tree_scan_hole    -    scan for hole
+ *	@root:		radix tree root
+ *	@index:		index key
+ *	@max_scan:      advice on max items to scan (it may scan a little more)
+ *
+ *      Scan forward from @index for a hole/empty item, stop when
+ *      - hit hole
+ *      - wrap-around to index 0
+ *      - @max_scan or more items scanned
+ */
+unsigned long radix_tree_scan_hole(struct radix_tree_root *root,
+				unsigned long index, unsigned long max_scan)
+{
+	return radix_tree_scan_hole_dumb(root, index, max_scan);
+}
+EXPORT_SYMBOL(radix_tree_scan_hole);
+
 static unsigned int
 __lookup(struct radix_tree_node *slot, void **results, unsigned long index,
 	unsigned int max_items, unsigned long *next_index)

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 8/8] basic support of interleaved reads
       [not found] ` <20070721035852.104255316@mail.ustc.edu.cn>
@ 2007-07-21  3:57   ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Nick Piggin, Rusty Russell

[-- Attachment #1: readahead-interleaved-reads.patch --]
[-- Type: text/plain, Size: 3320 bytes --]

This is a simplified version of the pagecache context based readahead.
It handles the case of multiple threads reading on the same fd and invalidating
each other's readahead state. It does the trick by scanning the pagecache and
recovering the current read stream's readahead status.

The algorithm works in an opportunistic way, in that it does not try to detect
interleaved reads _actively_, which would require a probe into the page cache
(and thus a little more overhead for random reads). It only tries to handle a
previously started sequential readahead whose state was overwritten by
another concurrent stream, and it can do this job pretty well.

Negative and positive examples (or what you can expect from it):

1) it cannot detect and serve perfect request-by-request interleaved reads
   right:
	time	stream 1  stream 2
	0 	1         
	1 	          1001
	2 	2
	3 	          1002
	4 	3
	5 	          1003
	6 	4
	7 	          1004
	8 	5
	9	          1005
Here not a single readahead will be carried out.

2) However, if it's two concurrent reads by two threads, the chance of the
   initial sequential readahead being started is high. Once the first sequential
   readahead is started for a stream, this patch will ensure that the readahead
   window continues to ramp up and won't be disturbed by other streams.

	time	stream 1  stream 2
	0 	1         
	1 	2
	2 	          1001
	3 	3
	4 	          1002
	5 	          1003
	6 	4
	7 	5
	8 	          1004
	9 	6
	10	          1005
	11	7
	12	          1006
	13	          1007
Here stream 1 will start a readahead at page 2, and stream 2 will start its
first readahead at page 1003. From then on the two streams will be served right.
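
The recovery step can be sketched in user space as below. Everything here is
illustrative: ra_state is a made-up subset of file_ra_state, the hole
position is passed in rather than found by scanning the page cache, and
next_ra_size only assumes the usual ramp-up rule (quadruple small windows,
double larger ones, cap at max).

```c
#include <assert.h>

struct ra_state {		/* hypothetical subset of file_ra_state */
	unsigned long start;
	unsigned int size;
	unsigned int async_size;
};

/* Assumed ramp-up rule: quadruple small windows, double larger ones, cap at max. */
static unsigned int next_ra_size(unsigned int cur, unsigned int max)
{
	unsigned int newsize = (cur < max / 16) ? 4 * cur : 2 * cur;

	return newsize < max ? newsize : max;
}

/*
 * offset: page that carried the readahead marker
 * hole:   first uncached page at or after offset (0 means the scan wrapped)
 * Returns 1 and fills *ra if the stream's window could be recovered.
 */
static int recover_ra(struct ra_state *ra, unsigned long offset,
		      unsigned long hole, unsigned int max)
{
	assert(ra);
	if (!hole || hole - offset > max)
		return 0;

	ra->start = hole;
	/* hole - offset is the old async_size; ramp it up for the new window. */
	ra->size = next_ra_size((unsigned int)(hole - offset), max);
	ra->async_size = ra->size;
	return 1;
}
```

For example, with max = 32, a marker hit at page 100 and the first hole at
page 108, the old async_size is 8, so the recovered window starts at page
108 with a size of 16 pages.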

Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
---
 mm/readahead.c |   33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

--- linux-2.6.22-git15.orig/mm/readahead.c
+++ linux-2.6.22-git15/mm/readahead.c
@@ -371,6 +371,29 @@ ondemand_readahead(struct address_space 
 	}
 
 	/*
+	 * Hit a marked page without valid readahead state.
+	 * E.g. interleaved reads.
+	 * Query the pagecache for async_size, which normally equals to
+	 * readahead size. Ramp it up and use it as the new readahead size.
+	 */
+	if (hit_readahead_marker) {
+		pgoff_t start;
+
+		read_lock_irq(&mapping->tree_lock);
+		start = radix_tree_scan_hole(&mapping->page_tree, offset, max+1);
+		read_unlock_irq(&mapping->tree_lock);
+
+		if (!start || start - offset > max)
+			return 0;
+
+		ra->start = start;
+		ra->size = start - offset;	/* old async_size */
+		ra->size = get_next_ra_size(ra, max);
+		ra->async_size = ra->size;
+		goto readit;
+	}
+
+	/*
 	 * It may be one of
 	 * 	- first read on start of file
 	 * 	- sequential cache miss
@@ -381,16 +404,6 @@ ondemand_readahead(struct address_space 
 	ra->size = get_init_ra_size(req_size, max);
 	ra->async_size = ra->size > req_size ? ra->size - req_size : ra->size;
 
-	/*
-	 * Hit on a marked page without valid readahead state.
-	 * E.g. interleaved reads.
-	 * Not knowing its readahead pos/size, bet on the minimal possible one.
-	 */
-	if (hit_readahead_marker) {
-		ra->start++;
-		ra->size = get_next_ra_size(ra, max);
-	}
-
 readit:
 	return ra_submit(ra, mapping, filp);
 }

--

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
       [not found]   ` <20070721040644.GA9750@mail.ustc.edu.cn>
@ 2007-07-21  4:06     ` Fengguang Wu
  2007-07-21  4:27       ` Linus Torvalds
  0 siblings, 1 reply; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  4:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Andi Kleen

Sorry, forgot to prefix the patch titles with [readahead].
Should I repost?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
  2007-07-21  4:06     ` Fengguang Wu
@ 2007-07-21  4:27       ` Linus Torvalds
       [not found]         ` <20070721042939.GA28875@mail.ustc.edu.cn>
  2007-07-21  5:57         ` Andi Kleen
  0 siblings, 2 replies; 16+ messages in thread
From: Linus Torvalds @ 2007-07-21  4:27 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: Andrew Morton, linux-kernel, Andi Kleen



On Sat, 21 Jul 2007, Fengguang Wu wrote:
>
> Sorry, forgot to prefix the patch titles with [readahead].
> Should I repost?

Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
even if it does mean missing the merge window this time around. 

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
       [not found]         ` <20070721042939.GA28875@mail.ustc.edu.cn>
@ 2007-07-21  4:29           ` Fengguang Wu
  0 siblings, 0 replies; 16+ messages in thread
From: Fengguang Wu @ 2007-07-21  4:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel, Andi Kleen

On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 21 Jul 2007, Fengguang Wu wrote:
> >
> > Sorry, forgot to prefix the patch titles with [readahead].
> > Should I repost?
> 
> Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
> even if it does mean missing the merge window this time around. 

OK. Let me repost it...


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
  2007-07-21  4:27       ` Linus Torvalds
       [not found]         ` <20070721042939.GA28875@mail.ustc.edu.cn>
@ 2007-07-21  5:57         ` Andi Kleen
  2007-07-21  6:03           ` Andrew Morton
  2007-07-21  6:13           ` Linus Torvalds
  1 sibling, 2 replies; 16+ messages in thread
From: Andi Kleen @ 2007-07-21  5:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Fengguang Wu, Andrew Morton, linux-kernel, Andi Kleen

On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> 
> 
> On Sat, 21 Jul 2007, Fengguang Wu wrote:
> >
> > Sorry, forgot to prefix the patch titles with [readahead].
> > Should I repost?
> 
> Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 

Haven't the readahead patches already essentially been in -mm* for some time?
I thought the new patches were some restructured code, but essentially
the tested algorithms? 

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
  2007-07-21  5:57         ` Andi Kleen
@ 2007-07-21  6:03           ` Andrew Morton
  2007-07-21  6:13           ` Linus Torvalds
  1 sibling, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2007-07-21  6:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Linus Torvalds, Fengguang Wu, linux-kernel

On Sat, 21 Jul 2007 07:57:06 +0200 Andi Kleen <andi@firstfloor.org> wrote:

> On Fri, Jul 20, 2007 at 09:27:01PM -0700, Linus Torvalds wrote:
> > 
> > 
> > On Sat, 21 Jul 2007, Fengguang Wu wrote:
> > >
> > > Sorry, forgot to prefix the patch titles with [readahead].
> > > Should I repost?
> > 
> > Not for me, but on the other hand, I'd prefer for this to be in -mm a bit, 
> 
> Haven't the readahead patches already essentially been in -mm* for some time?
> I thought the new patches were some restructured code, but essentially
> the tested algorithms? 
> 

The all-singing all-dancing readahead code was in -mm for maybe a year. 
Then the much-reduced, feasible-for-merging code was in -mm for several
months.  It went mainline this week.

This new patch series is some optimisation and algorithm tweaking on top of
the recently-merged well-tested stuff.  Waiting for 2.6.24 is appropriate.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
  2007-07-21  5:57         ` Andi Kleen
  2007-07-21  6:03           ` Andrew Morton
@ 2007-07-21  6:13           ` Linus Torvalds
  2007-07-21  6:17             ` Andi Kleen
  1 sibling, 1 reply; 16+ messages in thread
From: Linus Torvalds @ 2007-07-21  6:13 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Fengguang Wu, Andrew Morton, linux-kernel



On Sat, 21 Jul 2007, Andi Kleen wrote:
> 
> Haven't the readahead patches already essentially been in -mm* for some time?

They have indeed, and I merged them this week.

> I thought the new patches were some restructured code, but 
> essentially the tested algorithms?

No, this series is a further cleanup on top of the restructured code, with 
some new features too. The series _looks_ fine to me, but it still makes 
sense to go through -mm, I think.

		Linus

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/8] compacting file_ra_state
  2007-07-21  6:13           ` Linus Torvalds
@ 2007-07-21  6:17             ` Andi Kleen
  0 siblings, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2007-07-21  6:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andi Kleen, Fengguang Wu, Andrew Morton, linux-kernel

On Fri, Jul 20, 2007 at 11:13:34PM -0700, Linus Torvalds wrote:
> > I thought the new patches were some restructured code, but 
> > essentially the tested algorithms?
> 
> No, this series is a further cleanup on top of the restructured code, with 
> some new features too. The series _looks_ fine to me, but it still makes 
> sense to go through -mm, I think.

What I meant was:

iirc he first had a "very complex" patchkit in -mm* (the one 
with multiple predictors working together); that then got replaced
with a simpler, easier-to-review one, and now he's feeding the stuff from
the complex one back piece by piece.

Just pointed out that the newer stuff was likely already in -mm* for some
time, just before one of the simplifications.

But I guess more testing cannot hurt anyways.

Anyways it's good we're finally making forward progress on this. It has 
the potential for real nice performance gains.

-Andi

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-07-21  6:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20070721035733.951838089@mail.ustc.edu.cn>
2007-07-21  3:57 ` [PATCH 0/8] readahead cleanups and interleaved readahead take 2 Fengguang Wu
     [not found] ` <20070721035850.977231489@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 1/8] compacting file_ra_state Fengguang Wu
     [not found]   ` <20070721040644.GA9750@mail.ustc.edu.cn>
2007-07-21  4:06     ` Fengguang Wu
2007-07-21  4:27       ` Linus Torvalds
     [not found]         ` <20070721042939.GA28875@mail.ustc.edu.cn>
2007-07-21  4:29           ` Fengguang Wu
2007-07-21  5:57         ` Andi Kleen
2007-07-21  6:03           ` Andrew Morton
2007-07-21  6:13           ` Linus Torvalds
2007-07-21  6:17             ` Andi Kleen
     [not found] ` <20070721035851.185553787@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 2/8] mmap read-around simplification Fengguang Wu
     [not found] ` <20070721035851.321030363@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 3/8] combine file_ra_state.prev_index/prev_offset into prev_pos Fengguang Wu
     [not found] ` <20070721035851.461364420@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 4/8] trivial filemap.c cleanups Fengguang Wu
     [not found] ` <20070721035851.638623804@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 5/8] remove several readahead macros Fengguang Wu
     [not found] ` <20070721035851.791763729@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 6/8] remove the limit max_sectors_kb imposed on max_readahead_kb Fengguang Wu
     [not found] ` <20070721035851.946351617@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 7/8] introduce radix_tree_scan_hole() Fengguang Wu
     [not found] ` <20070721035852.104255316@mail.ustc.edu.cn>
2007-07-21  3:57   ` [PATCH 8/8] basic support of interleaved reads Fengguang Wu
