linux-kernel.vger.kernel.org archive mirror
* Re:  [PATCH] Re: Fast patch for Severe I/O performance regression 2.6.6 to 2.6.7 or 2.6.8-rc3
@ 2004-08-16 21:25 Berkley Shands
From: Berkley Shands @ 2004-08-16 21:25 UTC (permalink / raw)
  To: akpm, linuxram; +Cc: badari, berkley, linux-kernel, miklos, shrybman

	I'll be happy to continue testing this patch. First I have to get past
the SATA panics and the NFS lockups in 2.6.8.1, but that is a separate post
once I figure out what happened.

thanks

berkley


* [PATCH] Re: Fast patch for Severe I/O performance regression 2.6.6 to 2.6.7 or 2.6.8-rc3
  2004-08-08  8:22 Ram Pai
@ 2004-08-16 20:30 ` Ram Pai
From: Ram Pai @ 2004-08-16 20:30 UTC (permalink / raw)
  To: akpm; +Cc: Mr. Berkley Shands, linux-kernel, miklos, shrybman, badari

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2532 bytes --]

On Sun, 8 Aug 2004, Ram Pai wrote:

> On Fri, 6 Aug 2004, Mr. Berkley Shands wrote:
> 
> > in 2.6.8-rc3/mm/readahead.c line 475 (about label do_io:)
> > #if 0
> >             ra->next_size = min(average , (unsigned long)max);
> > #endif
> > 
> > The comment for the above line says we get here after an lseek. The lseek
> > IS inside the read-ahead window, but the code will always destroy the
> > window and start again. The above patch corrects the performance problem,
> > but it would be better to do nothing if the lseek is still within the
> > read-ahead window.
> 
> Ok. I can see your point. I did introduce a subtle change in behavior
> in 2.6.8-rc3. The change in behavior is: the current window gets populated
> based on the average number of contiguous pages accessed in the past.
> Before that patch, the current window got populated based on the
> amount of locality seen in the current window in the past.
> 
> Try this patch and see if things get back to normal. This patch 
> populates the current window based on the average amount of locality 
> noticed in the current window. It should help your case, but who knows
> which other benchmark will scream :( . It's hard to keep every workload happy.
> 
> In any case, try it and see if it helps your case at least. Meanwhile I will
> run my set of benchmarks and see what gets affected. 

Andrew,

	Here is a consolidated readahead patch against the -mm tree that takes
care of the performance regression seen with multi-threaded writes to the same
file descriptor. 

	The patch does the following:

	1. Instead of calculating the average count of sequential
		accesses in the read pattern, it calculates the
		average number of hits in the current window.
	2. This average is used to guide the size of the next current
		window.
	3. Since the field serial_cnt in the ra structure does not
		make sense under the new logic, I have renamed
		that field to currnt_wnd_hit.
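
	The three points above can be sketched in isolation as follows. This
is only a minimal user-space illustration, not the kernel code: struct
ra_sketch, ra_note_access() and ra_next_window() are hypothetical names made
up for this example, and the window here never moves, which the real
page_cache_readahead() of course does. The arithmetic is copied from the
hunks in the attached patch.

```c
/*
 * Hypothetical, simplified stand-in for the fields this patch touches in
 * struct file_ra_state. Not the kernel structure.
 */
struct ra_sketch {
	unsigned long start;		/* first page of the current window */
	unsigned long size;		/* pages in the current window */
	unsigned long currnt_wnd_hit;	/* locality in the current window */
	unsigned long average;		/* damped average of hits per window */
};

/* Account for one access at page 'offset'; 'max' is the readahead limit. */
static void ra_note_access(struct ra_sketch *ra, unsigned long offset,
			   unsigned long max)
{
	if (offset >= ra->start && offset <= ra->start + ra->size) {
		/* Hit inside the current window: count it, with a cap. */
		if (ra->currnt_wnd_hit <= max * 2)
			ra->currnt_wnd_hit++;
	} else {
		/*
		 * Miss: fold the hit count into the average, nudging the
		 * average up first so rounding tends toward the hit count.
		 */
		unsigned long average = ra->average;

		if (average < ra->currnt_wnd_hit)
			average++;
		ra->average = (average + ra->currnt_wnd_hit) / 2;
		ra->currnt_wnd_hit = 1;
	}
}

/* Size of the next current window, as computed near do_io in the patch. */
static unsigned long ra_next_window(const struct ra_sketch *ra,
				    unsigned long max)
{
	unsigned long average = ra->average;

	if (ra->currnt_wnd_hit > average)
		average = (ra->currnt_wnd_hit + ra->average + 1) / 2;

	return average < max ? average : max;
}
```

	For example, starting from currnt_wnd_hit = 1 and average = 0 with a
window covering pages 0 to 8, three further hits followed by one miss leave
average at 2, and the next window is sized from that value.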


	This patch will help read patterns that are not necessarily
sequential but have sufficient locality. However, it may regress random
workloads. 
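
	To make the "not sequential but local" case concrete, here is a rough
user-space comparison of the two counters. It is a sketch under stated
assumptions, not a benchmark: estimate_sequential() and
estimate_window_hits() are hypothetical helpers, the window is held fixed,
and both use the same final blend so the comparison is like-for-like (the
old code sized the current window from ra->average alone).

```c
#include <stddef.h>

/* Blend the live counter into the damped average, as done near do_io. */
static unsigned long blended(unsigned long cnt, unsigned long average)
{
	if (cnt > average)
		average = (cnt + average + 1) / 2;
	return average;
}

/* Fold a finished counter into the average, rounding toward the counter. */
static unsigned long fold(unsigned long average, unsigned long cnt)
{
	if (average < cnt)
		average++;
	return (average + cnt) / 2;
}

/* Old counter: only strictly sequential accesses (prev + 1) count. */
static unsigned long estimate_sequential(const unsigned long *offs, size_t n,
					 unsigned long max)
{
	unsigned long serial_cnt = 1, average = 0;

	for (size_t i = 1; i < n; i++) {
		if (offs[i] == offs[i - 1] + 1) {
			if (serial_cnt <= max * 2)
				serial_cnt++;
		} else {
			average = fold(average, serial_cnt);
			serial_cnt = 1;
		}
	}
	return blended(serial_cnt, average);
}

/* New counter: any access inside [start, start + size] counts as a hit. */
static unsigned long estimate_window_hits(const unsigned long *offs, size_t n,
					  unsigned long start,
					  unsigned long size,
					  unsigned long max)
{
	unsigned long hit = 1, average = 0;

	for (size_t i = 1; i < n; i++) {
		if (offs[i] >= start && offs[i] <= start + size) {
			if (hit <= max * 2)
				hit++;
		} else {
			average = fold(average, hit);
			hit = 1;
		}
	}
	return blended(hit, average);
}
```

	Fed the page offsets 0, 2, 1, 3, 5, 4, 7, 6 with a window covering
pages 0 to 8, the sequential counter never gets past 1 because no access
follows its predecessor, while the window-hit counter sees eight hits and
yields an estimate of 4. For a strictly sequential run of the same length
the two agree. The sketch does not model the possible regression on random
workloads.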

	Results:
		
	1. Berkley Shands has reported great performance with this
		patch.
	2. iozone showed negligible effect on various read patterns.
	3. DSS workload saw negligible change.
	4. Sysbench saw a small improvement.

	I am not sure what good or bad effects this patch will have
	on other workloads, so can we keep this patch in your tree
	for a while and see which benchmarks scream?
	I have generated this patch against 2.6.8-rc4-mm1.

RP

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3183 bytes --]

diff -pruN ram/linux-2.6.8-rc4/include/linux/fs.h linux-2.6.8-rc4/include/linux/fs.h
--- ram/linux-2.6.8-rc4/include/linux/fs.h	2004-08-16 19:55:53.753441464 -0700
+++ linux-2.6.8-rc4/include/linux/fs.h	2004-08-16 20:01:45.996892320 -0700
@@ -556,8 +556,8 @@ struct file_ra_state {
 	unsigned long prev_page;	/* Cache last read() position */
 	unsigned long ahead_start;	/* Ahead window */
 	unsigned long ahead_size;
-	unsigned long serial_cnt;	/* measure of sequentiality */
-	unsigned long average;		/* another measure of sequentiality */
+	unsigned long currnt_wnd_hit;	/* locality in the current window */
+	unsigned long average;		/* size of next current window */
 	unsigned long ra_pages;		/* Maximum readahead window */
 	unsigned long mmap_hit;		/* Cache hit stat for mmap accesses */
 	unsigned long mmap_miss;	/* Cache miss stat for mmap accesses */
diff -pruN ram/linux-2.6.8-rc4/mm/readahead.c linux-2.6.8-rc4/mm/readahead.c
--- ram/linux-2.6.8-rc4/mm/readahead.c	2004-08-16 19:55:54.243366984 -0700
+++ linux-2.6.8-rc4/mm/readahead.c	2004-08-16 20:01:46.554807504 -0700
@@ -384,25 +384,10 @@ page_cache_readahead(struct address_spac
 		first_access=1;
 		ra->next_size = max / 2;
 		ra->prev_page = offset;
-		ra->serial_cnt++;
+		ra->currnt_wnd_hit++;
 		goto do_io;
 	}
 
-	if (offset == ra->prev_page + 1) {
-		if (ra->serial_cnt <= (max * 2))
-			ra->serial_cnt++;
-	} else {
-		/*
-		 * to avoid rounding errors, ensure that 'average'
-		 * tends towards the value of ra->serial_cnt.
-		 */
-		average = ra->average;
-		if (average < ra->serial_cnt) {
-			average++;
-		}
-		ra->average = (average + ra->serial_cnt) / 2;
-		ra->serial_cnt = 1;
-	}
 	ra->prev_page = offset;
 
 	if (offset >= ra->start && offset <= (ra->start + ra->size)) {
@@ -411,12 +396,22 @@ page_cache_readahead(struct address_spac
 		 * page beyond the end.  Expand the next readahead size.
 		 */
 		ra->next_size += 2;
+
+		if (ra->currnt_wnd_hit <= (max * 2))
+			ra->currnt_wnd_hit++;
 	} else {
 		/*
 		 * A miss - lseek, pagefault, pread, etc.  Shrink the readahead
 		 * window.
 		 */
 		ra->next_size -= 2;
+
+		average = ra->average;
+		if (average < ra->currnt_wnd_hit) {
+			average++;
+		}
+		ra->average = (average + ra->currnt_wnd_hit) / 2;
+		ra->currnt_wnd_hit = 1;
 	}
 
 	if ((long)ra->next_size > (long)max)
@@ -468,7 +463,11 @@ do_io:
 			  * pages shall be accessed in the next
 			  * current window.
 			  */
-			ra->next_size = min(ra->average , (unsigned long)max);
+			average = ra->average;
+			if (ra->currnt_wnd_hit > average)
+				average = (ra->currnt_wnd_hit + ra->average + 1) / 2;
+
+			ra->next_size = min(average , (unsigned long)max);
 		}
 		ra->start = offset;
 		ra->size = ra->next_size;
@@ -501,8 +500,8 @@ do_io:
 			 * random. Hence don't bother to readahead.
 			 */
 			average = ra->average;
-			if (ra->serial_cnt > average)
-				average = (ra->serial_cnt + ra->average + 1) / 2;
+			if (ra->currnt_wnd_hit > average)
+				average = (ra->currnt_wnd_hit + ra->average + 1) / 2;
 
 			if (average > max) {
 				ra->ahead_start = ra->start + ra->size;
