linux-kernel.vger.kernel.org archive mirror
* Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
@ 2006-02-24 20:22 Marr
  2006-02-25  5:16 ` Andrew Morton
  0 siblings, 1 reply; 23+ messages in thread
From: Marr @ 2006-02-24 20:22 UTC (permalink / raw)
  To: linux-kernel

Greetings,

*** Please CC: me on replies -- I'm not subscribed.

Short Problem Description/Question:

When switching from kernel 2.4.31 to 2.6.13 (with everything else the same), 
there is a drastic increase in the time required to perform 'fseek()' on 
larger files (e.g. 4.3 MB, using ReiserFS [in case it matters], in my test 
case).

It seems that any seeks in a range larger than 128KB (regardless of the file 
size or the position within the file) cause the performance to drop 
precipitously. As near as I can determine, this happens because the virtual 
memory manager (VMM) in 2.6.13 is not caching the full 4.3 MB file. In fact, 
only a maximum of a 128KB segment of the file seems to be cached.

Can anyone please explain this change in behavior and/or recommend a 2.6.x VM 
setting to revert to the old (_much_ faster) 'fseek()' behavior from 2.4.x 
kernels?

-----------------------------------

More Details:

I'm running Slackware 10.2 (2.4.31 and 2.6.13 stock kernels) on a 400 MHz AMD 
K6-2 laptop with 192MB of RAM.

I have an application that does many (20,000 - 50,000) 'fseek()' calls on the 
same large file.  In 2.4.31 (and other, earlier 2.4.x kernels), it runs very 
fast, even on large files (e.g. 4.3 MB).

I culled the problem down to a C code sample (see below).

Some timing tests with 20,000 'fseek()' calls:

   Kernel 2.4.31: 1st run -- 0m8.0s; 2nd run -- 0m0.6s

   Kernel 2.6.13: 1st run -- 0m32.0s; 2nd run -- 0m29.0s

Some timing tests with 200,000 'fseek()' calls:

   Kernel 2.4.31: 6.0s

   Kernel 2.6.13: 4m50s

Clearly, the 2.4.31 results are speedy because the whole 4MB file has been 
cached.

What I cannot figure out is this: what has changed in 2.6.x kernels to cause 
the performance to degrade so drastically?!?

Assuming it's somehow related to the 2.6.x VMM code, I've read everything I 
could in the '/usr/src/linux-2.6.13/Documentation/vm/' directory and I've run 
'vmstat' and dumped the various '/proc/sys/vm/*' settings. I've tried 
tweaking settings (some [most?] of which I don't fully understand [e.g. 
'/proc/sys/vm/lowmem_reserve_ratio']). I've tried scanning the VM code for 
clues but, not being a Virtual Memory guru, I've come up empty. I've searched 
the web and LKML to no avail.

I'm completely at a loss -- any suggestions would be much welcomed!

-----------------------------------

Here's a quick 'n' dirty test routine I wrote which demonstrates the problem 
on a 4MB file generated with this command:

   dd if=/dev/zero of=/tmp/fseek-4MB bs=1024 count=4096

Compile:

   gcc -o fseek-test fseek-test.c

Run (1st parm [required] is filename; 2nd parm [optional, 20K is default] is 
loop count):

   fseek-test /tmp/fseek-4MB 20000

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main (int argc, char *argv[])
{
   if (argc < 2) {
      printf("You must specify the filename!\n");
   }
   else {
      FILE *inp_fh;
      if ((inp_fh = fopen(argv[1], "rb")) == 0) {
         printf("Error ('%s') opening data file ('%s') for input!\n", 
                strerror(errno), argv[1]);
      }
      else {
         int j, pos;
         int max_calls = 20000;
         if (argc > 2) {
            max_calls = atoi(argv[2]);
            if (max_calls < 100) max_calls = 100;
            if (max_calls > 999999) max_calls = 999999;
         }
         printf("Performing %d calls to 'fseek()' on file '%s'...\n", 
                max_calls, argv[1]);
         for (j=0; j < max_calls; j++) {
            pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
            if (fseek(inp_fh, pos, SEEK_SET)) {
               printf("Error ('%s') seeking to position %d!\n", 
                      strerror(errno), pos);
            }
         }
         fclose(inp_fh);
      }
   }
   exit(0);
}

-----------------------------------

Any advice is much appreciated... TIA!

*** Please CC: me on replies -- I'm not subscribed.

Bill Marr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-24 20:22 Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change? Marr
@ 2006-02-25  5:16 ` Andrew Morton
  2006-02-26 13:07   ` Ingo Oeser
                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Andrew Morton @ 2006-02-25  5:16 UTC (permalink / raw)
  To: Marr; +Cc: linux-kernel, reiserfs-dev

Marr <marr@flex.com> wrote:
>
> ..
>
> When switching from kernel 2.4.31 to 2.6.13 (with everything else the same), 
> there is a drastic increase in the time required to perform 'fseek()' on 
> larger files (e.g. 4.3 MB, using ReiserFS [in case it matters], in my test 
> case).
> 
> It seems that any seeks in a range larger than 128KB (regardless of the file 
> size or the position within the file) cause the performance to drop 
> precipitously.
>

Interesting.

What's happening is that glibc does a read from the file within each
fseek().  Which might seem a bit silly because the app could seek somewhere
else without doing any IO.  But then the app would be silly too.

Also, glibc is using the value returned in struct stat's blksize (a hint as
to this file's preferred read chunk size) as, umm, a hint as to this file's
preferred read size.

Most filesystems return 4k in stat.blksize.  But in 2.6, reiserfs bumped
that to 128k to get good I/O patterns.   Consequently this:

>          for (j=0; j < max_calls; j++) {
>             pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
>             if (fseek(inp_fh, pos, SEEK_SET)) {
>                printf("Error ('%s') seeking to position %d!\n", 
>                       strerror(errno), pos);
>             }
>          }

runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
on every fseek.

- There may be a libc stdio function which allows you to tune this
  behaviour.

- libc should probably be a bit more defensive about this anyway -
  plainly the filesystem is being silly.

- You can alter the filesystem's behaviour by mounting with the
  `nolargeio=1' option.  That sets stat.blksize back to 4k.

  This will alter the behaviour of every reiserfs filesystem in the
  machine.  Even the already mounted ones.

  `mount -o remount,nolargeio=1' can probably also be used.  But that
  won't affect inodes which are already in cache - a umount/mount cycle may
  be needed.

  If you like, you can just mount and unmount a different reiserfs
  filesystem to switch this reiserfs filesystem's behaviour.  IOW: the
  reiserfs guys were lazy and went and made this a global variable :(

- fseek is a pretty dumb function anyway - you're better off with
  stateless functions like pread() - half the number of syscalls, don't
  have to track where the file pointer is at.  I don't know if there's a
  pread()-like function in stdio though?

No happy answers there, sorry.  But a workaround.
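
[Editor's note: below is a minimal, untested sketch of the pread()-based approach suggested above, adapted from the test program earlier in the thread. It reads a 4 KB chunk at each random offset instead of merely seeking, so it is illustrative rather than a drop-in replacement; the file name and sizes are just the ones from the original example. Because pread() takes an absolute offset, there is no file pointer to maintain, which is the "half the number of syscalls" point made above.]

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
   char buf[4096];
   int j, fd = open("/tmp/fseek-4MB", O_RDONLY);

   if (fd < 0) {
      fprintf(stderr, "open: %s\n", strerror(errno));
      return 1;
   }
   for (j = 0; j < 200000; j++) {
      /* pick a pseudo-random offset, as in the original test program */
      off_t pos = (off_t)(((double)random() / (double)RAND_MAX) * 4000000.0);
      /* read at an absolute offset in one syscall; no seek needed */
      ssize_t n = pread(fd, buf, sizeof(buf), pos);
      if (n < 0)
         fprintf(stderr, "pread at %ld: %s\n", (long)pos, strerror(errno));
   }
   close(fd);
   return 0;
}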

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-25  5:16 ` Andrew Morton
@ 2006-02-26 13:07   ` Ingo Oeser
  2006-02-26 13:50     ` Nick Piggin
  2006-02-27 20:24   ` Marr
  2006-02-27 21:53   ` Hans Reiser
  2 siblings, 1 reply; 23+ messages in thread
From: Ingo Oeser @ 2006-02-26 13:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Marr, reiserfs-dev

On Saturday, 25. February 2006 06:16, Andrew Morton wrote:
> runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
> on every fseek.

That's the bug. If I seek, I never want a read to be issued.
seek should just return whether the result is a valid offset
in the underlying object.

It is perfectly valid to have a real-time device which produces data
very fast and where you are allowed to skip without reading anything.

This device could be a pipe, which just allows forward seeking for exactly
this reason (implemented by me some years ago).

> - fseek is a pretty dumb function anyway - you're better off with
>   stateless functions like pread() - half the number of syscalls, don't
>   have to track where the file pointer is at.  I don't know if there's a
>   pread()-like function in stdio though?

pread and anything else not using RELATIVE descriptor offsets are not
very useful for pipe-like interfaces that can seek, but only forward.

There are even cases where you can seek forward and backward, but
only ever with relative offsets, because you have a circular buffer indexed by time.
If you want the last N minutes, the relative index is always stable,
but the absolute offset jumps.

So I hope glibc will fix fseek to work as advertised.

But for the simple file case all your answers are valid.

Regards

Ingo Oeser

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-26 13:07   ` Ingo Oeser
@ 2006-02-26 13:50     ` Nick Piggin
  2006-02-26 14:11       ` Arjan van de Ven
  2006-02-27 20:52       ` Hans Reiser
  0 siblings, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2006-02-26 13:50 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: linux-kernel, Andrew Morton, Marr, reiserfs-dev

Ingo Oeser wrote:
> On Saturday, 25. February 2006 06:16, Andrew Morton wrote:
> 
>>runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
>>on every fseek.
> 
> 
> That's the bug. If I seek, I never want a read to be issued.
> seek should just return whether the result is a valid offset
> in the underlying object.
> 
> It is perfectly valid to have a real-time device which produces data
> very fast and where you are allowed to skip without reading anything.
> 
> This device could be a pipe, which just allows forward seeking for exactly
> this reason (implemented by me some years ago).
> 
> 
>>- fseek is a pretty dumb function anyway - you're better off with
>>  stateless functions like pread() - half the number of syscalls, don't
>>  have to track where the file pointer is at.  I don't know if there's a
>>  pread()-like function in stdio though?
> 
> 
> pread and anything else not using RELATIVE descriptor offsets are not
> very useful for pipe-like interfaces that can seek, but only forward.
> 
> There are even cases where you can seek forward and backward, but
> only ever with relative offsets, because you have a circular buffer indexed by time.
> If you want the last N minutes, the relative index is always stable,
> but the absolute offset jumps.
> 
> So I hope glibc will fix fseek to work as advertised.
> 
> But for the simple file case all your answers are valid.
> 

Not really. The app is not silly if it does an fseek() then a _write_.
Writing page sized and aligned chunks should not require previously
uptodate pagecache, so doing a pre-read like this is a complete waste.

Actually glibc tries to turn this pre-read off if the seek is to a page
aligned offset, presumably to handle this case. However a big write
would only have to RMW the first and last partial pages, so pre-reading
128KB in this case is wrong.

And I would also say a 4K read is wrong as well, because a big read will
be less efficient due to the extra syscall and small IO.
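
[Editor's note: to make the read-modify-write point above concrete, here is a small illustrative calculation, assuming 4 KB pages and ignoring writes past end-of-file (which need no pre-read at all). It is not code from this thread.]

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* For a write of 'len' bytes at byte offset 'off', count the pages that
 * would need to be read first (read-modify-write).  Only a partially
 * covered first or last page qualifies; fully overwritten pages do not. */
static int rmw_page_count(unsigned long off, unsigned long len)
{
   unsigned long first = off / PAGE_SIZE;
   unsigned long last  = (off + len - 1) / PAGE_SIZE;
   int head = (off % PAGE_SIZE) != 0;          /* first page partly covered */
   int tail = ((off + len) % PAGE_SIZE) != 0;  /* last page partly covered  */

   if (first == last)
      return head || tail;
   return head + tail;
}

int main(void)
{
   printf("aligned 128K write  : %d page(s) to pre-read\n", rmw_page_count(0, 131072));
   printf("unaligned 128K write: %d page(s) to pre-read\n", rmw_page_count(100, 131072));
   printf("small aligned write : %d page(s) to pre-read\n", rmw_page_count(4096, 100));
   return 0;
}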

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-26 13:50     ` Nick Piggin
@ 2006-02-26 14:11       ` Arjan van de Ven
  2006-02-27 20:52       ` Hans Reiser
  1 sibling, 0 replies; 23+ messages in thread
From: Arjan van de Ven @ 2006-02-26 14:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Oeser, linux-kernel, Andrew Morton, Marr, reiserfs-dev

On Mon, 2006-02-27 at 00:50 +1100, Nick Piggin wrote:
> 
> Not really. The app is not silly if it does an fseek() then a _write_.
> Writing page sized and aligned chunks should not require previously
> uptodate pagecache, so doing a pre-read like this is a complete waste.
> 
> Actually glibc tries to turn this pre-read off if the seek is to a page
> aligned offset, presumably to handle this case. However a big write
> would only have to RMW the first and last partial pages, so pre-reading
> 128KB in this case is wrong.
> 
> And I would also say a 4K read is wrong as well, because a big read will
> be less efficient due to the extra syscall and small IO.


I can very much see the point of issuing a sys_readahead instead.....



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-25  5:16 ` Andrew Morton
  2006-02-26 13:07   ` Ingo Oeser
@ 2006-02-27 20:24   ` Marr
  2006-02-27 21:53   ` Hans Reiser
  2 siblings, 0 replies; 23+ messages in thread
From: Marr @ 2006-02-27 20:24 UTC (permalink / raw)
  To: linux-kernel; +Cc: reiserfs-dev

On Saturday 25 February 2006 12:16am, Andrew Morton wrote:
> Marr <marr@flex.com> wrote:
> > ..
> >
> > When switching from kernel 2.4.31 to 2.6.13 (with everything else the
> > same), there is a drastic increase in the time required to perform
> > 'fseek()' on larger files (e.g. 4.3 MB, using ReiserFS [in case it
> > matters], in my test case).
> >
> > It seems that any seeks in a range larger than 128KB (regardless of the
> > file size or the position within the file) cause the performance to drop
> > precipitously.
>
> Interesting.
>
> What's happening is that glibc does a read from the file within each
> fseek().  Which might seem a bit silly because the app could seek somewhere
> else without doing any IO.  But then the app would be silly too.
>
> Also, glibc is using the value returned in struct stat's blksize (a hint as
> to this file's preferred read chunk size) as, umm, a hint as to this file's
> preferred read size.
>
> Most filesystems return 4k in stat.blksize.  But in 2.6, reiserfs bumped
> that to 128k to get good I/O patterns.   Consequently this:
> >          for (j=0; j < max_calls; j++) {
> >             pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
> >             if (fseek(inp_fh, pos, SEEK_SET)) {
> >                printf("Error ('%s') seeking to position %d!\n",
> >                       strerror(errno), pos);
> >             }
> >          }
>
> runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
> on every fseek.

(...snip...)

> - You can alter the filesystem's behaviour by mounting with the
>   `nolargeio=1' option.  That sets stat.blksize back to 4k.

Greetings again,

*** Please CC: me on replies -- I'm not subscribed.

First off, many thanks to all who replied. A special "thank you" to Andrew 
Morton for his valuable insight -- very much appreciated!

Apologies for my delay in replying. I wanted to do some proper testing in 
order to have something intelligent to report.

Based on Andrew's excellent advice, I've re-tested. As before, I tested under 
the stock (Slackware 10.2) 2.4.31 and 2.6.13 kernels. This time, I tested 
ext2, ext3, and reiserfs (with and without the 'nolargeio=1' mount option) 
filesystems.

Some notes on the testing:

   (1) This is on a faster machine and a faster hard disk drive than the 
testing from my initial email, so the absolute times are not meaningful in 
comparison.

   (2) I found (unsurprisingly) that ext2 and ext3 times were very similar, so 
I'm reporting them as one here.

   (3) I'm only reporting the times for the 2nd and subsequent runs of the 
'fdisk_seek' test. In all cases (except for the 2.6.13 kernel with reiserfs 
without the 'nolargeio=1' setting), the 1st run after mounting the filesystem 
was predictably slower (uncached file content). The 2nd and subsequent runs 
are all close enough to be considered identical.

   (4) All tests were done on the same 4MB zero-filled file described in my 
initial email.

Timing tests with 200,000 randomized 'fseek()' calls:

   Kernel 2.4.31:

      ext2/3: 2.8s
      reiserfs (w/o 'nolargeio=1'): 2.8s

   Kernel 2.6.13:

      ext2/3: 3.0s
      reiserfs (w/o 'nolargeio=1'): 2m12s (ouch!)
      reiserfs (with 'nolargeio=1'): 3.0s

Basically, the "reiserfs without 'nolargeio=1' option on a 2.6.x kernel" is 
the "problem child". Every run, from the 1st to the nth, takes the same 
amount of time and is _incredibly_ slow for any application which is doing a 
lot of file seeking outside of a 128KB window.

Clearly, however, there are 2 workarounds when using a 2.6.x kernel: (A) Use 
ext2/ext3 or (B) use the 'nolargeio=1' mount option when using reiserfs.
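
[Editor's note: a possible third, application-level mitigation follows from the observation above that glibc sizes the read it issues inside fseek() from the stdio buffer: shrinking that buffer with setvbuf() should cap any such read at the chosen size. This is an untested sketch, not something verified in this thread.]

#include <stdio.h>

int main(int argc, char *argv[])
{
   /* must stay valid until the stream is closed, hence static */
   static char small_buf[4096];
   FILE *fp;

   if (argc < 2) {
      fprintf(stderr, "usage: %s <file>\n", argv[0]);
      return 1;
   }
   if ((fp = fopen(argv[1], "rb")) == NULL) {
      perror("fopen");
      return 1;
   }
   /* Must be called before the first operation on the stream.  A 4 KB
    * fully-buffered stream caps any read stdio issues on our behalf;
    * _IONBF would bypass stdio buffering entirely. */
   if (setvbuf(fp, small_buf, _IOFBF, sizeof(small_buf)) != 0)
      perror("setvbuf");

   /* ... run the fseek()/fread() workload here ... */

   fclose(fp);
   return 0;
}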

Aside: For some reason, the 'nolargeio' option for the 'reiserfs' filesystem 
is not mentioned on their page of such info:

   http://www.namesys.com/mount-options.html
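
[Editor's note: one way to confirm what blksize a mounted filesystem is currently advertising -- and therefore whether 'nolargeio=1' took effect -- is to check st_blksize on any file that lives on it. A minimal sketch, with an illustrative path:]

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
   struct stat st;

   /* any file on the filesystem in question will do */
   if (stat("/tmp/fseek-4MB", &st) != 0) {
      perror("stat");
      return 1;
   }
   /* 2.6 reiserfs reports 131072 here by default, 4096 with nolargeio=1 */
   printf("st_blksize = %ld\n", (long)st.st_blksize);
   return 0;
}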

On Saturday 25 February 2006 12:16am, Andrew Morton wrote:
> No happy answers there, sorry.  But a workaround.

Actually, 2 workarounds, both good ones. Thanks again, Andrew, for your 
excellent advice!

*** Please CC: me on replies -- I'm not subscribed.

Bill Marr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-26 13:50     ` Nick Piggin
  2006-02-26 14:11       ` Arjan van de Ven
@ 2006-02-27 20:52       ` Hans Reiser
  2006-02-28  0:34         ` Nick Piggin
  1 sibling, 1 reply; 23+ messages in thread
From: Hans Reiser @ 2006-02-27 20:52 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Oeser, linux-kernel, Andrew Morton, Marr, reiserfs-dev

Sounds like the real problem is that glibc is doing filesystem
optimizations without making them conditional on the filesystem type. 
Does anyone know the email address of the glibc guy so we can ask him
not to do that?

My entry for the ugliest thought of the day: I wonder if the kernel can
test the glibc version and.....

Hans

Nick Piggin wrote:

>
> Actually glibc tries to turn this pre-read off if the seek is to a page
> aligned offset, presumably to handle this case. However a big write
> would only have to RMW the first and last partial pages, so pre-reading
> 128KB in this case is wrong.
>
> And I would also say a 4K read is wrong as well, because a big read will
> be less efficient due to the extra syscall and small IO.
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-25  5:16 ` Andrew Morton
  2006-02-26 13:07   ` Ingo Oeser
  2006-02-27 20:24   ` Marr
@ 2006-02-27 21:53   ` Hans Reiser
  2006-02-28  0:03     ` Bill Davidsen
  2 siblings, 1 reply; 23+ messages in thread
From: Hans Reiser @ 2006-02-27 21:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Marr, linux-kernel, reiserfs-dev

Andrew Morton wrote:

>
>runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
>on every fseek.
>
>- There may be a libc stdio function which allows you to tune this
>  behaviour.
>
>- libc should probably be a bit more defensive about this anyway -
>  plainly the filesystem is being silly.
>  
>
I really thank you for isolating the problem, but I don't see how you
can do other than blame glibc for this.  The recommended IO size is only
relevant to uncached data, and glibc is using it regardless of whether
or not it is cached or uncached.   Do I misunderstand something myself here?

>- You can alter the filesystem's behaviour by mounting with the
>  `nolargeio=1' option.  That sets stat.blksize back to 4k.
>
>  This will alter the behaviour of every reiserfs filesystem in the
>  machine.  Even the already mounted ones.
>
>  `mount -o remount,nolargeio=1' can probably also be used.  But that
>  won't affect inodes which are already in cache - a umount/mount cycle may
>  be needed.
>
>  If you like, you can just mount and unmount a different reiserfs
>  filesystem to switch this reiserfs filesystem's behaviour.  IOW: the
>  reiserfs guys were lazy and went and made this a global variable :(
>
>- fseek is a pretty dumb function anyway - you're better off with
>  stateless functions like pread() - half the number of syscalls, don't
>  have to track where the file pointer is at.  I don't know if there's a
>  pread()-like function in stdio though?
>
>No happy answers there, sorry.  But a workaround.
>
>
>  
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-27 21:53   ` Hans Reiser
@ 2006-02-28  0:03     ` Bill Davidsen
  2006-02-28 18:38       ` Hans Reiser
  2006-03-05 23:02       ` Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?) Linda Walsh
  0 siblings, 2 replies; 23+ messages in thread
From: Bill Davidsen @ 2006-02-28  0:03 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Marr, linux-kernel, reiserfs-dev

Hans Reiser wrote:
> Andrew Morton wrote:
> 
>> runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
>> on every fseek.
>>
>> - There may be a libc stdio function which allows you to tune this
>>  behaviour.
>>
>> - libc should probably be a bit more defensive about this anyway -
>>  plainly the filesystem is being silly.
>>  
>>
> I really thank you for isolating the problem, but I don't see how you
> can do other than blame glibc for this.  The recommended IO size is only
> relevant to uncached data, and glibc is using it regardless of whether
> or not it is cached or uncached.   Do I misunderstand something myself here?

I think the issue is not "blame" but what effect this behavior would 
have on things like database loads, where seek-write would be common. 
Good to get this info to users and admins.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-27 20:52       ` Hans Reiser
@ 2006-02-28  0:34         ` Nick Piggin
  2006-02-28 18:42           ` Hans Reiser
  2006-02-28 18:51           ` Hans Reiser
  0 siblings, 2 replies; 23+ messages in thread
From: Nick Piggin @ 2006-02-28  0:34 UTC (permalink / raw)
  To: Hans Reiser; +Cc: Ingo Oeser, linux-kernel, Andrew Morton, Marr, reiserfs-dev

Hans Reiser wrote:

>Sounds like the real problem is that glibc is doing filesystem
>optimizations without making them conditional on the filesystem type. 
>

I'm not sure that it should even be conditional on the filesystem type...
To me it seems silly to even bother doing it, although I guess there
is another level of buffering involved which might mean it makes more
sense.

>Does anyone know the email address of the glibc guy so we can ask him
>not to do that?
>
>

Ulrich Drepper I guess. But don't tell him I sent you ;)

>My entry for the ugliest thought of the day: I wonder if the kernel can
>test the glibc version and.....
>
>Hans
>
>Nick Piggin wrote:
>
>
>>Actually glibc tries to turn this pre-read off if the seek is to a page
>>aligned offset, presumably to handle this case. However a big write
>>would only have to RMW the first and last partial pages, so pre-reading
>>128KB in this case is wrong.
>>
>>And I would also say a 4K read is wrong as well, because a big read will
>>be less efficient due to the extra syscall and small IO.
>>
>>
--

Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-28  0:03     ` Bill Davidsen
@ 2006-02-28 18:38       ` Hans Reiser
  2006-03-05 23:02       ` Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?) Linda Walsh
  1 sibling, 0 replies; 23+ messages in thread
From: Hans Reiser @ 2006-02-28 18:38 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Marr, linux-kernel, reiserfs-dev, philb

Bill Davidsen wrote:

> Hans Reiser wrote:
>
>> Andrew Morton wrote:
>>
>>> runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k
>>> read
>>> on every fseek.
>>>
>>> - There may be a libc stdio function which allows you to tune this
>>>  behaviour.
>>>
>>> - libc should probably be a bit more defensive about this anyway -
>>>  plainly the filesystem is being silly.
>>>  
>>>
>> I really thank you for isolating the problem, but I don't see how you
>> can do other than blame glibc for this.  The recommended IO size is only
>> relevant to uncached data, and glibc is using it regardless of whether
>> or not it is cached or uncached.   Do I misunderstand something
>> myself here?
>
>
> I think the issue is not "blame" but what effect this behavior would
> have on things like database loads, where seek-write would be common.
> Good to get this info to users and admins.
>
Well, ok, let me phrase it as "this should be fixed in glibc".   Does
anyone know who the maintainer for it is?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-28  0:34         ` Nick Piggin
@ 2006-02-28 18:42           ` Hans Reiser
  2006-02-28 18:51           ` Hans Reiser
  1 sibling, 0 replies; 23+ messages in thread
From: Hans Reiser @ 2006-02-28 18:42 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ingo Oeser, linux-kernel, Andrew Morton, Marr, reiserfs-dev

Nick Piggin wrote:

> Hans Reiser wrote:
>
>> Sounds like the real problem is that glibc is doing filesystem
>> optimizations without making them conditional on the filesystem type.
>
>
> I'm not sure that it should even be conditional on the filesystem type...
> To me it seems silly to even bother doing it, although I guess there
> is another level of buffering involved which might mean it makes more
> sense.
>
I was not saying that filesystem optimizations should be done in glibc
rather than in the kernel, I was merely forgoing judgement on that
point.   Actually, I rather doubt that they should be in glibc, but
maybe someday someone will come up with some legit example of where it
belongs in glibc.  I cannot think of one myself though.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
  2006-02-28  0:34         ` Nick Piggin
  2006-02-28 18:42           ` Hans Reiser
@ 2006-02-28 18:51           ` Hans Reiser
  1 sibling, 0 replies; 23+ messages in thread
From: Hans Reiser @ 2006-02-28 18:51 UTC (permalink / raw)
  To: Nick Piggin, drepper
  Cc: Ingo Oeser, linux-kernel, Andrew Morton, Marr, reiserfs-dev

Ulrich, it seems that glibc is doing something that looks like some sort
of attempt at a filesystem optimization for fseek() which really ought
to be in the filesystems instead of glibc.  Could you comment, and
assuming you agree, fix it for us?

It particularly affects ReiserFS V3 performance in a highly negative
way, because we set stat.blksize to 128k.  stat.blksize is intended to
hint what the preferred IO size is for an FS.

Could you read this thread and contribute to it?

Hans

The most important part of the thread to read was:

Marr <marr@flex.com> wrote:
  

>>
>> ..
>>
>> When switching from kernel 2.4.31 to 2.6.13 (with everything else the same), 
>> there is a drastic increase in the time required to perform 'fseek()' on 
>> larger files (e.g. 4.3 MB, using ReiserFS [in case it matters], in my test 
>> case).
>> 
>> It seems that any seeks in a range larger than 128KB (regardless of the file 
>> size or the position within the file) cause the performance to drop 
>> precipitously.
>>
>    
>

Interesting.

What's happening is that glibc does a read from the file within each
fseek().  Which might seem a bit silly because the app could seek somewhere
else without doing any IO.  But then the app would be silly too.

Also, glibc is using the value returned in struct stat's blksize (a hint as
to this file's preferred read chunk size) as, umm, a hint as to this file's
preferred read size.

Most filesystems return 4k in stat.blksize.  But in 2.6, reiserfs bumped
that to 128k to get good I/O patterns.   Consequently this:

  

>>          for (j=0; j < max_calls; j++) {
>>             pos = (int)(((double)random() / (double)RAND_MAX) * 4000000.0);
>>             if (fseek(inp_fh, pos, SEEK_SET)) {
>>                printf("Error ('%s') seeking to position %d!\n", 
>>                       strerror(errno), pos);
>>             }
>>          }
>    
>

runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k read
on every fseek.



Nick Piggin wrote:

> Hans Reiser wrote:
>
>> Sounds like the real problem is that glibc is doing filesystem
>> optimizations without making them conditional on the filesystem type.
>
>
> I'm not sure that it should even be conditional on the filesystem type...
> To me it seems silly to even bother doing it, although I guess there
> is another level of buffering involved which might mean it makes more
> sense.
>
>
>> My entry for the ugliest thought of the day: I wonder if the kernel can
>> test the glibc version and.....
>>
>> Hans
>>
>> Nick Piggin wrote:
>>
>>
>>> Actually glibc tries to turn this pre-read off if the seek is to a page
>>> aligned offset, presumably to handle this case. However a big write
>>> would only have to RMW the first and last partial pages, so pre-reading
>>> 128KB in this case is wrong.
>>>
>>> And I would also say a 4K read is wrong as well, because a big read
>>> will
>>> be less efficient due to the extra syscall and small IO.
>>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-02-28  0:03     ` Bill Davidsen
  2006-02-28 18:38       ` Hans Reiser
@ 2006-03-05 23:02       ` Linda Walsh
  2006-03-07 19:53         ` Marr
  1 sibling, 1 reply; 23+ messages in thread
From: Linda Walsh @ 2006-03-05 23:02 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Marr, linux-kernel, reiserfs-dev

Does this happen with a seek call as well, or is this limited
to fseek?

If you look at "hdparm's" idea of read-ahead, what does it say
for the device?  I.e.:

hdparm /dev/hda:

There is a line entitled "readahead".  What does it say?

I noticed that this seems to default to "256" sectors, or 128K
in 2.6.

This may be unrelated, but what does the kernel do with
this number?  I seem to remember this being set to ~8-16 (4-8K)
in 2.4.  I thought it was the number of sectors to read ahead, by
default, when a read was done, but I haven't noticed a performance
degradation like I would expect for such a large read-ahead value.

On the other hand: you do seem to be experiencing something consistent
with that setting.  I'm not sure under what circumstances the kernel
uses the "readahead" value as a number of sectors to read ahead...

Have the disk read routines changed with respect to this value?

-linda
< bottom or top posting is a personal preference somewhat based
on the email tool one uses.  In a GUI, bottom posting often means
you can't see what the person wrote without skipping to the end
of the message.  When dealing with chronological information, it
often makes more sense to put the most recent information _first_ >

Bill Davidsen wrote:
> Hans Reiser wrote:
>> Andrew Morton wrote:
>>> runs like a dog on 2.6's reiserfs.  libc is doing a (probably) 128k 
>>> read
>>> on every fseek.
>>>
>>> - There may be a libc stdio function which allows you to tune this
>>>  behaviour.
>>>
>>> - libc should probably be a bit more defensive about this anyway -
>>>  plainly the filesystem is being silly.
>> I really thank you for isolating the problem, but I don't see how you
>> can do other than blame glibc for this.  The recommended IO size is only
>> relevant to uncached data, and glibc is using it regardless of whether
>> or not it is cached or uncached.   Do I misunderstand something 
>> myself here?
> I think the issue is not "blame" but what effect this behavior would 
> have on things like database loads, where seek-write would be common. 
> Good to get this info to users and admins.
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-05 23:02       ` Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?) Linda Walsh
@ 2006-03-07 19:53         ` Marr
  2006-03-07 21:15           ` Linda Walsh
  0 siblings, 1 reply; 23+ messages in thread
From: Marr @ 2006-03-07 19:53 UTC (permalink / raw)
  To: Linda Walsh
  Cc: Bill Davidsen, linux-kernel, reiserfs-dev, Andrew Morton, marr

On Sunday 05 March 2006 6:02pm, Linda Walsh wrote:
> Does this happen with a seek call as well, or is this limited
> to fseek?
>
> if you look at "hdparm's" idea of read-ahead, what does it say
> for the device?.  I.e.:
>
> hdparm /dev/hda:
>
> There is a line entitled "readahead".  What does it say?

Linda,

I don't know (based on your email addressing) if you were directing this 
question at me, but since I'm the guy who originally reported this issue, 
here are my 'hdparm' results on my (standard Slackware 10.2) ReiserFS 
filesystem:

   2.6.13 (with 'nolargeio=1' for reiserfs mount): 
      readahead    = 256 (on)

   2.6.13 (without 'nolargeio=1' for reiserfs mount): 
      readahead    = 256 (on)

   2.4.31 ('nolargeio' option irrelevant/unavailable for 2.4.x): 
      readahead    = 8 (on)

*** Please CC: me on replies -- I'm not subscribed.

Regards,
Bill Marr

> I noticed that this seems to default to "256" sectors, or 128K
> in 2.6.
>
> This may be unrelated, but what does the kernel do with
> this number?  I seem to remember this being set to ~8-16 (4-8K)
> in 2.4.  I thought it was the number of sectors to read ahead, by
> default, when a read was done, but I haven't noticed a performance
> degradation like I would expect for such a large read-ahead value.
>
> On the other hand: you do seem to be experiencing something consistent
> with that setting.  I'm not sure under what circumstances the kernel
> uses the "readahead" value as a number of sectors to read ahead...
>
> Have the disk read routines changed with respect to this value?
>
> -linda

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-07 19:53         ` Marr
@ 2006-03-07 21:15           ` Linda Walsh
  2006-03-12 21:53             ` Marr
  0 siblings, 1 reply; 23+ messages in thread
From: Linda Walsh @ 2006-03-07 21:15 UTC (permalink / raw)
  To: Marr; +Cc: Bill Davidsen, linux-kernel, reiserfs-dev, Andrew Morton

Marr wrote:
> On Sunday 05 March 2006 6:02pm, Linda Walsh wrote:
>> Does this happen with a seek call as well, or is this limited
>> to fseek?
>>
>> if you look at "hdparm's" idea of read-ahead, what does it say
>> for the device?.  I.e.:
>>
>> hdparm /dev/hda:
>>
>> There is a line entitled "readahead".  What does it say?
>
> Linda,
>
> I don't know (based on your email addressing) if you were directing this 
> question at me, but since I'm the guy who originally reported this issue, 
> here are my 'hdparm' results on my (standard Slackware 10.2) ReiserFS 
> filesystem:
>
>    2.6.13 (with 'nolargeio=1' for reiserfs mount): 
>       readahead    = 256 (on)
>
>    2.6.13 (without 'nolargeio=1' for reiserfs mount): 
>       readahead    = 256 (on)
>
>    2.4.31 ('nolargeio' option irrelevant/unavailable for 2.4.x): 
>       readahead    = 8 (on)
>
> *** Please CC: me on replies -- I'm not subscribed.
>
> Regards,
> Bill Marr
--------
    Could you retry your test with read-ahead set to a smaller
value?  Say the same as in 2.4 (8) or 16 and see if that changes
anything?

hdparm -a8 /dev/hdx
  or
hdparm -a16 /dev/hdx




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-07 21:15           ` Linda Walsh
@ 2006-03-12 21:53             ` Marr
  2006-03-12 22:15               ` Mark Lord
  0 siblings, 1 reply; 23+ messages in thread
From: Marr @ 2006-03-12 21:53 UTC (permalink / raw)
  To: Linda Walsh
  Cc: Bill Davidsen, linux-kernel, reiserfs-dev, Andrew Morton, marr

On Tuesday 07 March 2006 4:15pm, Linda Walsh wrote:
> Marr wrote:
> > On Sunday 05 March 2006 6:02pm, Linda Walsh wrote:
> >> Does this happen with a seek call as well, or is this limited
> >> to fseek?
> >>
> >> if you look at "hdparm's" idea of read-ahead, what does it say
> >> for the device?.  I.e.:
> >>
> >> hdparm /dev/hda:
> >>
> >> There is a line entitled "readahead".  What does it say?
> >
> > Linda,
> >
> > I don't know (based on your email addressing) if you were directing this
> > question at me, but since I'm the guy who originally reported this issue,
> > here are my 'hdparm' results on my (standard Slackware 10.2) ReiserFS
> > filesystem:
> >
> >    2.6.13 (with 'nolargeio=1' for reiserfs mount):
> >       readahead    = 256 (on)
> >
> >    2.6.13 (without 'nolargeio=1' for reiserfs mount):
> >       readahead    = 256 (on)
> >
> >    2.4.31 ('nolargeio' option irrelevant/unavailable for 2.4.x):
> >       readahead    = 8 (on)
> >
> > *** Please CC: me on replies -- I'm not subscribed.
> >
> > Regards,
> > Bill Marr
>
> --------
>     Could you retry your test with read-ahead set to a smaller
> value?  Say the same as in 2.4 (8) or 16 and see if that changes
> anything?
>
> hdparm -a8 /dev/hdx
>   or
> hdparm -a16 /dev/hdx

Linda (et al),

Sorry for the delayed reply. I finally got a chance to run another test (but 
on a different machine than the last time, so don't try to compare old timing 
numbers with these numbers).

I went ahead and tried all permutations, just to be sure. As before, these 
reported times are all for 200,000 random 'fseek()' calls on the same 
zero-filled 4MB file on a standard Slackware 10.2 ReiserFS partition and 
kernels.

(Values shown for 'readahead' are set by 'hdparm -a## /dev/hda' command.)

-----------------------------------
Timing Results:

On 2.6.13, *without* 'nolargeio=1': 4m35s (ouch!) for _all_ variants (256, 16, 
8) of 'readahead'

On 2.6.13, _with_ 'nolargeio=1': 0m6s for _all_ variants (256, 16, 8) of 
'readahead'

On 2.4.31: 0m6s for _all_ variants (128 [256 is illegal -- 'BLKRASET failed: 
Invalid argument'], 16, 8) of 'readahead'

-----------------------------------

I half-expected to see improvement for the '2.6.13 without nolargeio=1' case 
when lowering the read-ahead from 256 sectors to 16 or 8 sectors, but there 
clearly was no improvement whatsoever. 

I tried turning 'readahead' off entirely ('hdparm -A0 /dev/hda') and, although 
it correctly reported "setting drive read-lookahead to 0 (off)", an immediate 
follow-on query ('hdparm /dev/hda') showed that it was still ON ("readahead = 
256 (on)")!  I went ahead and ran the test again anyway and (unsurprisingly) 
got the same excessive times (4m35s) for 200K seeks.

Confused, but still (for now) happily using the 'nolargeio=1' workaround with 
all my 2.6.13 kernels with ReiserFS....   :^/

*** Please CC: me on replies -- I'm not subscribed.
   
Regards,
Bill Marr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-12 21:53             ` Marr
@ 2006-03-12 22:15               ` Mark Lord
  2006-03-13  4:36                 ` Marr
  0 siblings, 1 reply; 23+ messages in thread
From: Mark Lord @ 2006-03-12 22:15 UTC (permalink / raw)
  To: Marr
  Cc: Linda Walsh, Bill Davidsen, linux-kernel, reiserfs-dev, Andrew Morton

Marr wrote:
>
> I tried turning 'readahead' off entirely ('hdparm -A0 /dev/hda') and, although 

No, that should be "hdparm -a0 /dev/hda" (lowercase "-a").
And the same "-a" for all of your other test variants.

If you did it all with "-A", then the results are invalid,
and need to be redone.

The hdparm manpage explains this, but in a nutshell, "-A" is the
low-level drive firmware "look-ahead" mechanism, whereas "-a" is
the Linux kernel "read-ahead" scheme.

In general, most uppercase hdparm flags are drive *firmware* settings.
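
[Editor's note: for anyone unsure which knob they are turning, the kernel-side value that lowercase "-a" reports can also be queried directly with the BLKRAGET ioctl (requires read access to the device node). A small sketch with an illustrative device path:]

#include <fcntl.h>
#include <linux/fs.h>     /* BLKRAGET */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
   long ra = 0;
   int fd = open("/dev/hda", O_RDONLY | O_NONBLOCK);

   if (fd < 0) {
      perror("open");
      return 1;
   }
   /* the value hdparm reports as "readahead = N", in 512-byte sectors */
   if (ioctl(fd, BLKRAGET, &ra) != 0) {
      perror("BLKRAGET");
      close(fd);
      return 1;
   }
   printf("kernel read-ahead = %ld sectors (%ld KB)\n", ra, ra / 2);
   close(fd);
   return 0;
}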

Cheers

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-12 22:15               ` Mark Lord
@ 2006-03-13  4:36                 ` Marr
  2006-03-13 14:41                   ` Mark Lord
  0 siblings, 1 reply; 23+ messages in thread
From: Marr @ 2006-03-13  4:36 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linda Walsh, Bill Davidsen, linux-kernel, reiserfs-dev,
	Andrew Morton, marr

On Sunday 12 March 2006 5:15pm, Mark Lord wrote:
> Marr wrote:
> > I tried turning 'readahead' off entirely ('hdparm -A0 /dev/hda') and,
> > although
>
> No, that should be "hdparm -a0 /dev/hda" (lowercase "-a").

Aha, you're right! Thanks for the clarification.

> And the same "-a" for all of your other test variants.
>
> If you did it all with "-A", then the results are invalid,
> and need to be redone.

Actually, that's impossible to do ('hdparm' won't take such settings with 
'-A'). And, as my original email stated:

   (Values shown for 'readahead' are set by 'hdparm -a## /dev/hda' command.)

In other words, the important tests were done correctly. Sorry I didn't make 
it clearer, but that last test with '-A0' was a complete afterthought (based 
on what I saw on a quick look at the 'man hdparm' page) and in no way negates 
the results from the first part of the tests.

> The hdparm manpage explains this, but in a nutshell, "-A" is the 
> low-level drive firmware "look-ahead" mechanism, whereas "-a" is
> the Linux kernel "read-ahead" scheme.

You are, of course, correct. Unfortunately, my 'man hdparm' page ("Version 6.1 
April 2005") doesn't make this as clear as it could be. The distinction is 
subtle. To quote the '-a'/'-A' part:

-a     Get/set sector count for filesystem read-ahead.  This is used to
              improve performance in  sequential  reads  of  large  files,  by
              prefetching  additional  blocks  in  anticipation  of them being
              needed by the running  task.   In  the  current  kernel  version
              (2.0.10)  this  has  a default setting of 8 sectors (4KB).  This
              value seems good for most purposes, but in a system  where  most
              file  accesses are random seeks, a smaller setting might provide
              better performance.  Also, many IDE drives also have a  separate
              built-in  read-ahead  function,  which alleviates the need for a
              filesystem read-ahead in many situations.

-A     Disable/enable the IDE drive's read-lookahead  feature  (usually
              ON by default).  Usage: -A0 (disable) or -A1 (enable).

A bad interpretation on my part. Thanks again for setting me straight.

Anyway, not that it really matters, but I re-did the testing with '-a0' and it 
didn't help one iota. The 2.6.13 kernel on ReiserFS (without using 
'nolargeio=1' as a mount option) still takes about 4m35s to fseek 200,000 
times on that 4MB file, even with 'hdparm -a0 /dev/hda' in effect.

*** Please CC: me on replies -- I'm not subscribed.
   
Regards,
Bill Marr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-13  4:36                 ` Marr
@ 2006-03-13 14:41                   ` Mark Lord
  2006-03-13 18:15                     ` Hans Reiser
  2006-03-13 20:00                     ` Marr
  0 siblings, 2 replies; 23+ messages in thread
From: Mark Lord @ 2006-03-13 14:41 UTC (permalink / raw)
  To: Marr
  Cc: Linda Walsh, Bill Davidsen, linux-kernel, reiserfs-dev, Andrew Morton

Marr wrote:
>
> Anyway, not that it really matters, but I re-did the testing with '-a0' and it 
> didn't help one iota. The 2.6.13 kernel on ReiserFS (without using 
> 'nolargeio=1' as a mount option) still takes about 4m35s to fseek 200,000 
> times on that 4MB file, even with 'hdparm -a0 /dev/hda' in effect.

Does it make a difference when done on the filesystem *partition*
rather than the base drive?  At one time, this mattered, and it may
still work that way today.

Eg.  hdparm -a0 /dev/hda3   rather than   hdparm -a0 /dev/hda

??

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-13 14:41                   ` Mark Lord
@ 2006-03-13 18:15                     ` Hans Reiser
  2006-03-13 20:00                     ` Marr
  1 sibling, 0 replies; 23+ messages in thread
From: Hans Reiser @ 2006-03-13 18:15 UTC (permalink / raw)
  To: Mark Lord, drepper
  Cc: Marr, Linda Walsh, Bill Davidsen, linux-kernel, reiserfs-dev,
	Andrew Morton

Ulrich, what are your plans regarding fixing this?  Are you just going
to ignore it or?

Hans

Mark Lord wrote:

> Marr wrote:
>
>>
>> Anyway, not that it really matters, but I re-did the testing with
>> '-a0' and it didn't help one iota. The 2.6.13 kernel on ReiserFS
>> (without using 'nolargeio=1' as a mount option) still takes about
>> 4m35s to fseek 200,000 times on that 4MB file, even with 'hdparm -a0
>> /dev/hda' in effect.
>
>
> Does it make a difference when done on the filesystem *partition*
> rather than the base drive?  At one time, this mattered, and it may
> still work that way today.
>
> Eg.  hdparm -a0 /dev/hda3   rather than   hdparm -a0 /dev/hda
>
> ??
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?)
  2006-03-13 14:41                   ` Mark Lord
  2006-03-13 18:15                     ` Hans Reiser
@ 2006-03-13 20:00                     ` Marr
  1 sibling, 0 replies; 23+ messages in thread
From: Marr @ 2006-03-13 20:00 UTC (permalink / raw)
  To: Mark Lord
  Cc: Linda Walsh, Bill Davidsen, linux-kernel, reiserfs-dev,
	Andrew Morton, marr

On Monday 13 March 2006 9:41am, Mark Lord wrote:
> Marr wrote:
> > Anyway, not that it really matters, but I re-did the testing with '-a0'
> > and it didn't help one iota. The 2.6.13 kernel on ReiserFS (without using
> > 'nolargeio=1' as a mount option) still takes about 4m35s to fseek 200,000
> > times on that 4MB file, even with 'hdparm -a0 /dev/hda' in effect.
>
> Does it make a difference when done on the filesystem *partition*
> rather than the base drive?  At one time, this mattered, and it may
> still work that way today.
>
> Eg.  hdparm -a0 /dev/hda3   rather than   hdparm -a0 /dev/hda
>
> ??

Unfortunately, it makes no difference. That is, after successfully setting 
'-a0' on the partition in question (instead of the whole HDD device itself), 
the 200,000 random 'fseek()' calls still take about 4m35s on ReiserFS 
(without using 'nolargeio=1' as a mount option) under kernel 2.6.13.

P.S. I've CC:ed you and the others on my reply to Al Boldi's request for the 
'hdparm -I /dev/hda' information, in case it helps at all.

Thanks for your inputs, Mark -- much appreciated!

*** Please CC: me on replies -- I'm not subscribed.

Regards,
Bill Marr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?
       [not found] <5JRJO-6Al-7@gated-at.bofh.it>
@ 2006-02-24 23:31 ` Robert Hancock
  0 siblings, 0 replies; 23+ messages in thread
From: Robert Hancock @ 2006-02-24 23:31 UTC (permalink / raw)
  To: Marr, linux-kernel

Marr wrote:
> Clearly, the 2.4.31 results are speedy because the whole 4MB file has been 
> cached.

I don't think this is clear at all. The entire file should always be 
cached; not doing this would be insane.

> What I cannot figure out is this: what has changed in 2.6.x kernels to cause 
> the performance to degrade so drastically?!?

fseek() is a C library call, not a system call itself - there may be 
something that glibc is doing differently. Are you using the same glibc 
version with both kernels?

Just from this program it could be something else entirely that explains 
the difference in speed, like the random number generator...
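
[Editor's note: the random-number-generator possibility is easy to rule out by timing the offset generation alone, with no file I/O. A trivial sketch:]

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
   volatile long sink = 0;
   clock_t t0 = clock();
   int j;

   /* same number of offsets as the slow test, but no file I/O at all */
   for (j = 0; j < 200000; j++)
      sink += (long)(((double)random() / (double)RAND_MAX) * 4000000.0);

   printf("200000 offsets in %.3f s (checksum %ld)\n",
          (double)(clock() - t0) / CLOCKS_PER_SEC, (long)sink);
   return 0;
}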

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 23+ messages in thread

Thread overview: 23+ messages
2006-02-24 20:22 Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change? Marr
2006-02-25  5:16 ` Andrew Morton
2006-02-26 13:07   ` Ingo Oeser
2006-02-26 13:50     ` Nick Piggin
2006-02-26 14:11       ` Arjan van de Ven
2006-02-27 20:52       ` Hans Reiser
2006-02-28  0:34         ` Nick Piggin
2006-02-28 18:42           ` Hans Reiser
2006-02-28 18:51           ` Hans Reiser
2006-02-27 20:24   ` Marr
2006-02-27 21:53   ` Hans Reiser
2006-02-28  0:03     ` Bill Davidsen
2006-02-28 18:38       ` Hans Reiser
2006-03-05 23:02       ` Readahead value 128K? (was Re: Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change?) Linda Walsh
2006-03-07 19:53         ` Marr
2006-03-07 21:15           ` Linda Walsh
2006-03-12 21:53             ` Marr
2006-03-12 22:15               ` Mark Lord
2006-03-13  4:36                 ` Marr
2006-03-13 14:41                   ` Mark Lord
2006-03-13 18:15                     ` Hans Reiser
2006-03-13 20:00                     ` Marr
     [not found] <5JRJO-6Al-7@gated-at.bofh.it>
2006-02-24 23:31 ` Drastic Slowdown of 'fseek()' Calls From 2.4 to 2.6 -- VMM Change? Robert Hancock
