All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: unexpected results from simple test?
  2009-06-22 10:41 unexpected results from simple test? Richard Kennedy
@ 2009-06-22 10:40 ` Gurudas Pai
  2009-06-22 10:53   ` Jens Axboe
  2009-06-22 10:51 ` Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Gurudas Pai @ 2009-06-22 10:40 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: fio


Richard Kennedy wrote:
> I been running a hopefully simple fio test, but I don't understand the
> results.
> 
> the test :-
> 
> fio --name=f1 --directory=/sda --rw=rw --size=400m
> 	--name=f2 --directory=/sdb --rw=write --size=1g --startdelay=10
> 
> on linux 2.6.30-git
> 
> running this takes quite while, 'time fio ...' gives
> real	3m49.174s
> user	0m0.838s
> sys	0m7.990s
> 
> (edited highlights of the fio output)
> f1: (groupid=0, jobs=1): err= 0: pid=5181
>   read : io=200MiB, bw=23,480KiB/s, iops=5,870, runt=  8722msec
>   write: io=200MiB, bw=23,499KiB/s, iops=5,874, runt=  8715msec
> f2: (groupid=0, jobs=1): err= 0: pid=5182
>   write: io=1,024MiB, bw=77,231KiB/s, iops=19,307, runt= 13577msec
> 
> The jobs seem to finish reasonable quickly, but then fio seem to be
> stuck waiting for something. CPU usage is 100% but the CPU frequency
> stays at the lowest level and the system is really sluggish in its
> response. It does recover in time but this test does seem to torture my
> system.
> 
> strace shows fio repeating this :-
> open and read several disk stats, such as 
> 
> open("/sys/block/sdb/stat", O_RDONLY)   = 10
> fstat(10, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9ef70c000
> read(10, "   45168    22897  2239004   1505"..., 4096) = 99
> close(10)                               = 0
> munmap(0x7ff9ef70c000, 4096)            = 0
> 
> then...
> 
> gettimeofday({1245665343, 22984}, NULL) = 0
> gettimeofday({1245665343, 23015}, NULL) = 0
> ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
> gettimeofday({1245665343, 23086}, NULL) = 0
> gettimeofday({1245665343, 23115}, NULL) = 0
> gettimeofday({1245665343, 23150}, NULL) = 0
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> ...
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> 
> then back to the open & reads..
> 
> The machine is a AMD X2 64bit with 3G memory & fio is 1.26.
> 
> Any ideas what's going on? or suggestions to help track it down?

Your test is doing buffered IO, so around ~1.4 GB of memory is consumed 
by io units, so machine is sluggish. And when test is about to finish, 
fio does fsync, so all data is sync'd up, and it is taking little long 
time. and the main thread/process is waiting for fsync to complete. 
Please run the same test with direct IO .
Add "direct=1" in your jobfile.

Thanks,
-Guru

^ permalink raw reply	[flat|nested] 9+ messages in thread

* unexpected results from simple test?
@ 2009-06-22 10:41 Richard Kennedy
  2009-06-22 10:40 ` Gurudas Pai
  2009-06-22 10:51 ` Jens Axboe
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Kennedy @ 2009-06-22 10:41 UTC (permalink / raw)
  To: fio

I been running a hopefully simple fio test, but I don't understand the
results.

the test :-

fio --name=f1 --directory=/sda --rw=rw --size=400m
	--name=f2 --directory=/sdb --rw=write --size=1g --startdelay=10

on linux 2.6.30-git

running this takes quite while, 'time fio ...' gives
real	3m49.174s
user	0m0.838s
sys	0m7.990s

(edited highlights of the fio output)
f1: (groupid=0, jobs=1): err= 0: pid=5181
  read : io=200MiB, bw=23,480KiB/s, iops=5,870, runt=  8722msec
  write: io=200MiB, bw=23,499KiB/s, iops=5,874, runt=  8715msec
f2: (groupid=0, jobs=1): err= 0: pid=5182
  write: io=1,024MiB, bw=77,231KiB/s, iops=19,307, runt= 13577msec

The jobs seem to finish reasonable quickly, but then fio seem to be
stuck waiting for something. CPU usage is 100% but the CPU frequency
stays at the lowest level and the system is really sluggish in its
response. It does recover in time but this test does seem to torture my
system.

strace shows fio repeating this :-
open and read several disk stats, such as 

open("/sys/block/sdb/stat", O_RDONLY)   = 10
fstat(10, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9ef70c000
read(10, "   45168    22897  2239004   1505"..., 4096) = 99
close(10)                               = 0
munmap(0x7ff9ef70c000, 4096)            = 0

then...

gettimeofday({1245665343, 22984}, NULL) = 0
gettimeofday({1245665343, 23015}, NULL) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
gettimeofday({1245665343, 23086}, NULL) = 0
gettimeofday({1245665343, 23115}, NULL) = 0
gettimeofday({1245665343, 23150}, NULL) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL)          = 0
wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL)          = 0
...
wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
nanosleep({0, 10000000}, NULL)          = 0

then back to the open & reads..

The machine is a AMD X2 64bit with 3G memory & fio is 1.26.

Any ideas what's going on? or suggestions to help track it down?

thanks

Richard 
 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 10:41 unexpected results from simple test? Richard Kennedy
  2009-06-22 10:40 ` Gurudas Pai
@ 2009-06-22 10:51 ` Jens Axboe
  2009-06-22 11:16   ` Richard Kennedy
  2009-06-22 14:20   ` Richard Kennedy
  1 sibling, 2 replies; 9+ messages in thread
From: Jens Axboe @ 2009-06-22 10:51 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: fio

On Mon, Jun 22 2009, Richard Kennedy wrote:
> I been running a hopefully simple fio test, but I don't understand the
> results.
> 
> the test :-
> 
> fio --name=f1 --directory=/sda --rw=rw --size=400m
> 	--name=f2 --directory=/sdb --rw=write --size=1g --startdelay=10
> 
> on linux 2.6.30-git
> 
> running this takes quite while, 'time fio ...' gives
> real	3m49.174s
> user	0m0.838s
> sys	0m7.990s

That does looke very odd.

> (edited highlights of the fio output)
> f1: (groupid=0, jobs=1): err= 0: pid=5181
>   read : io=200MiB, bw=23,480KiB/s, iops=5,870, runt=  8722msec
>   write: io=200MiB, bw=23,499KiB/s, iops=5,874, runt=  8715msec
> f2: (groupid=0, jobs=1): err= 0: pid=5182
>   write: io=1,024MiB, bw=77,231KiB/s, iops=19,307, runt= 13577msec
> 
> The jobs seem to finish reasonable quickly, but then fio seem to be
> stuck waiting for something. CPU usage is 100% but the CPU frequency
> stays at the lowest level and the system is really sluggish in its
> response. It does recover in time but this test does seem to torture my
> system.
> 
> strace shows fio repeating this :-
> open and read several disk stats, such as 
> 
> open("/sys/block/sdb/stat", O_RDONLY)   = 10
> fstat(10, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9ef70c000
> read(10, "   45168    22897  2239004   1505"..., 4096) = 99
> close(10)                               = 0
> munmap(0x7ff9ef70c000, 4096)            = 0
> 
> then...
> 
> gettimeofday({1245665343, 22984}, NULL) = 0
> gettimeofday({1245665343, 23015}, NULL) = 0
> ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
> gettimeofday({1245665343, 23086}, NULL) = 0
> gettimeofday({1245665343, 23115}, NULL) = 0
> gettimeofday({1245665343, 23150}, NULL) = 0
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
> rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> ...
> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
> nanosleep({0, 10000000}, NULL)          = 0
> 
> then back to the open & reads..

This is normal, since it looks like you are stracing the parent process.
You'll want to inspect the two processes that are actually performing
the IO.

> The machine is a AMD X2 64bit with 3G memory & fio is 1.26.

Did you pull changes into that fio repo and not do a make clean before
compiling it? The build system is a bit broken, so it wont always
rebuild everything if eg fio.h had changes to structures. And that'll
cause weird behaviour sometimes.

In any case, please try and download a new fresh version and see if that
makes it any better.

There was also a recent linux kernel futex problem that fio triggered,
so it could also be a kernel issue. Again, knowing what the two other
jobs are doing would help.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 10:40 ` Gurudas Pai
@ 2009-06-22 10:53   ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2009-06-22 10:53 UTC (permalink / raw)
  To: Gurudas Pai; +Cc: Richard Kennedy, fio

On Mon, Jun 22 2009, Gurudas Pai wrote:
>
> Richard Kennedy wrote:
>> I been running a hopefully simple fio test, but I don't understand the
>> results.
>>
>> the test :-
>>
>> fio --name=f1 --directory=/sda --rw=rw --size=400m
>> 	--name=f2 --directory=/sdb --rw=write --size=1g --startdelay=10
>>
>> on linux 2.6.30-git
>>
>> running this takes quite while, 'time fio ...' gives
>> real	3m49.174s
>> user	0m0.838s
>> sys	0m7.990s
>>
>> (edited highlights of the fio output)
>> f1: (groupid=0, jobs=1): err= 0: pid=5181
>>   read : io=200MiB, bw=23,480KiB/s, iops=5,870, runt=  8722msec
>>   write: io=200MiB, bw=23,499KiB/s, iops=5,874, runt=  8715msec
>> f2: (groupid=0, jobs=1): err= 0: pid=5182
>>   write: io=1,024MiB, bw=77,231KiB/s, iops=19,307, runt= 13577msec
>>
>> The jobs seem to finish reasonable quickly, but then fio seem to be
>> stuck waiting for something. CPU usage is 100% but the CPU frequency
>> stays at the lowest level and the system is really sluggish in its
>> response. It does recover in time but this test does seem to torture my
>> system.
>>
>> strace shows fio repeating this :-
>> open and read several disk stats, such as 
>>
>> open("/sys/block/sdb/stat", O_RDONLY)   = 10
>> fstat(10, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9ef70c000
>> read(10, "   45168    22897  2239004   1505"..., 4096) = 99
>> close(10)                               = 0
>> munmap(0x7ff9ef70c000, 4096)            = 0
>>
>> then...
>>
>> gettimeofday({1245665343, 22984}, NULL) = 0
>> gettimeofday({1245665343, 23015}, NULL) = 0
>> ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
>> gettimeofday({1245665343, 23086}, NULL) = 0
>> gettimeofday({1245665343, 23115}, NULL) = 0
>> gettimeofday({1245665343, 23150}, NULL) = 0
>> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
>> rt_sigreturn(0)                         = -1 EINTR (Interrupted system call)
>> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
>> nanosleep({0, 10000000}, NULL)          = 0
>> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
>> nanosleep({0, 10000000}, NULL)          = 0
>> ...
>> wait4(5321, 0x7fffcfed49d4, WNOHANG, NULL) = 0
>> nanosleep({0, 10000000}, NULL)          = 0
>>
>> then back to the open & reads..
>>
>> The machine is a AMD X2 64bit with 3G memory & fio is 1.26.
>>
>> Any ideas what's going on? or suggestions to help track it down?
>
> Your test is doing buffered IO, so around ~1.4 GB of memory is consumed  
> by io units, so machine is sluggish. And when test is about to finish,  
> fio does fsync, so all data is sync'd up, and it is taking little long  
> time. and the main thread/process is waiting for fsync to complete.  
> Please run the same test with direct IO .
> Add "direct=1" in your jobfile.

It wont fsync by default, so I don't think that is it. But if we get an
overview of where the other fio processes are stuck, it'll be a bit
easier to see what the problem is here :-)

If you are in fsync, then fio will also list an 'F' for that job in the
status line. So that should be easy to tell.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 10:51 ` Jens Axboe
@ 2009-06-22 11:16   ` Richard Kennedy
  2009-06-22 14:20   ` Richard Kennedy
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Kennedy @ 2009-06-22 11:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

On Mon, 2009-06-22 at 12:51 +0200, Jens Axboe wrote:

> 
> Did you pull changes into that fio repo and not do a make clean before
> compiling it? The build system is a bit broken, so it wont always
> rebuild everything if eg fio.h had changes to structures. And that'll
> cause weird behaviour sometimes.
> In any case, please try and download a new fresh version and see if that
> makes it any better.
> 
> There was also a recent linux kernel futex problem that fio triggered,
> so it could also be a kernel issue. Again, knowing what the two other
> jobs are doing would help.
> 

I built fio from fio-1.26.tar.gz so it was in a clean directory.
I've just downloaded the 1.28 snapshot and will try that, but I can try
the git version if you like. 

I get the status of both jobs but it get stuck at 

> Jobs: 1 (f=0): [_W] [100.0% done] [     0/     0 kb/s] [eta 00m:00s] 

I'll try 1.28 & let you know.

I haven't updated the kernel is a few days, so I can try the latest git
as well.

regards
Richard




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 10:51 ` Jens Axboe
  2009-06-22 11:16   ` Richard Kennedy
@ 2009-06-22 14:20   ` Richard Kennedy
  2009-06-22 17:57     ` Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Kennedy @ 2009-06-22 14:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio


> In any case, please try and download a new fresh version and see if that
> makes it any better.
> 
> There was also a recent linux kernel futex problem that fio triggered,
> so it could also be a kernel issue. Again, knowing what the two other
> jobs are doing would help.
> 
Hi Jens,

'strace -f' running on a freshly built fio 1.28

It runs somewhat slowly but I just get lots of reads & writes like these

[pid  6597] gettimeofday({1245679454, 899777}, NULL) = 0
[pid  6597] lseek(8, 419278848, SEEK_SET) = 419278848
[pid  6597] gettimeofday({1245679454, 899917}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 899988}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 900059}, NULL) = 0
[pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
\326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
[pid  6597] gettimeofday({1245679454, 900244}, NULL) = 0
[pid  6597] lseek(8, 419282944, SEEK_SET) = 419282944
[pid  6597] gettimeofday({1245679454, 900391}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 900463}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 900534}, NULL) = 0
[pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
\326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
[pid  6597] gettimeofday({1245679454, 900705}, NULL) = 0
[pid  6597] lseek(8, 419287040, SEEK_SET) = 419287040
[pid  6597] gettimeofday({1245679454, 900845}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 900914}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 900985}, NULL) = 0
[pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
\326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
[pid  6597] gettimeofday({1245679454, 901171}, NULL) = 0
[pid  6597] lseek(8, 419291136, SEEK_SET) = 419291136
[pid  6597] gettimeofday({1245679454, 901309}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 901379}, NULL) = 0
[pid  6597] gettimeofday({1245679454, 901460}, NULL) = 0
[pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
\326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096


where pid 6597 is job f1

I'll rebuild my kernel from the latest git, and check that.

regards
Richard




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 14:20   ` Richard Kennedy
@ 2009-06-22 17:57     ` Jens Axboe
  2009-06-23 10:02       ` Richard Kennedy
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2009-06-22 17:57 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: fio

On Mon, Jun 22 2009, Richard Kennedy wrote:
> 
> > In any case, please try and download a new fresh version and see if that
> > makes it any better.
> > 
> > There was also a recent linux kernel futex problem that fio triggered,
> > so it could also be a kernel issue. Again, knowing what the two other
> > jobs are doing would help.
> > 
> Hi Jens,
> 
> 'strace -f' running on a freshly built fio 1.28
> 
> It runs somewhat slowly but I just get lots of reads & writes like these
> 
> [pid  6597] gettimeofday({1245679454, 899777}, NULL) = 0
> [pid  6597] lseek(8, 419278848, SEEK_SET) = 419278848
> [pid  6597] gettimeofday({1245679454, 899917}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 899988}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 900059}, NULL) = 0
> [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> [pid  6597] gettimeofday({1245679454, 900244}, NULL) = 0
> [pid  6597] lseek(8, 419282944, SEEK_SET) = 419282944
> [pid  6597] gettimeofday({1245679454, 900391}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 900463}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 900534}, NULL) = 0
> [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> [pid  6597] gettimeofday({1245679454, 900705}, NULL) = 0
> [pid  6597] lseek(8, 419287040, SEEK_SET) = 419287040
> [pid  6597] gettimeofday({1245679454, 900845}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 900914}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 900985}, NULL) = 0
> [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> [pid  6597] gettimeofday({1245679454, 901171}, NULL) = 0
> [pid  6597] lseek(8, 419291136, SEEK_SET) = 419291136
> [pid  6597] gettimeofday({1245679454, 901309}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 901379}, NULL) = 0
> [pid  6597] gettimeofday({1245679454, 901460}, NULL) = 0
> [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> 
> 
> where pid 6597 is job f1
> 
> I'll rebuild my kernel from the latest git, and check that.

So it's still running, perhaps the eta is just off. Is write back
caching disabled on that drive?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-22 17:57     ` Jens Axboe
@ 2009-06-23 10:02       ` Richard Kennedy
  2009-06-23 10:54         ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Kennedy @ 2009-06-23 10:02 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

On Mon, 2009-06-22 at 19:57 +0200, Jens Axboe wrote:
> On Mon, Jun 22 2009, Richard Kennedy wrote:
> > 
> > > In any case, please try and download a new fresh version and see if that
> > > makes it any better.
> > > 
> > > There was also a recent linux kernel futex problem that fio triggered,
> > > so it could also be a kernel issue. Again, knowing what the two other
> > > jobs are doing would help.
> > > 
> > Hi Jens,
> > 
> > 'strace -f' running on a freshly built fio 1.28
> > 
> > It runs somewhat slowly but I just get lots of reads & writes like these
> > 
> > [pid  6597] gettimeofday({1245679454, 899777}, NULL) = 0
> > [pid  6597] lseek(8, 419278848, SEEK_SET) = 419278848
> > [pid  6597] gettimeofday({1245679454, 899917}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 899988}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 900059}, NULL) = 0
> > [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > [pid  6597] gettimeofday({1245679454, 900244}, NULL) = 0
> > [pid  6597] lseek(8, 419282944, SEEK_SET) = 419282944
> > [pid  6597] gettimeofday({1245679454, 900391}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 900463}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 900534}, NULL) = 0
> > [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > [pid  6597] gettimeofday({1245679454, 900705}, NULL) = 0
> > [pid  6597] lseek(8, 419287040, SEEK_SET) = 419287040
> > [pid  6597] gettimeofday({1245679454, 900845}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 900914}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 900985}, NULL) = 0
> > [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > [pid  6597] gettimeofday({1245679454, 901171}, NULL) = 0
> > [pid  6597] lseek(8, 419291136, SEEK_SET) = 419291136
> > [pid  6597] gettimeofday({1245679454, 901309}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 901379}, NULL) = 0
> > [pid  6597] gettimeofday({1245679454, 901460}, NULL) = 0
> > [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > 
> > 
> > where pid 6597 is job f1
> > 
> > I'll rebuild my kernel from the latest git, and check that.
> 
> So it's still running, perhaps the eta is just off. Is write back
> caching disabled on that drive?
> 

It seems better on the latest git kernel, the time for the complete test
is now 1m30.
Although the eta times do seem strange. As fio starts up I see extremely
large eta's flash up , i.e. some huge number of days, but I haven't
managed to capture one yet, and it says it's finished long before it
actually stops.

Yes, strangely the write_cache is disabled on sda, but enabled on sdb.

I'll turn it on and see if there's any difference.

regards
Richard
  




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unexpected results from simple test?
  2009-06-23 10:02       ` Richard Kennedy
@ 2009-06-23 10:54         ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2009-06-23 10:54 UTC (permalink / raw)
  To: Richard Kennedy; +Cc: fio

On Tue, Jun 23 2009, Richard Kennedy wrote:
> On Mon, 2009-06-22 at 19:57 +0200, Jens Axboe wrote:
> > On Mon, Jun 22 2009, Richard Kennedy wrote:
> > > 
> > > > In any case, please try and download a new fresh version and see if that
> > > > makes it any better.
> > > > 
> > > > There was also a recent linux kernel futex problem that fio triggered,
> > > > so it could also be a kernel issue. Again, knowing what the two other
> > > > jobs are doing would help.
> > > > 
> > > Hi Jens,
> > > 
> > > 'strace -f' running on a freshly built fio 1.28
> > > 
> > > It runs somewhat slowly but I just get lots of reads & writes like these
> > > 
> > > [pid  6597] gettimeofday({1245679454, 899777}, NULL) = 0
> > > [pid  6597] lseek(8, 419278848, SEEK_SET) = 419278848
> > > [pid  6597] gettimeofday({1245679454, 899917}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 899988}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 900059}, NULL) = 0
> > > [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > > [pid  6597] gettimeofday({1245679454, 900244}, NULL) = 0
> > > [pid  6597] lseek(8, 419282944, SEEK_SET) = 419282944
> > > [pid  6597] gettimeofday({1245679454, 900391}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 900463}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 900534}, NULL) = 0
> > > [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > > [pid  6597] gettimeofday({1245679454, 900705}, NULL) = 0
> > > [pid  6597] lseek(8, 419287040, SEEK_SET) = 419287040
> > > [pid  6597] gettimeofday({1245679454, 900845}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 900914}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 900985}, NULL) = 0
> > > [pid  6597] write(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > > [pid  6597] gettimeofday({1245679454, 901171}, NULL) = 0
> > > [pid  6597] lseek(8, 419291136, SEEK_SET) = 419291136
> > > [pid  6597] gettimeofday({1245679454, 901309}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 901379}, NULL) = 0
> > > [pid  6597] gettimeofday({1245679454, 901460}, NULL) = 0
> > > [pid  6597] read(8, "+\265bB\305\243g+\214\313\313\377\20H\237\356\266
> > > \326X\335<\37\317K\207\317,/\335J\206\267\33"..., 4096) = 4096
> > > 
> > > 
> > > where pid 6597 is job f1
> > > 
> > > I'll rebuild my kernel from the latest git, and check that.
> > 
> > So it's still running, perhaps the eta is just off. Is write back
> > caching disabled on that drive?
> > 
> 
> It seems better on the latest git kernel, the time for the complete test
> is now 1m30.
> Although the eta times do seem strange. As fio starts up I see extremely
> large eta's flash up , i.e. some huge number of days, but I haven't
> managed to capture one yet, and it says it's finished long before it
> actually stops.
> 
> Yes, strangely the write_cache is disabled on sda, but enabled on sdb.
> 
> I'll turn it on and see if there's any difference.

So at this point I'm not sure exactly what you see as the problem. If
write caching is disabled, a read/write mix or randwrite is going to be
really slow. 90 seconds is ~4MB/sec, that sounds very plausible.

The eta may not be too good, probably because initially you are just
offloading to page cache. Once you start hitting platter, speeds will
plummet.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-06-23 10:54 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-22 10:41 unexpected results from simple test? Richard Kennedy
2009-06-22 10:40 ` Gurudas Pai
2009-06-22 10:53   ` Jens Axboe
2009-06-22 10:51 ` Jens Axboe
2009-06-22 11:16   ` Richard Kennedy
2009-06-22 14:20   ` Richard Kennedy
2009-06-22 17:57     ` Jens Axboe
2009-06-23 10:02       ` Richard Kennedy
2009-06-23 10:54         ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.