* SATA-performance: Linux vs. FreeBSD
@ 2007-02-12 14:02 Martin A. Fink
  2007-02-12 17:04 ` Andi Kleen
  2007-02-15  5:48 ` Tejun Heo
  0 siblings, 2 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-12 14:02 UTC (permalink / raw)
  To: linux-kernel

Dear all,

I ran some performance tests, and the results puzzle me:

My Hardware:
Asus P5LD2 board with Intel i945P chipset, ICH7R southbridge
CPU Intel Core 2 Duo E6300 at 1.86 GHz, 2 MB Cache
1 GB RAM
My Software:
OpenSuSE 10.2 with Linux kernel 2.6.18, x86-64 architecture
FreeBSD 6.2

Test drives:
1. HDD: Seagate ST3250820AS RPM 7200.9, 8 MB Cache, 250 GB, SATA-II
   (Harddisk Drive)
2. SSD: Adtron AF25FB, 27GB, SATA Revision 1.0a (Solid State Disk)

What I did:
I wrote 1 MB blocks to a file. After every 1 GB I called fsync and recorded
the time. For the tests with filesystems I wrote files of 1 GB each; otherwise
I wrote directly to the raw device.

Results: -1-

Test              OpenSuSE (AHCI)              FreeBSD (AHCI)
--------------------------------------------------------------------------
SSD(vfat 25GB)    41+/-2 MB/s at 4-10% CPU     15+/-0 MB/s at 2% CPU
SSD(raw  25GB)    26+/-1 MB/s at 4-10% CPU     48+/-0 MB/s at 1% CPU
SSD(ext3 25GB)    39+/-5 MB/s at 10-15% CPU    34+/-0 MB/s at 14% CPU
SSD(ext2 25GB)    42+/-1 MB/s at 10-15% CPU    32+/-0 MB/s at 10% CPU
--------------------------------------------------------------------------

Test              OpenSuSE (AHCI off)          FreeBSD (AHCI off)
--------------------------------------------------------------------------
SSD(vfat 25GB)    22+/-4 MB/s at 6-19% CPU     --
SSD(raw  25GB)    33+/-4 MB/s at 7-14% CPU     41+/-0 MB/s at 1% CPU
SSD(ext2 25GB)    27+/-6 MB/s at 6-14% CPU     --
--------------------------------------------------------------------------

Question 1:
Can anybody explain why writing to a SATA-I device with AHCI consumes so much
CPU time on Linux, while it takes almost no CPU time on FreeBSD 6.2? This is
especially visible when comparing the values for writing to the raw device.

Question 2:
Can anybody explain why writing to a solid state disk (a kind of memory that
always has the same constant bandwidth) shows such large standard errors in
the write rate on Linux (between 1 and 6 MB/s), while FreeBSD gives an almost
constant write rate (as one would expect for an SSD)?

Question 3:
Why is writing to a raw device on Linux slower than writing through e.g. ext2?
And why is the Linux write rate so much lower (-12.5 % in the best case) than
the FreeBSD write rate?

Question 4:
When writing to the SATA-II HDD, Linux is around 10% slower than FreeBSD when
using ext3, but about as fast as FreeBSD when writing raw. Why?


How can I improve the write speed on Linux?
Thanks for any advice.

Martin

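PS: Part of my test code:

  /* Write 1 GB in blocks of blockSize bytes, fsync, then stop the clock. */
  int fd = open(fileName, O_WRONLY | O_CREAT | O_TRUNC, 0666);
  if (fd < 0)
    perror("open");
  (void)gettimeofday(&start, 0);
  for (long bl = 0; bl < blocksPerGigaByte; ++bl)
    if (write(fd, block, blockSize) != (ssize_t)blockSize)
      perror("write");  /* failed or short write */
  fsync(fd);
  (void)gettimeofday(&ende, 0);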

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:04 ` Andi Kleen
@ 2007-02-12 16:27   ` Martin A. Fink
  2007-02-12 18:41     ` Andi Kleen
  2007-02-12 16:37   ` Martin A. Fink
  2007-02-12 17:42   ` Martin A. Fink
  2 siblings, 1 reply; 33+ messages in thread
From: Martin A. Fink @ 2007-02-12 16:27 UTC (permalink / raw)
  To: linux-kernel

On Monday, 12 February 2007 18:04, Andi Kleen wrote:
> "Martin A. Fink" <fink@mpe.mpg.de> writes:
> > 
> > What I did:
> > I wrote blocks of 1 MB size to file. Each 1 GB I made a fsync and took the
> > time. For those tests with filesystems I wrote files of 1 GB size, otherwise
> > I just wrote to the raw device.
> 
> Newer Linux versions depending on the disk and the file system will tell
> the disk to flush the buffers to disk on fsync. FreeBSD might or might not
> do that, but if it doesn't it would explain the difference.

If you call fsync on BSD, you get what you expect: anything that is not yet
on disk is written, and then fsync returns. So this should be the same as on
Linux?!
> 
> > 
> > Results: -1-
> > 
> > Test					OpenSuSE(AHCI)			FreeBSD(AHCI)
> > --------------------------------------------------------------------------
> > SSD(vfat 25GB)			41+/-2 MB/s at 4-10%		15+/-0 MB/s at 2% CPU
> 
> vfat is certainly not a performance optimized file system.
That is just a minor test.
> 
> > SSD(raw  25GB) 		26+/-1 MB/s at 4-10% CPU	48+/-0 MB/s at 1% CPU

The line above is what puzzles me!

> > SSD(ext3 25GB)		39+/-5 MB/s at 10-15% CPU	34+/-0 MB/s at 14% CPU
> > SSD(ext2 25GB)		42+/-1 MB/s at 10-15% CPU	32+/-0 MB/s at 10% CPU
> 
> 
> You could use oprofile (http://oprofile.sourceforge.net) to find out
> where the CPU is being used.
> 
> 
> > --------------------------------------------------------------------------
> > 
> > Test					OpenSuSE (AHCI off)		FreeBSD (AHCI off)
> > --------------------------------------------------------------------------
> > SSD(vfat 25GB)			22+/-4 MB/s at 6-19% CPU	--
> > SSD(raw  25GB)		33+/-4 MB/s at 7-14% CPU	41+/-0 MB/s at 1% CPU
> 
> I remember vaguely (but I might be wrong here) the standard block
> character devices on FreeBSD are buffered, while raw is truly
> unbuffered on Linux. Naive programs (no optimized IO threads or aio) 
> on truly unbuffered devices tend to perform poorly because they
> don't do any write behind.

But the big question remains, buffered or not: where do the large variations
on Linux come from? I am not writing small blocks; I am writing huge amounts
of data, so the buffer will always be full. And Linux is slower than BSD even
when it can use a buffer: the maximum performance of Linux is 42 MB/s
(buffered), while the maximum performance of BSD is 48 MB/s (buffered or not,
I don't know).
With a normal SATA-II disk there is no difference between BSD and Linux when
writing to the raw device, so it can't be a buffer problem alone.
> 
> It might also useful if you post the libata related parts of your
> boot log.

> > 
> > Question 2:
> > Can anybody explain to me, why writing to a solid state disk (a kind of memory
> > that always has the same constant bandwidth) has such big standard errors in
> > writing rate using Linux (between 1 to 6 MB/s error) while FreeBSD gives an
> > almost constant writing rate (as one would expect it for a SSD) ?
> 
> Could be buffered vs unbuffered. Unbuffered single threaded writes
> tend to be quite variable.
This does not explain the large variation of +/- 5 MB/s when writing to ext3.

I still don't understand the buffer argument. If one writes 25 GB in blocks of
1 MB, the buffer should always be full...
> 
> > Question 3:
> > Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why
> > is Linux writing rate much lower (-12.5 % for the best case) compared to
> > writing rate of FreeBSD?
> 
> It's really hard to make raw io perform well without complicated
> efforts because nobody will hide the IO latencies. That is why
> buffered IO is normally recommend

Is there a buffered io device that I can use, but that does not use a 
filesystem?

> 
> -Andi
> 

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:04 ` Andi Kleen
  2007-02-12 16:27   ` Martin A. Fink
@ 2007-02-12 16:37   ` Martin A. Fink
  2007-02-12 18:19     ` Stefan Richter
  2007-02-13 19:09     ` Jeff Carr
  2007-02-12 17:42   ` Martin A. Fink
  2 siblings, 2 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-12 16:37 UTC (permalink / raw)
  To: linux-kernel

Some more info:

:~> strace -c -T -o trace.out dd if=/dev/zero of=test.txt bs=10MB count=200

200+0 records in
200+0 records out
2000000000 bytes (2.0 GB) copied, 52.8632 seconds, 37.8 MB/s

trace.out:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.26    6.845265       33555       204           write
  6.41    0.470283       11757        40        18 open
  0.32    0.023687         116       205           read
  0.00    0.000149           9        16           mmap2
  0.00    0.000119          40         3           munmap
  0.00    0.000081           3        24           close
  0.00    0.000068           6        11           old_mmap
  0.00    0.000064           3        20           fstat64
  0.00    0.000040           4        10           rt_sigaction
  0.00    0.000036          12         3           madvise
  0.00    0.000014           7         2           clock_gettime
  0.00    0.000010           3         3           brk
  0.00    0.000008           8         1           _sysctl
  0.00    0.000007           7         1         1 access
  0.00    0.000006           6         1           mprotect
  0.00    0.000005           5         1           futex
  0.00    0.000004           4         1           uname
  0.00    0.000004           4         1           _llseek
  0.00    0.000003           3         1           rt_sigprocmask
  0.00    0.000003           3         1           getrlimit
  0.00    0.000003           3         1           set_thread_area
  0.00    0.000003           3         1           set_tid_address
------ ----------- ----------- --------- --------- ----------------
100.00    7.339862                   551        19 total

This means that the CPU is working for only 7.3 of the 52.8 seconds. You can
even hear it: when I run programs whose run time matches what strace reports,
I get 100% CPU load and the CPU fan spins up loudly. In this case the fan does
nothing at all. It looks like the SATA driver simply blocks the CPU while
doing whatever...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 14:02 SATA-performance: Linux vs. FreeBSD Martin A. Fink
@ 2007-02-12 17:04 ` Andi Kleen
  2007-02-12 16:27   ` Martin A. Fink
                     ` (2 more replies)
  2007-02-15  5:48 ` Tejun Heo
  1 sibling, 3 replies; 33+ messages in thread
From: Andi Kleen @ 2007-02-12 17:04 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

"Martin A. Fink" <fink@mpe.mpg.de> writes:
> 
> What I did:
> I wrote blocks of 1 MB size to file. Each 1 GB I made a fsync and took the 
> time. For those tests with filesystems I wrote files of 1 GB size, otherwise 
> I just wrote to the raw device.

Newer Linux versions, depending on the disk and the file system, will tell
the disk to flush its buffers to the medium on fsync. FreeBSD might or might
not do that, but if it doesn't, that would explain the difference.

> 
> Results: -1-
> 
> Test					OpenSuSE(AHCI)			FreeBSD(AHCI)
> ---------------------------------------------------------------------------------------------------------------------------------------
> SSD(vfat 25GB)			41+/-2 MB/s at 4-10%		15+/-0 MB/s at 2% CPU

vfat is certainly not a performance optimized file system.

> SSD(raw  25GB) 		26+/-1 MB/s at 4-10% CPU	48+/-0 MB/s at 1% CPU
> SSD(ext3 25GB)		39+/-5 MB/s at 10-15% CPU	34+/-0 MB/s at 14% CPU
> SSD(ext2 25GB)		42+/-1 MB/s at 10-15% CPU	32+/-0 MB/s at 10% CPU


You could use oprofile (http://oprofile.sourceforge.net) to find out
where the CPU is being used.


> ---------------------------------------------------------------------------------------------------------------------------------------
> 
> Test					OpenSuSE (AHCI off)		FreeBSD (AHCI off)
> ---------------------------------------------------------------------------------------------------------------------------------------
> SSD(vfat 25GB)			22+/-4 MB/s at 6-19% CPU	--
> SSD(raw  25GB)		33+/-4 MB/s at 7-14% CPU	41+/-0 MB/s at 1% CPU

I remember vaguely (but I might be wrong here) the standard block
character devices on FreeBSD are buffered, while raw is truly
unbuffered on Linux. Naive programs (no optimized IO threads or aio) 
on truly unbuffered devices tend to perform poorly because they
don't do any write behind.

It might also be useful if you posted the libata-related parts of your
boot log.
> 
> Question 2:
> Can anybody explain to me, why writing to a solid state disk (a kind of memory 
> that always has the same constant bandwidth) has such big standard errors in 
> writing rate using Linux (between 1 to 6 MB/s error) while FreeBSD gives an 
> almost constant writing rate (as one would expect it for a SSD) ?

Could be buffered vs unbuffered. Unbuffered single threaded writes
tend to be quite variable.

> Question 3:
> Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why 
> is Linux writing rate much lower (-12.5 % for the best case) compared to 
> writing rate of FreeBSD?

It's really hard to make raw IO perform well without complicated
efforts because nobody will hide the IO latencies. That is why
buffered IO is normally recommended.

-Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:04 ` Andi Kleen
  2007-02-12 16:27   ` Martin A. Fink
  2007-02-12 16:37   ` Martin A. Fink
@ 2007-02-12 17:42   ` Martin A. Fink
  2 siblings, 0 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-12 17:42 UTC (permalink / raw)
  To: linux-kernel

System Details:

dmesg: (parts)

Bootdata ok (command line is root=/dev/sda7 vga=0x31a    resume=/dev/sda5 splash=silent)
Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006
...
Using ACPI (MADT) for SMP configuration information
...
Intel(R) Core(TM)2 CPU          6300  @ 1.86GHz stepping 06
Brought up 2 CPUs
...
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: Processor [CPU2] (supports 8 throttling states)
...
ICH7: IDE controller at PCI slot 0000:00:1f.1
GSI 18 sharing vector 0xD9 and IRQ 18
ACPI: PCI Interrupt 0000:00:1f.1[A] -> GSI 22 (level, low) -> IRQ 217
ICH7: chipset revision 1
ICH7: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: HL-DT-STDVD-RAM GSA-H22N, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
libata version 2.00 loaded.
ahci 0000:00:1f.2: version 2.0
GSI 19 sharing vector 0xE1 and IRQ 19
ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 23 (level, low) -> IRQ 225
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq led clo pio slum part 
ata1: SATA max UDMA/133 cmd 0xFFFFC20000026D00 ctl 0x0 bmdma 0x0 irq 233
ata2: SATA max UDMA/133 cmd 0xFFFFC20000026D80 ctl 0x0 bmdma 0x0 irq 233
ata3: SATA max UDMA/133 cmd 0xFFFFC20000026E00 ctl 0x0 bmdma 0x0 irq 233
ata4: SATA max UDMA/133 cmd 0xFFFFC20000026E80 ctl 0x0 bmdma 0x0 irq 233
scsi0 : ahci
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 156301488 sectors: LBA48 NCQ (depth 31/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ahci
ata2: SATA link down (SStatus 0 SControl 300)
scsi2 : ahci
ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata3.00: ATA-6, max UDMA/100, 57337056 sectors: LBA 
ata3.00: ata3: dev 0 multi count 1
ata3.00: applying bridge limits
ata3.00: configured for UDMA/100
scsi3 : ahci
ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata4.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 31/32)
ata4.00: ata4: dev 0 multi count 16
ata4.00: configured for UDMA/133
  Vendor: ATA       Model: ST380811AS        Rev: 3.AA
Losing some ticks... checking if CPU frequency changed.
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 >
 sda2: <bsd: sda9 sda10 sda11 sda12 sda13 >
sd 0:0:0:0: Attached scsi disk sda
  Vendor: ATA       Model: Adtron A25FB-28G  Rev: BF22
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 57337056 512-byte hdwr sectors (29357 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write through
SCSI device sdb: 57337056 512-byte hdwr sectors (29357 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write through
 sdb: sdb1
sd 2:0:0:0: Attached scsi disk sdb
  Vendor: ATA       Model: ST3250820AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 2:0:0:0: Attached scsi generic sg1 type 0
SCSI device sdc: 488397168 512-byte hdwr sectors (250059 MB)
sdc: Write Protect is off
sdc: Mode Sense: 00 3a 00 00
SCSI device sdc: drive cache: write back
 sdc:
sd 3:0:0:0: Attached scsi disk sdc
sd 3:0:0:0: Attached scsi generic sg2 type 0
...


strace output:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.73   49.904049        1947     25627           write
 25.66   17.365062      694602        25           fsync
  0.62    0.416500       59500         7           close
  0.00    0.000000           0         4           read
  0.00    0.000000           0         7           open
  0.00    0.000000           0         5           fstat
  0.00    0.000000           0        16           mmap
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           uname
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         4           fadvise64
------ ----------- ----------- --------- --------- ----------------
100.00   67.685611                 25710         1 total

This is the result of $> strace -c -T -o trace.out PTestY

PTestY ran for 12.5 min = 752 seconds.
Thus it was stuck somewhere in the system for around 690 seconds!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 18:41     ` Andi Kleen
@ 2007-02-12 17:56       ` Martin A. Fink
  2007-02-12 18:17         ` Ray Lee
                           ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-12 17:56 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel

On Monday, 12 February 2007 19:41, you wrote:
> "Martin A. Fink" <fink@mpe.mpg.de> writes:
> 
> Your mailer seems to be broken. It drops cc.
> > 
> > If you call fsync in BSD then you get what you expect. anything that is still
> > not on disk will be written. Afterwards fsync returns... So this should be
> > the same like with linux?!
> 
> Not necessarily.  The disk may buffer additionally. Handling that
> differs widely, but modern Linux forces flushes to platter if the hardware support
> it.
> 
> > But the big question still is -- buffered or not -- where do the big 
> > variations within linux come frome? I am not writing small blocks. I write 
> > huge amounts of data.
> 
> 1MB is nowhere near huge by modern standards. Many IO subsystems are
> only happy with multi MB requests. 
> 
> > So the buffer will always be full.
> 
> Hardly. Especially not if you do synchronous fsync inbetween.

Well no. I write 1 GB in blocks of 1 MB. After that I call fsync. Then I 
process the next Gigabyte...
> 
> > If I use a normal SATA-II disk, there are no differences between BSD and Linux
> > when writing to the raw device... So it cant be a buffer-problem alone.
> 
> Yes that is something that needs to be investigated. That is why I suggested
> oprofile if your assertation of a more CPU overhead on Linux is true.
> 
> > I still don't understand the buffer argument. If one writes 25 GB in blocks of
> > 1 MB your buffer should be always full...
> 
> Your mental model of a IO subsystem seems to be quite off.
> Think what happens when you fsync and submit synchronously.

See above for how I do the writing.
> 
> It's like sending something down a long pipe and waiting until it arrives
> at the bottom and you hear the echo of the impact. Then only then you send again.
> There will be always long periods when the pipe will be empty.
> 
> If you use large enough blocks these gaps will be quite small and
> might effectively become unimportant, but 1MB is nowhere near big enough 
> for that.

I tested this: when I write in blocks of 8 kB or less, the effect you describe
does occur. But above a block size of 100 kB the speed no longer increases.

> 
> > Is there a buffered io device that I can use, but that does not use a 
> > filesystem?
> 
> /dev/sdX*. However it has some other issues that also don't make
> it ideal. File systems are usually best.

My experience with filesystems is this: I write some data and the write
function returns almost immediately, so I write again. Sometimes it returns
only after 100-300 ms. I think this happens whenever the buffer is full and
Linux starts writing to disk. After that, it returns almost immediately
again, and after a while the same stall happens again, but not at regular
intervals...

I have to store large amounts of data coming from 2 digital cameras to disk.
Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
a long period of time. So it is important for me that the hard disk drive is
reliable in the sense of "if it is capable of 50 MB/s then it should operate
at this speed. Constantly."

> 
> -Andi
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:56       ` Martin A. Fink
@ 2007-02-12 18:17         ` Ray Lee
  2007-02-12 19:08         ` Alan
  2007-02-12 23:31         ` Matthias Schniedermeyer
  2 siblings, 0 replies; 33+ messages in thread
From: Ray Lee @ 2007-02-12 18:17 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: Andi Kleen, linux-kernel

On 2/12/07, Martin A. Fink <fink@mpe.mpg.de> wrote:
> On Monday, 12 February 2007 19:41, you wrote:
> I have to store big amounts of data coming from 2 digital cameras to disk.
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for
> a long period of time. So it is important for me that the harddisk drive is
> reliable in the sense of "if it is capable of 50 MB/s then it should operate
> at this speed. Constantly."

Ah, here is a misunderstanding, I think. By default, Linux won't start
writing out dirty buffers until something like 40% of memory is used.
This is to help common workloads where many temporary files are
created and destroyed, or even data that gets written then overwritten
shortly after.

If the kernel were to immediately write out that dirty data, it would
be slower than leaving it in memory for those workloads. But since
that isn't best for everyone, there's a parameter that controls that
dirty threshold. Setting that to a lower value will help even out the
writeout, and start it early, just as you seem to be requesting.

Hmm, it may be one of:

/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio

Try tweaking those to much lower values and see if that helps.

Ray

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 16:37   ` Martin A. Fink
@ 2007-02-12 18:19     ` Stefan Richter
  2007-02-13 19:09     ` Jeff Carr
  1 sibling, 0 replies; 33+ messages in thread
From: Stefan Richter @ 2007-02-12 18:19 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Martin A. Fink wrote:
> This means, that the CPU is only 7.3 of 52.8 seconds working.
...
> It looks like 
> the SATA driver simply blocks the CPU while doing whatever...

The system sleeps while waiting for the disk (actually, for the SATA
host port) to be done with its work.

As Andi explained, if the system gives the disk a small task, waits for
the task to be completed, then gives it a next task and so on, latencies
add up and eat into effective bandwidth. Give the disk a whole set of
tasks so that
  - it has immediately something new to do when it finished one task,
  - deep pipes are not mostly empty due to "bubbles" in the pipe,
  - tasks can be reordered to be executed in optimized manner for good
    bandwidth utilization (if software/ firmware/ hardware is present
    which supports this; e.g. the Linux kernel itself),
etc.
Also make each task large so that the ratio of protocol overhead to net
data payload stays minimal.
-- 
Stefan Richter
-=====-=-=== --=- -==--
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 16:27   ` Martin A. Fink
@ 2007-02-12 18:41     ` Andi Kleen
  2007-02-12 17:56       ` Martin A. Fink
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2007-02-12 18:41 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

"Martin A. Fink" <fink@mpe.mpg.de> writes:

Your mailer seems to be broken. It drops cc.
> 
> If you call fsync in BSD then you get what you expect. anything that is still 
> not on disk will be written. Afterwards fsync returns... So this should be 
> the same like with linux?!

Not necessarily. The disk may buffer additionally. Handling that
differs widely, but modern Linux forces flushes to platter if the hardware
supports it.

> But the big question still is -- buffered or not -- where do the big 
> variations within linux come frome? I am not writing small blocks. I write 
> huge amounts of data.

1MB is nowhere near huge by modern standards. Many IO subsystems are
only happy with multi MB requests. 

> So the buffer will always be full.

Hardly. Especially not if you do a synchronous fsync in between.

> If I use a normal SATA-II disk, there are no differences between BSD and Linux 
> when writing to the raw device... So it cant be a buffer-problem alone.

Yes, that is something that needs to be investigated. That is why I suggested
oprofile, if your assertion of higher CPU overhead on Linux is true.

> I still don't understand the buffer argument. If one writes 25 GB in blocks of 
> 1 MB your buffer should be always full...

Your mental model of an IO subsystem seems to be quite off.
Think about what happens when you fsync and submit synchronously.

It's like sending something down a long pipe and waiting until it arrives
at the bottom and you hear the echo of the impact. Then, and only then, do you
send again. There will always be long periods when the pipe is empty.

If you use large enough blocks these gaps will be quite small and
might effectively become unimportant, but 1MB is nowhere near big enough 
for that.

> Is there a buffered io device that I can use, but that does not use a 
> filesystem?

/dev/sdX*. However it has some other issues that also don't make
it ideal. File systems are usually best.

-Andi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:56       ` Martin A. Fink
  2007-02-12 18:17         ` Ray Lee
@ 2007-02-12 19:08         ` Alan
  2007-02-12 20:34           ` Nigel Cunningham
  2007-02-13  9:34           ` Martin A. Fink
  2007-02-12 23:31         ` Matthias Schniedermeyer
  2 siblings, 2 replies; 33+ messages in thread
From: Alan @ 2007-02-12 19:08 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: Andi Kleen, linux-kernel

On Mon, 12 Feb 2007 18:56:29 +0100
"Martin A. Fink" <fink@mpe.mpg.de> wrote:

> I have to store big amounts of data coming from 2 digital cameras to disk. 
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for 
> a long period of time. So it is important for me that the harddisk drive is 
> reliable in the sense of "if it is capable of 50 MB/s then it should operate 
> at this speed. Constantly."

Hard disks don't do this. They support operations/second based upon
physical and rotational latency constraints, vibration levels, mechanism,
internal layout policy and the need to do housekeeping. 

If you have an ATA7 drive with suitable firmware sets you can talk to it
directly via the SG_IO interface and use the streaming feature set which
is quite different to filesystem type operations and lets you ask the
drive to do this sort of stuff - if you can find any general PC firmware
ones that support it anyway.

I'm not sure you'll get 50MB/sec sustained to work although you might
with a good current drive used for nothing else, a linear stream of data
(no seeking and file system overhead), and a non PCI controller (PCI
Express, host chipset bus etc). 

If you are using a file system, then the more you fsync, the more I'd
expect you to see stalling, as you keep draining what's effectively an
8MB-plus pipeline on a modern drive, precisely because fsync gives "hitting
disk" guarantees. You also want to be sure you are not journalling data.

Alan



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 19:08         ` Alan
@ 2007-02-12 20:34           ` Nigel Cunningham
  2007-02-13  9:34           ` Martin A. Fink
  1 sibling, 0 replies; 33+ messages in thread
From: Nigel Cunningham @ 2007-02-12 20:34 UTC (permalink / raw)
  To: Alan; +Cc: Martin A. Fink, Andi Kleen, linux-kernel

Hi Alan et al.

On Mon, 2007-02-12 at 19:08 +0000, Alan wrote:
> I'm not sure you'll get 50MB/sec sustained to work although you might
> with a good current drive used for nothing else, a linear stream of data
> (no seeking and file system overhead), and a non PCI controller (PCI
> Express, host chipset bus etc). 

That's Suspend2's usage pattern when given a whole partition, so I can
state without reservation you can get maximum throughput under those
circumstances, even with a PCI controller. Swsusp should do about the
same too.

Nigel


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 17:56       ` Martin A. Fink
  2007-02-12 18:17         ` Ray Lee
  2007-02-12 19:08         ` Alan
@ 2007-02-12 23:31         ` Matthias Schniedermeyer
  2007-02-13  9:25           ` Martin A. Fink
  2 siblings, 1 reply; 33+ messages in thread
From: Matthias Schniedermeyer @ 2007-02-12 23:31 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Martin A. Fink wrote:
> I have to store big amounts of data coming from 2 digital cameras to disk. 
> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second for 
> a long period of time. So it is important for me that the harddisk drive is 
> reliable in the sense of "if it is capable of 50 MB/s then it should operate 
> at this speed. Constantly."

The good old handful of suggestions:

- Use a dedicated disc for the task.
- Use an empty disc so there is no fragmentation.
- Buy a bigger disk; they have higher bandwidth.
- Buy a more "specialized" disc,
  e.g. the Western Digital Raptor X(*), a 150GB, 10-KRPM S-ATA disc.
- Buy several discs and use RAID 0
  or alternate between discs when writing.
- Use XFS. AFAIK XFS has about the best "large file" and "high
  bandwidth" characteristics.
  (That you can preallocate files with XFS doesn't seem relevant in
  this case. It matters more when you write several files
  simultaneously over a longer period of time.)
- Write to one large file and separate the individual files later.

If you are sure that you won't get a power failure:
- Disable write barriers, especially on a journaling filesystem.
- Enable write-caching.
  (hdparm doesn't appear to be able to do that with a SATA disc, but
  blktool appears to be able to.)
The latter has a good chance of corrupting your filesystem if you do
get a power failure!!!



*:
I don't think you want something from the server line
(SCSI/FibreChannel/...)?
IIRC I read something about the first 100MB/s disc being in the 15-KRPM
league.

See you

-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.



* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 23:31         ` Matthias Schniedermeyer
@ 2007-02-13  9:25           ` Martin A. Fink
  2007-02-13 10:08             ` Arjan van de Ven
  2007-02-13 10:16             ` Matthias Schniedermeyer
  0 siblings, 2 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-13  9:25 UTC (permalink / raw)
  To: Matthias Schniedermeyer, linux-kernel

On Tuesday, 13 February 2007 00:31, you wrote:
> Martin A. Fink wrote:
> > I have to store big amounts of data coming from 2 digital cameras to disk.
> > Thus I have to write blocks of around 1 MB at 30 to 50 frames per second
> > for a long period of time. So it is important for me that the harddisk
> > drive is reliable in the sense of "if it is capable of 50 MB/s then it
> > should operate at this speed. Constantly."
> 
> The good old handful of suggestions:
> 
> - Use a dedicated disc for the task.

I used a dedicated disk for this task. No one else besides the task is writing 
to it!

> - Use an empty disc so there is no fragmentation.

All tests were performed on empty disk!

> - Buy a bigger disk, they have high bandwidths.

I have a flash disk from a manufacturer who guarantees 48 MB/s. FreeBSD as
well as Windows reach this value. Only Linux 2.6.18 falls well short of it
(42 MB/s).

> - Buy a more "specialized" disc.

see above

>   for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
> - Buy several discs and use RAID 0
>   or alternate between discs when writing.

What I have to build is an application for the International Space Station
(ISS). I am limited in power and space. So if the disk is able to write
48 MB/s constantly, then the operating system should do this!

> - use XFS. AFAIK XFS has about the best "large file" and "high
> bandwidth" characteristics.
> - that with XFS you can preallocate the files doesn't seem relevant in
> this case. It's more for the case that you write several files
> simultaneously over a longer period of time.
> - Write to one large file and separate the individual files later.
> 
> if you are sure that you don't get a power-failure:
> - Disable Write-Barriers, especially on a logging-filesystem.
> - Enable write-caching.
> (hdparm doesn't appear to be able to do that with a SATA-disc, but
> blktool appears to be able to)
> The later has a good chance of corrupting your filesystem when you do
> get a power-failure!!!
> 
> 
> 
> *:
> I don't think you want something from the server-line,
> SCSI/FibreChannel/...?
> IIRC i read a something about the first 100MB/s disc with in the 15-KRPM
> league.

Power consumption! See above.
> 
> See you
> 
The problem is: FreeBSD is fast, but lacks some special drivers. Linux has
all the drivers, but access to the hard disk is unpredictable and thus unreliable!
What can I do??
> -- 
> Real Programmers consider "what you see is what you get" to be just as
> bad a concept in Text Editors as it is in women. No, the Real Programmer
> wants a "you asked for it, you got it" text editor -- complicated,
> cryptic, powerful, unforgiving, dangerous.
> 
> 

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 19:08         ` Alan
  2007-02-12 20:34           ` Nigel Cunningham
@ 2007-02-13  9:34           ` Martin A. Fink
  2007-02-13 11:25             ` Alan
  1 sibling, 1 reply; 33+ messages in thread
From: Martin A. Fink @ 2007-02-13  9:34 UTC (permalink / raw)
  To: Alan, linux-kernel

On Monday, 12 February 2007 20:08, you wrote:
> On Mon, 12 Feb 2007 18:56:29 +0100
> "Martin A. Fink" <fink@mpe.mpg.de> wrote:
> 
> > I have to store big amounts of data coming from 2 digital cameras to disk.
> > Thus I have to write blocks of around 1 MB at 30 to 50 frames per second
> > for a long period of time. So it is important for me that the harddisk
> > drive is reliable in the sense of "if it is capable of 50 MB/s then it
> > should operate at this speed. Constantly."
> 
> Hard disks don't do this. They support operations/second based upon
> physical and rotational latency constraints, vibration levels, mechanism,
> internal layout policy and the need to do housekeeping. 

Well, they do. The flash disk I have (SATA-I) is capable of 48 MB/s, and this
value is reached over the whole disk size by Windows as well as by FreeBSD.
See my test results in the first mail of this thread.
My Seagate Barracuda hard disk drive (SATA-II) starts at 76 MB/s and
decreases linearly to 35 MB/s because it has to write to a
rotating disk. But on a flash disk there is nothing rotating...

So where is the difference between SATA-I and SATA-II?
And why is FreeBSD able to write at constant rates (the complete 25 GB, all
at 48+/-0.1 MB/s) but Linux 2.6.18 is not?

> 
> If you have an ATA7 drive with suitable firmware sets you can talk to it
> directly via the SG_IO interface and use the streaming feature set which
> is quite different to filesystem type operations and lets you ask the
> drive to do this sort of stuff - if you can find any general PC firmware
> ones that support it anyway.
> 
> I'm not sure you'll get 50MB/sec sustained to work although you might
> with a good current drive used for nothing else, a linear stream of data
> (no seeking and file system overhead), and a non PCI controller (PCI
> Express, host chipset bus etc). 

With a dedicated (rotating) SATA-II device, using the first 70% of the disk
space, this is no problem -- tested! With a SATA-I device it is a problem
only with Linux 2.6.18.
> 
> If you are using a file system then the more you fsync the more I'd
> expect you to see stalling as you keep draining whats effectively an 8MB
> plus pipeline on a modern drive precisely because fsync does "hitting
> disk" guarantees. You also want to be sure you are not journalling data.

That is true. Thus I do the sync only after every 1 GB of written data. That is
not too often in my eyes...
Journaling of data: you are right, ext2 performs better than ext3.
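
For readers who want to reproduce the access pattern discussed here (1 MB blocks, one fsync per fixed amount of data, time taken per interval), the following is a minimal Python sketch. It is not the program used for the measurements in this thread; the function name `timed_write` and its parameters are illustrative assumptions, and the demo uses small sizes so that it finishes quickly.

```python
import os
import tempfile
import time

BLOCK_SIZE = 1024 * 1024  # 1 MB per write, as described in the thread


def timed_write(path, total_blocks, blocks_per_sync, block_size=BLOCK_SIZE):
    """Write total_blocks blocks to path and fsync after every
    blocks_per_sync blocks (and once more at the end if needed).

    Returns a list of (bytes_written, seconds) tuples, one per fsync interval.
    """
    block = b"\0" * block_size
    intervals = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        start = time.monotonic()
        for i in range(1, total_blocks + 1):
            view = memoryview(block)
            # os.write may write fewer bytes than requested; loop until done.
            while len(view) > 0:
                n = os.write(fd, view)
                view = view[n:]
            written += block_size
            if i % blocks_per_sync == 0 or i == total_blocks:
                os.fsync(fd)  # the interval time includes the flush
                intervals.append((written, time.monotonic() - start))
                written = 0
                start = time.monotonic()
    finally:
        os.close(fd)
    return intervals


if __name__ == "__main__":
    # Small demo: 8 MB total, fsync every 4 MB, written to a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "testfile.bin")
        for nbytes, secs in timed_write(target, 8, 4):
            rate = nbytes / 1e6 / secs if secs > 0 else float("inf")
            print(f"{nbytes} bytes in {secs:.4f} s ({rate:.1f} MB/s)")
```

Passing a block device path instead of a file would correspond to the "raw" rows of the test, but that overwrites the device and needs root, so it is left out of the demo.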


Martin
> 
> Alan
> 
> 
> 

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13  9:25           ` Martin A. Fink
@ 2007-02-13 10:08             ` Arjan van de Ven
  2007-02-13 11:18               ` Andi Kleen
                                 ` (2 more replies)
  2007-02-13 10:16             ` Matthias Schniedermeyer
  1 sibling, 3 replies; 33+ messages in thread
From: Arjan van de Ven @ 2007-02-13 10:08 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: Matthias Schniedermeyer, linux-kernel

> > 
> The problem is: FreeBSD is fast, but lacks of some special drivers. Linux has 
> all drivers but access to harddisk is unpredictable and thus unreliable!
> What can I do??


There are several tunables you can adjust:
1) increase /sys/block/<device>/queue/nr_requests
   the linux default is on the low side
2) investigate other elevators; cfq is great for interactive use but not
   so great for max throughput. you can do this by echo'ing "deadline"
   into /sys/block/<device>/queue/scheduler
3) make sure ext3 is set to "data=writeback"; the default journalling
   mode is very strict, fine for smallish files but for multi-gigabyte
   files it'll start to hurt
4) try to use iostat -x /dev/<foo> 1 to see what the avgrq-sz and
   avgqu-sz values are; avgrq-sz should be at least several hundred if
   not more.
5) echo a larger value into /sys/block/<device>/queue/max_sectors_kb
   the default seems to be 512 which is... really low. The hw max is in
   another file in that directory (max_hw_sectors_kb); if you want max
   throughput set the max_sectors_kb value to the hw max. (you pay in
   terms of fairness for this; it's the eternal fairness/latency versus
   throughput tradeoff)
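
Before changing any of these, it can help to record the current values. The following is a small Python sketch that only reads the queue tunables named above. The function names are illustrative assumptions; the file names (`nr_requests`, `max_sectors_kb`, `max_hw_sectors_kb`, `scheduler`) are the standard sysfs entries under `/sys/block/<device>/queue/`, and any file missing on a given kernel is reported as `None`.

```python
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def parse_active_scheduler(text):
    """Return the elevator shown in [brackets] in the scheduler file,
    e.g. 'noop anticipatory deadline [cfq]' -> 'cfq'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def queue_settings(device, root=SYSFS_BLOCK):
    """Read the request-queue tunables for one block device (e.g. 'sda').

    root is a parameter only so the function can be pointed at a
    directory other than /sys/block for testing.
    """
    queue = Path(root) / device / "queue"
    settings = {}
    for name in ("nr_requests", "max_sectors_kb", "max_hw_sectors_kb"):
        entry = queue / name
        settings[name] = int(entry.read_text()) if entry.exists() else None
    sched = queue / "scheduler"
    settings["scheduler"] = (
        parse_active_scheduler(sched.read_text()) if sched.exists() else None
    )
    return settings


if __name__ == "__main__":
    # 'sda' is only an example device name.
    print(queue_settings("sda"))
```

Writing new values (the equivalent of the echo commands) requires root and is deliberately not part of the sketch.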





* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13  9:25           ` Martin A. Fink
  2007-02-13 10:08             ` Arjan van de Ven
@ 2007-02-13 10:16             ` Matthias Schniedermeyer
  2007-02-13 10:29               ` Martin A. Fink
  1 sibling, 1 reply; 33+ messages in thread
From: Matthias Schniedermeyer @ 2007-02-13 10:16 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Martin A. Fink wrote:
> On Tuesday, 13 February 2007 00:31, you wrote:
>> Martin A. Fink wrote:
>>> I have to store big amounts of data coming from 2 digital cameras to disk.
>>> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second
>>> for a long period of time. So it is important for me that the harddisk
>>> drive is reliable in the sense of "if it is capable of 50 MB/s then it
>>> should operate at this speed. Constantly."
>> The good old handful of suggestions:
>>
>> - Use a dedicated disc for the task.
> 
> I used a dedicated disk for this task. No one else besides the task is writing 
> to it!

OK.

>> - Use an empty disc so there is no fragmentation.
> 
> All tests were performed on empty disk!

OK.

>> - Buy a bigger disk, they have high bandwidths.
> 
> I have a flash disk from a manufacturer who grants me 48 MB/s. And FreeBSD as 
> well as Windows reach this value. Only Linux 2.6.18 is far away from it (42 
> MB/s)

Even 48MB/s is quite low.
I've reached up to 70MB/s with a single 500GB Seagate model, and even my older HDDs all reach 60MB/s (at least on the outer cylinders).
But I haven't tested any "sync/fsync" in between, only after.

>> - Buy a more "specialized" disc.
> 
> see above
> 
>>   for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
>> - Buy several discs and use RAID 0
>>   or alternate between discs when writing.
> 
> What I have to build is an application for the International Space Station 
> ISS. I am limited with power and space. So If the disk is able to write 
> constantly 48 MB/s then the Operating System should do this!

OK. That appears to be a serious constraint.
Do HDDs cope well with zero gravity?
At least the SSD won't have a problem with that. ;-)

> The problem is: FreeBSD is fast, but lacks of some special drivers. Linux has 
> all drivers but access to harddisk is unpredictable and thus unreliable!
> What can I do??

Personally I haven't had such bad write speeds in years, leaving USB-connected and/or encrypted partitions aside.
But on the other hand: I don't sync (fsync) until I have to.
And personally I have had good experience (and constant bandwidth) using XFS as a filesystem.
(I have 41 HDDs with a total capacity of 10.5 TB; performance is quite important to me.)

Also, you have skipped the information on how the images "arrive" on the system (PCI(e) card?); that may be important for an "end to end" view of the problem.

And what's also missing: what is "a long period of time"?
Calculating the best case with the SSD:
27GB divided by 30MB/s gives only a bit more than 15 minutes.
And the worst case with 50MB/s is less than 10 minutes.
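
The two figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming decimal units (1 GB = 1000 MB); with binary units the results come out slightly higher, which matches "a bit more than 15 minutes". The function name is an illustrative assumption.

```python
def fill_time_minutes(capacity_gb, rate_mb_per_s):
    """Minutes needed to fill capacity_gb at a constant rate_mb_per_s
    (decimal units: 1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000 / rate_mb_per_s
    return seconds / 60


for rate in (30, 50):
    print(f"{rate} MB/s: {fill_time_minutes(27, rate):.1f} min")
# prints:
# 30 MB/s: 15.0 min
# 50 MB/s: 9.0 min
```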





-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.



* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 11:18               ` Andi Kleen
@ 2007-02-13 10:25                 ` Arjan van de Ven
  0 siblings, 0 replies; 33+ messages in thread
From: Arjan van de Ven @ 2007-02-13 10:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Martin A. Fink, Matthias Schniedermeyer, linux-kernel

On Tue, 2007-02-13 at 12:18 +0100, Andi Kleen wrote:
> Arjan van de Ven <arjan@infradead.org> writes:
> 
> > > > 
> > > The problem is: FreeBSD is fast, but lacks of some special drivers. Linux has 
> > > all drivers but access to harddisk is unpredictable and thus unreliable!
> > > What can I do??
> > 
> > 
> > there's several tunables you can do;
> 
> [...] Well Linux certainly should perform better out of the box
> on such a simple configuration.

no argument from me there; first need to find out which piece is wrong
> 
> Something is wrong especially when the CPU usage is so high.

I'll buy that, yet there's plenty of cpu time available so that
shouldn't be all that much of a limit on the throughput... there's still
headroom

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:16             ` Matthias Schniedermeyer
@ 2007-02-13 10:29               ` Martin A. Fink
  2007-02-13 12:04                 ` Jörn Engel
  2007-02-13 12:24                 ` Matthias Schniedermeyer
  0 siblings, 2 replies; 33+ messages in thread
From: Martin A. Fink @ 2007-02-13 10:29 UTC (permalink / raw)
  To: Matthias Schniedermeyer, linux-kernel

On Tuesday, 13 February 2007 11:16, you wrote:
> Martin A. Fink wrote:
> > On Tuesday, 13 February 2007 00:31, you wrote:
> >> Martin A. Fink wrote:
> >>> I have to store big amounts of data coming from 2 digital cameras to disk.
> >>> Thus I have to write blocks of around 1 MB at 30 to 50 frames per second
> >>> for a long period of time. So it is important for me that the harddisk
> >>> drive is reliable in the sense of "if it is capable of 50 MB/s then it
> >>> should operate at this speed. Constantly."
> >> The good old handful of suggestions:
> >>
> >> - Use a dedicated disc for the task.
> > 
> > I used a dedicated disk for this task. No one else besides the task is
> > writing to it!
> 
> OK.
> 
> >> - Use an empty disc so there is no fragmentation.
> > 
> > All tests were performed on empty disk!
> 
> OK.
> 
> >> - Buy a bigger disk, they have high bandwidths.
> > 
> > I have a flash disk from a manufacturer who grants me 48 MB/s. And FreeBSD
> > as well as Windows reach this value. Only Linux 2.6.18 is far away from it
> > (42 MB/s)
> 
> Even 48MB/s is quite low.
> I've reached up to 70MB/s with a single 500GB Seagate model and even my
> older HDDs all reach 60MB/s (at least on the outer cylinders)
> But i haven't tested any "sync/fsync" in between, only after.

Please read carefully! I am talking about a flash disk, not a normal hard
disk. There are no mechanical parts in flash disks, only flash memory. And
therefore 48MB/s is excellent (compared to all other available flash disks).

> 
> >> - Buy a more "specialized" disc.
> > 
> > see above
> > 
> >>   for e.x.: Western Digital Raptor X(*) a 150GB, 10-KRPM S-ATA disc.
> >> - Buy several discs and use RAID 0
> >>   or alternate between discs when writing.
> > 
> > What I have to build is an application for the International Space Station 
> > ISS. I am limited with power and space. So If the disk is able to write 
> > constantly 48 MB/s then the Operating System should do this!
> 
> OK. That appears to be a serious constraint.
> Do HDDs cope well with zero gravity?

Yes and no. Yes: standard desktop HDDs are unproblematic. No: laptop HDDs have
shock-protection hardware that triggers on zero-g detection, and thus laptop
HDDs can't be used in space. At least modern ones can't...

> At least the SSD won't have a problem with that. ;-)
> 
> > The problem is: FreeBSD is fast, but lacks of some special drivers. Linux
> > has all drivers but access to harddisk is unpredictable and thus unreliable!
> > What can I do??
> 
> Personally i haven't had such bad write speeds in years. Taking USB
> connected and/or encrypted partitions aside.
> But on the other hand: I don't sync(fsync) until i have to.

If you don't have to - no problem. But if you use a filesystem you do an fsync
every time you close a file (and the file size is less than 1-2 GB).
> And personally i have good (and constant bandwidth) experience using XFS as
> a filesystem.
> (I have 41 HDDs with a total capacity of 10.5 TB, performance is quite
> important for me.)
> 
> Also you have skipped the information how the images "arrive" on the system
> (PCI(e) card?), that may be important for an "end to end" view of the
> problem.

Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)
> 
> And what's also missing. What is "a long period of time".
> Calculating best-case with the SSD:
> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
> And worst case with 50MB/s is less than 10 Minutes.

Well. The testdrive has 27GB. The final drive will have 225 GB. And there will 
be 3 cameras and thus 3 disks. This means we talk about 140 MB/s for around 
90 minutes.
For space applications with low power but high performance this is a long 
time... ;-)
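
As a rough cross-check of these target figures, the per-disk rate and the time to fill all three drives can be computed as below. This is a sketch under stated assumptions: decimal units, an even split of the 140 MB/s across the three disks, and the 48 MB/s figure taken from the test SSD rather than from the (unspecified) final drive. The function names are illustrative.

```python
def per_disk_rate(total_mb_per_s, disks):
    """Rate each disk must sustain if the stream is split evenly."""
    return total_mb_per_s / disks


def minutes_to_fill(disks, capacity_gb_each, total_mb_per_s):
    """Minutes until all disks are full (decimal units: 1 GB = 1000 MB)."""
    total_mb = disks * capacity_gb_each * 1000
    return total_mb / total_mb_per_s / 60


rate = per_disk_rate(140, 3)
print(f"per disk: {rate:.1f} MB/s (test SSD sustains 48 MB/s)")
print(f"3 x 225 GB at 140 MB/s: {minutes_to_fill(3, 225, 140):.1f} min")
# prints:
# per disk: 46.7 MB/s (test SSD sustains 48 MB/s)
# 3 x 225 GB at 140 MB/s: 80.4 min
```

Under these assumptions each disk has to run very close to the 48 MB/s measured on the test SSD, so there is little headroom for the throughput dips reported on Linux.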
> 
> 
> 
> 
> 
> -- 
> Real Programmers consider "what you see is what you get" to be just as
> bad a concept in Text Editors as it is in women. No, the Real Programmer
> wants a "you asked for it, you got it" text editor -- complicated,
> cryptic, powerful, unforgiving, dangerous.
> 
> 

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:08             ` Arjan van de Ven
@ 2007-02-13 11:18               ` Andi Kleen
  2007-02-13 10:25                 ` Arjan van de Ven
  2007-02-13 11:27               ` Alan
  2007-02-13 19:54               ` Jeffrey Hundstad
  2 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2007-02-13 11:18 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Martin A. Fink, Matthias Schniedermeyer, linux-kernel

Arjan van de Ven <arjan@infradead.org> writes:

> > > 
> > The problem is: FreeBSD is fast, but lacks of some special drivers. Linux has 
> > all drivers but access to harddisk is unpredictable and thus unreliable!
> > What can I do??
> 
> 
> there's several tunables you can do;

[...] Well Linux certainly should perform better out of the box
on such a simple configuration.

Something is wrong especially when the CPU usage is so high.

That is why I suggested oprofile. Perhaps contact linux-ide@vger.kernel.org
(if the results show driver problems) and linux-mm@kvack.org (otherwise) 
with the results.

-Andi


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13  9:34           ` Martin A. Fink
@ 2007-02-13 11:25             ` Alan
  2007-02-13 12:32               ` Martin A. Fink
  2007-02-13 17:12               ` Jeff Garzik
  0 siblings, 2 replies; 33+ messages in thread
From: Alan @ 2007-02-13 11:25 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

> Well they do. The Flash disk I have (SATA-I) is capable of 48 MB/s and this 
> value is reached over the whole disk size by windows as well as by FreeBSD. 
> See my test results in the first thread.

Ok a flash disk should be more stable

> My Seagate Barracuda Harddisk drive (SATA-II) starts with 76 MB/s and 
> decreases linearly to 35 MB/s due to the fact that it has to write to a 
> rotating disk. But on a flash disk there is nothing rotating...

The hard disk one isn't guaranteed or stable but the flash especially if
it is aimed at it ought to behave.

> So where is the difference between SATA-I and SATA-II ?

All on the physical side, if they are on the same controller when you do the
tests. Mostly latency.

> And why is FreeBSD able to write with constant rates (the complete 25 GB, all 
> with 48+/-0.1 MB/s) but Linux 2.6.18 not ?

Does the FreeBSD fsync sync to media ? Also what controller is being used
here, and do you have EHCI USB support running ?

> With a dedicated (rotating) SATA II device, using the first 70% of disk space 
> no problem -- tested ! With a SATA-I device only a problem with Linux 2.6.18

I suspect SATA-I itself may not be the deciding factor but something else,
e.g. the hard disk using NCQ, which would cover up any latency-related
problems.

> Journaling of data: you are right, ext2 performs better than ext3.

And ext3 in writeback mode ought in theory (but practice is always
harder ;)) be faster than ext2.



* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:08             ` Arjan van de Ven
  2007-02-13 11:18               ` Andi Kleen
@ 2007-02-13 11:27               ` Alan
  2007-02-13 11:59                 ` Jörn Engel
  2007-02-13 19:54               ` Jeffrey Hundstad
  2 siblings, 1 reply; 33+ messages in thread
From: Alan @ 2007-02-13 11:27 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Martin A. Fink, Matthias Schniedermeyer, linux-kernel

> there's several tunables you can do;
> 1) increase /sys/block/<device>/queue/nr_requests
>    the linux default is on the low side
> 5) echo a larger value into /sys/block/<device>/queue/max_sectors_kb
>    the default seems to be 512 which is... really low. The hw max is in
>    another file in that directory; if you want max throughput set the
>    max_sectors_kb value to the hw max. (you pay in terms of fairness for


There are two more factors that play into #1 and #5. Firstly there is a
per command completion overhead in ATA without NCQ being active and that
isn't yet a heavily optimised libata path. Secondly erase block size
matters with flash drives so the bigger each I/O the better erase block
behaviour we should get.


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 11:27               ` Alan
@ 2007-02-13 11:59                 ` Jörn Engel
  0 siblings, 0 replies; 33+ messages in thread
From: Jörn Engel @ 2007-02-13 11:59 UTC (permalink / raw)
  To: Alan
  Cc: Arjan van de Ven, Martin A. Fink, Matthias Schniedermeyer, linux-kernel

On Tue, 13 February 2007 11:27:58 +0000, Alan wrote:
> 
> isn't yet a heavily optimised libata path. Secondly erase block size
> matters with flash drives so the bigger each I/O the better erase block
> behaviour we should get.

Although that should max out somewhere between 16KiB and 128KiB,
depending on the chips being used.

Jörn

-- 
If you're willing to restrict the flexibility of your approach,
you can almost always do something better.
-- John Carmack


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:29               ` Martin A. Fink
@ 2007-02-13 12:04                 ` Jörn Engel
  2007-02-13 12:24                 ` Matthias Schniedermeyer
  1 sibling, 0 replies; 33+ messages in thread
From: Jörn Engel @ 2007-02-13 12:04 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: Matthias Schniedermeyer, linux-kernel

On Tue, 13 February 2007 11:29:18 +0100, Martin A. Fink wrote:
> 
> Please Read Carefully! I talk about flash disk, not normal harddisks. There 
> are no mechanical parts in flash disks, only flash memory. And therefore 
> 48MB/s is excellent (compared to all other available disks)
> 
> [...]
> 
> Well. The testdrive has 27GB. The final drive will have 225 GB. And there will 
> be 3 cameras and thus 3 disks. This means we talk about 140 MB/s for around 
> 90 minutes.

Do you have any numbers on the performance for the final drive?  Single
flash chips are relatively slow, the high bandwidth is usually achieved
by writing in parallel to several of them.  With the bigger drive you
get more chips and the manufacturer could run more of them in parallel.

Jörn

-- 
With a PC, I always felt limited by the software available. On Unix,
I am limited only by my knowledge.
-- Peter J. Schoenster


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:29               ` Martin A. Fink
  2007-02-13 12:04                 ` Jörn Engel
@ 2007-02-13 12:24                 ` Matthias Schniedermeyer
  2007-02-13 12:49                   ` Martin A. Fink
  1 sibling, 1 reply; 33+ messages in thread
From: Matthias Schniedermeyer @ 2007-02-13 12:24 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Martin A. Fink wrote:

>> Also you have skipped the information how the images "arrive" on the system
>> (PCI(e) card?), that may be important for an "end to end" view of the
>> problem.
> 
> Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)

Then the next question is: ChipSet/Used Protocol/JumboFrames/(NAPI)/... .

Have you already determined the load caused by this part?
Depending on the GigE-Chipset, and Protocol/JumboFrames/(NAPI)/..., the involved overhead can be quite serious.

>> And what's also missing. What is "a long period of time".
>> Calculating best-case with the SSD:
>> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
>> And worst case with 50MB/s is less than 10 Minutes.
> 
> Well. The testdrive has 27GB. The final drive will have 225 GB. And there will 
> be 3 cameras and thus 3 disks. This means we talk about 140 MB/s for around 
> 90 minutes.
> For space applications with low power but high performance this is a long 
> time... ;-)

Will the MB/CPU/RAM be the ones specified in the first mail?
My gut feeling says: Forget it.

The needed total bandwidth may be too high, and at least the incoming part via GigE may have serious overhead.
150MB/s comes in via (at least 2) GigE ports; without zero-copy there is another 150MB/s memory to memory.
Then there is the next 150MB/s from memory to the discs; without zero-copy there is also another 150MB/s memory to memory.
In total that's 300MB/s to 600MB/s without any processing.
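
The 300MB/s to 600MB/s range follows from counting each pass of the data through memory once. A small Python sketch of that bookkeeping, with an illustrative function name and the simplifying assumption that every copy moves the full stream exactly once:

```python
def memory_traffic_mb_per_s(stream_mb_per_s, zero_copy_in, zero_copy_out):
    """Estimate memory traffic for a stream that is received and then
    written to disk, counting each pass of the data once."""
    traffic = stream_mb_per_s          # network data arriving in memory
    if not zero_copy_in:
        traffic += stream_mb_per_s     # extra memory-to-memory copy on receive
    traffic += stream_mb_per_s         # memory to the discs
    if not zero_copy_out:
        traffic += stream_mb_per_s     # extra memory-to-memory copy on write
    return traffic


print(memory_traffic_mb_per_s(150, True, True))    # 300
print(memory_traffic_mb_per_s(150, False, False))  # 600
```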

But on the other hand, hdparm -T says my system (Core2Duo E6700, FSB1066, 2GB DDR2-800 RAM, 32Bit) has a buffer-cache bandwidth around 4000MB/s.
As you didn't say which FSB and memory type you have, I would guess that your system should reach between 2000MB/s and 3500MB/s of LINEAR(!) memory bandwidth.
(Total usable memory bandwidth is unfortunately also dependent on the usage pattern. Large & linear is not as important as with a rotating HDD, but it factors in.)



Btw., on the topic of filesystem and Linux performance:
SGI did a "really big" test some time ago with a big-iron system having 24 Itanium2 CPUs in 12 nodes, 12*2 GB of RAM, and 256 discs, using XFS (which is from SGI!).
The PDF file is here:
http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf

According to the paper, the system had a theoretical peak I/O performance of 11.5 GB/s and in practice peaked at 10.7GB/s reading and 8.9GB/s writing.
IOW Linux and XFS CAN perform quite well, but the system has to have enough muscle for the job.
And since the paper (and kernel 2.6.5) the development of Linux hasn't stopped.



-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.



* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 11:25             ` Alan
@ 2007-02-13 12:32               ` Martin A. Fink
  2007-02-13 14:47                 ` Theodore Tso
  2007-02-13 17:12               ` Jeff Garzik
  1 sibling, 1 reply; 33+ messages in thread
From: Martin A. Fink @ 2007-02-13 12:32 UTC (permalink / raw)
  To: Alan, linux-kernel

On Tuesday, 13 February 2007 12:25, you wrote:
> > Well they do. The Flash disk I have (SATA-I) is capable of 48 MB/s and
> > this value is reached over the whole disk size by windows as well as by
> > FreeBSD.
> > See my test results in the first thread.
> 
> Ok a flash disk should be more stable
> 
> > My Seagate Barracuda Harddisk drive (SATA-II) starts with 76 MB/s and 
> > decreases linearly to 35 MB/s due to the fact that it has to write to a 
> > rotating disk. But on a flash disk there is nothing rotating...
> 
> The hard disk one isn't guaranteed or stable but the flash especially if
> it is aimed at it ought to behave.
> 
> > So where is the difference between SATA-I and SATA-II ?
> 
> All physical side if they are on the same controller when you do the
> tests. Mostly latency,
> 
> > And why is FreeBSD able to write with constant rates (the complete 25 GB,
> > all with 48+/-0.1 MB/s) but Linux 2.6.18 not ?
> 
> Does the FreeBSD fsync sync to media ? Also what controller is being used
> here, and do you have EHCI USB support running ?

The FreeBSD fsync manual says it syncs to media.

I used the same controller: same computer, same hard disk. Two partitions on
the system disk, one for Linux, one for FreeBSD.

EHCI:

ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: Product: EHCI Host Controller

AHCI

ahci 0000:00:1f.2: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode

> 
> > With a dedicated (rotating) SATA II device, using the first 70% of disk
> > space no problem -- tested ! With a SATA-I device only a problem with
> > Linux 2.6.18
> 
> I suspect the SATA-1 itself may not be the decider but something else -
> eg the hard disk using NCQ, which would cover up any latency related
> problems.
> 
> > Journaling of data: you are right, ext2 performs better than ext3.
> 
> And ext3 in writeback mode ought in theory (but practice is always
> harder ;)) be faster than ext2.
> 
> 

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569


* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 12:24                 ` Matthias Schniedermeyer
@ 2007-02-13 12:49                   ` Martin A. Fink
  2007-02-13 13:53                     ` Matthias Schniedermeyer
  0 siblings, 1 reply; 33+ messages in thread
From: Martin A. Fink @ 2007-02-13 12:49 UTC (permalink / raw)
  To: Matthias Schniedermeyer, linux-kernel

On Tuesday, 13 February 2007 13:24, you wrote:
> Martin A. Fink wrote:
> 
> >> Also you have skipped the information how the images "arrive" on the
> >> system (PCI(e) card?), that may be important for an "end to end" view
> >> of the problem.
> > 
> > Images arrive via Gigabit Ethernet. GigE Vision standard. (PCIe x4)
> 
> Then the next question is: ChipSet/Used Protocol/JumboFrames/(NAPI)/... .
> 
> Have you already determined the load caused by this part?
> Depending on the GigE-Chipset, and Protocol/JumboFrames/(NAPI)/..., the
> involved overhead can be quite serious.
> 
> >> And what's also missing. What is "a long period of time".
> >> Calculating best-case with the SSD:
> >> 27GB divided by 30MB/s only gives a bit more than 15 Minutes.
> >> And worst case with 50MB/s is less than 10 Minutes.
> > 
> > Well. The testdrive has 27GB. The final drive will have 225 GB. And there
> > will be 3 cameras and thus 3 disks. This means we talk about 140 MB/s for
> > around 90 minutes.
> > For space applications with low power but high performance this is a long
> > time... ;-)
> 
> The MB/CPU/RAM will be the one specified in the first mail?
> My gut feeling says: Forget it.
> 
> The needed total bandwidth may be too high and at least the incoming part
> via GigE may have serious overhead.
> 150MB/s in via (at least 2) GigE, without Zero-Copy there is another 150MB/s
> memory to memory.
> Then there is the next 150MB/s memory to the discs, without Zero-Copy there
> is also another 150MB/s memory to memory.
> In total that's 300MB/s to 600MB/s without any processing.

I don't understand your calculation: from 3 GE ports come around 50 MB/s each. 
These altogether 150MB/s have to be copied to memory. From there they will be 
copied to disk. So we talk about 2x150 MB/s running through my system. That 
is less than 2 PCIe lanes can handle... And there are more than 2 lanes 
between north and south bridge....
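
The two estimates in this exchange differ only in how many extra
memory-to-memory copies are assumed. A minimal sketch of that arithmetic
follows; the function name budget and its arguments are made up for
illustration. It counts one pass for NIC to memory, one pass for memory to
disk, and one more pass per extra copy. As a yardstick, and assuming
first-generation PCIe at roughly 250 MB/s per lane per direction, two lanes
come to about 500 MB/s.

```shell
#!/bin/sh
# budget STREAMS MB_PER_STREAM EXTRA_COPIES
# Prints the total MB/s moved through the system:
#   1 pass  NIC -> memory
#   1 pass  memory -> disk
#   plus one pass for every extra memory-to-memory copy (no zero-copy).
budget() {
    LC_ALL=C awk -v n="$1" -v rate="$2" -v copies="$3" 'BEGIN {
        ingest = n * rate               # data arriving from the cameras
        total  = ingest * (2 + copies)  # every pass moves the full stream once
        printf "%d\n", total
    }'
}
```

With three streams of 50 MB/s, `budget 3 50 0` gives the 2x150 MB/s figure
and `budget 3 50 2` gives the 600 MB/s worst case from the quoted mail.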
> 
> But on the other hand, hdparm -T says my system (Core2Duo E6700, FSB1066,
> 2GB DDR2-800 RAM, 32Bit) has a buffer-cache bandwidth around 4000MB/s.
> As you didn't say which FSB and Memory-Type you have, I would guess that your
> system should reach between 2000MB/s and 3500MB/s of LINEAR(!) memory
> bandwidth.
> (Total usable Memory-Bandwidth is unfortunately also dependent on usage
> pattern. Large & linear is not as important as with a rotating HDD, but it
> factors in)
> 
> 
> 
> Btw. On the topic of filesystem and Linux performance:
> SGI did a "really big" test some time ago with a big iron having 24
> Itanium2-CPUs in 12 nodes, and 12*2 GB of RAM and having 256 discs using
> XFS (which is from SGI!).
> The pdf-file is here:
> http://oss.sgi.com/projects/xfs/papers/ols2006/ols-2006-paper.pdf
> 
> According to the paper the system had a theoretical peak IO-performance of
> 11.5 GB/s and practically peaked at 10.7GB/s reading and 8.9GB/s writing.
> IOW Linux and XFS CAN perform quite well, but the system has to have enough
> muscle for the job.
> And since the paper (and Kernel 2.6.5) the development of Linux hasn't
> stopped.

-- 
Dipl. Physiker
Martin Anton Fink
Max Planck Institute for extraterrestrial Physics
Giessenbachstrasse
85741 Garching
Germany
Tel. +49-(0)89-30000-3645
Fax. +49-(0)89-30000-3569

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 12:49                   ` Martin A. Fink
@ 2007-02-13 13:53                     ` Matthias Schniedermeyer
  0 siblings, 0 replies; 33+ messages in thread
From: Matthias Schniedermeyer @ 2007-02-13 13:53 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Martin A. Fink wrote:
>> The needed total bandwidth may be too high and at least the incoming part
>> via GigE may have serious overhead.
>> 150MB/s in via (at least 2) GigE, without Zero-Copy there is another 150MB/s
>> memory to memory.
>> Then there is the next 150MB/s memory to the discs, without Zero-Copy there
>> is also another 150MB/s memory to memory.
>> In total that's 300MB/s to 600MB/s without any processing.
> 
> I don't understand your calculation: from 3 GE ports come around 50 MB/s each. 
> These altogether 150MB/s have to be copied to memory. From there they will be 
> copied to disk. So we talk about 2x150 MB/s running through my system. That 
> is less than 2 PCIe lanes can handle... And there are more than 2 lanes 
> between north and south bridge....

It may be that the TCP/IP-Stack has to copy the data around. But someone that knows the inner workings would have to answer this.
That may also depend on the used NIC.

Also the data doesn't appear 'en bloc', but arrives over a period of time, so you have more or less big "gaps" in the processing.

Especially the "gaps" can considerably lower total achievable bandwidth.

A little naive calculation (in German a "Milchmädchenrechnung", which dict.leo.org translates as "naive fallacy"):
You get a package of work every (say) 1ms and you (say) need 0.2ms for processing, shoveling and writing to disc.
Then there is no way you can saturate more than 1/5 of total theoretical bandwidth, because 80% of the time you are waiting for more work to come.
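
The same back-of-the-envelope figure as a few lines of shell, purely as a
sketch. The function name utilization is invented here, and the model is the
simple one above: work arrives at a fixed interval and nothing overlaps.

```shell
#!/bin/sh
# utilization BUSY_MS INTERVAL_MS
# Prints the percentage of time spent working when a package of work arrives
# every INTERVAL_MS and takes BUSY_MS to process; capped at 100.
utilization() {
    LC_ALL=C awk -v busy="$1" -v interval="$2" 'BEGIN {
        u = busy / interval * 100
        if (u > 100) u = 100      # cannot be busy more than all of the time
        printf "%.0f\n", u
    }'
}
```

`utilization 0.2 1` prints 20, i.e. the 1/5 from the example; the achievable
bandwidth under this model is that share of the theoretical peak.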



-- 
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 12:32               ` Martin A. Fink
@ 2007-02-13 14:47                 ` Theodore Tso
  2007-02-13 15:03                   ` Alan
  0 siblings, 1 reply; 33+ messages in thread
From: Theodore Tso @ 2007-02-13 14:47 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: Alan, linux-kernel

On Tue, Feb 13, 2007 at 01:32:34PM +0100, Martin A. Fink wrote:
> > Does the FreeBSD fsync sync to media ? Also what controller is being used
> > here, and do you have EHCI USB support running ?
>
> Manual of FreeBSD fsync says it syncs to media.

That didn't answer the question.  With SATA in particular, just
because you flush it to the *disk*, doesn't mean that you've flushed
it to the *media*, unless the OS is explicitly giving a command to
the disk to do so.  If you haven't done any tests where you sync a
huge amount of data on FreeBSD, then immediately kick the power plug
out of the wall by hand, and then check that all of the data actually
did make it to the media, I wouldn't necessarily assume that it has.
Given that it sounds like you really care about this, I'd suggest
that you explicitly test this before making assumptions.
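
One way such a test could be scripted is sketched below. The assumptions are
mine: GNU coreutils (dd with conv=fsync, sha256sum), and the helper names
write_pattern and verify_pattern are invented. The checksum has to be noted
somewhere other than the disk under test, and the check only means anything
if the power is really cut between the two steps.

```shell
#!/bin/sh
# write_pattern FILE SIZE_MB
# Writes SIZE_MB of random data to FILE, asks dd to fsync before exiting,
# and prints the SHA-256 of what was written.
write_pattern() {
    dd if=/dev/urandom of="$1" bs=1M count="$2" conv=fsync 2>/dev/null &&
        sha256sum "$1" | cut -d' ' -f1
}

# verify_pattern FILE EXPECTED_SUM
# Succeeds only if FILE still hashes to EXPECTED_SUM (run after power-up).
verify_pattern() {
    [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}
```

Usage would be: run write_pattern on the test filesystem, copy the printed
sum to another machine, pull the plug as soon as the command returns, and run
verify_pattern after reboot. Passing without a power cut proves nothing about
the drive's write cache.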

							- Ted

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 14:47                 ` Theodore Tso
@ 2007-02-13 15:03                   ` Alan
  0 siblings, 0 replies; 33+ messages in thread
From: Alan @ 2007-02-13 15:03 UTC (permalink / raw)
  To: Theodore Tso; +Cc: Martin A. Fink, linux-kernel

> data actually did make it to the media, I wouldn't necessarily assume
> that it has.  Given that it sounds like you really care about this,
> I'd suggest that you explicitly test this before making
> assumptions.

FreeBSD 6.1 appears to get it right for some subsets of devices so it
seems a reasonable assumption at first glance - I did actually look the
BSD bits up to check.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 11:25             ` Alan
  2007-02-13 12:32               ` Martin A. Fink
@ 2007-02-13 17:12               ` Jeff Garzik
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff Garzik @ 2007-02-13 17:12 UTC (permalink / raw)
  To: Alan; +Cc: Martin A. Fink, linux-kernel

On Tue, Feb 13, 2007 at 11:25:27AM +0000, Alan wrote:
> > So where is the difference between SATA-I and SATA-II ?
> 
> All physical side if they are on the same controller when you do the
> tests. Mostly latency,

SATA-II is a highly confusing marketing term.  It is /not/ a technical
term.

In some cases there are NO differences between SATA-I and SATA-II.  You
can find 1.5Gbps non-NCQ-supporting devices claiming SATA-II.

Similarly, there is no "SATA version" word in the IDENTIFY DEVICE page,
like there are "ATA version" words.

	Jeff




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 16:37   ` Martin A. Fink
  2007-02-12 18:19     ` Stefan Richter
@ 2007-02-13 19:09     ` Jeff Carr
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff Carr @ 2007-02-13 19:09 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

On 02/12/07 08:37, Martin A. Fink wrote:

> :~> strace -c -T -o trace.out dd if=/dev/zero of=test.txt bs=10MB count=200
> 
> 200+0 records in
> 200+0 records out
> 2000000000 bytes (2.0 GB) copied, 52.8632 seconds, 37.8 MB/s

You might want to check the raw write & read speed to the device
without a filesystem. Also, your previous email didn't include xfs.
xfs has very good sustained write performance.

dd if=/dev/zero of=/dev/sdX bs=10MB count=200
dd of=/dev/null if=/dev/sdX bs=10MB count=200
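
When comparing such runs, note that the rate dd prints is simply bytes over
elapsed seconds in decimal megabytes. A small helper to recompute it from the
two numbers is sketched below; the name mbps is made up, and it expects a
period as the decimal separator.

```shell
#!/bin/sh
# mbps BYTES SECONDS
# Prints throughput in MB/s (1 MB = 1000000 bytes), one decimal place.
mbps() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}
```

For the run quoted above, `mbps 2000000000 52.8632` prints 37.8.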

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-13 10:08             ` Arjan van de Ven
  2007-02-13 11:18               ` Andi Kleen
  2007-02-13 11:27               ` Alan
@ 2007-02-13 19:54               ` Jeffrey Hundstad
  2 siblings, 0 replies; 33+ messages in thread
From: Jeffrey Hundstad @ 2007-02-13 19:54 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Martin A. Fink, Matthias Schniedermeyer, linux-kernel

Arjan van de Ven wrote:
>> The problem is: FreeBSD is fast, but lacks some special drivers. Linux has
>> all drivers but access to harddisk is unpredictable and thus unreliable!
>> What can I do??
>>
>
>
> there are several tunables you can try:
> 1) increase /sys/block/<device>/queue/nr_requests
>    the linux default is on the low side
> 2) investigate other elevators; cfq is great for interactive use but not
> so great for max throughput. you can do this by echo'ing "deadline"
> into /sys/block/<device>/queue/scheduler
>

I'd suggest trying the noop scheduler with your ram based devices.  I 
don't see why these devices would need clever scheduling.  ...but prove 
me wrong if you will.  I haven't tested this.

echo noop > /sys/block/<device>/queue/scheduler
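
To confirm the switch took effect: the scheduler file lists the available
elevators with the active one in square brackets, e.g.
"noop anticipatory deadline [cfq]". A small sketch for pulling that out
follows. The function name active_scheduler is invented, and the optional
second argument (an alternative sysfs root) exists only so the parsing can be
tried against a scratch directory.

```shell
#!/bin/sh
# active_scheduler DEVICE [SYSFS_ROOT]
# Prints the elevator currently selected for DEVICE (the bracketed entry).
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${2:-/sys}/block/$1/queue/scheduler"
}
```

For example, `active_scheduler sda` should print noop after the echo above.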

If you don't need journaling, ext2 might be a good choice.  But I'd also 
like to reiterate the XFS filesystem recommendation given several times 
now.  There are many tunables that /may/ help during filesystem 
creation.  Setting the block size (-b) to its maximum would probably help.

If you're sure you cannot encounter power issues:
mount -t xfs -o nobarrier /dev/<device> /mount-point

Here's some more general reading for ya:
Troubleshooting Linux Performance Issues:
http://www.phptr.com/articles/article.asp?p=481867&seqNum=2&rl=1

-- 
Jeffrey Hundstad

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: SATA-performance: Linux vs. FreeBSD
  2007-02-12 14:02 SATA-performance: Linux vs. FreeBSD Martin A. Fink
  2007-02-12 17:04 ` Andi Kleen
@ 2007-02-15  5:48 ` Tejun Heo
  1 sibling, 0 replies; 33+ messages in thread
From: Tejun Heo @ 2007-02-15  5:48 UTC (permalink / raw)
  To: Martin A. Fink; +Cc: linux-kernel

Hello, Martin.

Martin A. Fink wrote:
> Test					OpenSuSE(AHCI)			FreeBSD(AHCI)
> ---------------------------------------------------------------------------------------------------------------------------------------
> SSD(vfat 25GB)			41+/-2 MB/s at 4-10%		15+/-0 MB/s at 2% CPU
> SSD(raw  25GB) 		26+/-1 MB/s at 4-10% CPU	48+/-0 MB/s at 1% CPU
> SSD(ext3 25GB)		39+/-5 MB/s at 10-15% CPU	34+/-0 MB/s at 14% CPU
> SSD(ext2 25GB)		42+/-1 MB/s at 10-15% CPU	32+/-0 MB/s at 10% CPU
> ---------------------------------------------------------------------------------------------------------------------------------------
> 
> Test					OpenSuSE (AHCI off)		FreeBSD (AHCI off)
> ---------------------------------------------------------------------------------------------------------------------------------------
> SSD(vfat 25GB)			22+/-4 MB/s at 6-19% CPU	--
> SSD(raw  25GB)		33+/-4 MB/s at 7-14% CPU	41+/-0 MB/s at 1% CPU
> SSD(ext2 25GB)		27+/-6 MB/s at 6-14% CPU	--
> ---------------------------------------------------------------------------------------------------------------------------------------
> 
> Question 1:
> Can anybody explain to me, why writing to a SATA-I device with AHCI consumes 
> so much CPU time using Linux, while it takes almost no CPU time on FreeBSD 
> 6.2 ? Especially comparing values of writing to the raw device?

Can't tell.  AHCI needs very few MMIOs to perform each request.  As Andi 
suggested, please run oprofile.  It's easy.

> Question 2:
> Can anybody explain to me, why writing to a solid state disk (a kind of memory 
> that always has the same constant bandwidth) has such big standard errors in 
> writing rate using Linux (between 1 to 6 MB/s error) while FreeBSD gives an 
> almost constant writing rate (as one would expect it for a SSD) ?

The default iosched is heavily optimized for regular disks with moving 
heads and for more usual workloads.  Requests are sometimes held back to wait 
for requests in an adjacent area.  Use deadline or noop for SSDs.

Also, try turning off NCQ.  Some early drives from major disk vendors 
had all kinds of issues with their NCQ implementation.  SSD firmware doesn't 
tend to be of high quality.
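
A sketch of one way to do that, assuming the running kernel's libata exposes
a writable queue_depth attribute for the device (I have not checked whether
2.6.18 does): setting the depth to 1 effectively disables NCQ. The function
name set_queue_depth and the optional sysfs root argument are made up for
illustration.

```shell
#!/bin/sh
# set_queue_depth DEVICE DEPTH [SYSFS_ROOT]
# Writes DEPTH to the device's queue_depth attribute; a depth of 1 leaves
# only one command outstanding, which effectively turns NCQ off.
set_queue_depth() {
    f="${3:-/sys}/block/$1/device/queue_depth"
    if [ ! -w "$f" ]; then
        echo "no writable queue_depth for $1" >&2
        return 1
    fi
    echo "$2" > "$f"
}
```

For example, `set_queue_depth sdX 1` as root, then rerun the write test and
compare the spread in the rates.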

> Question 3:
> Why is writing to a raw device in Linux slower than using e.g. ext2 ? And why 
> is Linux writing rate much lower (-12.5 % for the best case) compared to 
> writing rate of FreeBSD?

As written above, the first thing I can think of is interaction with 
iosched.  SSD and your workload are pretty unusual.

> Question 4:
> When writing to the SATA-II HDD Linux is around 10% slower than FreeBSD when 
> using ext3, but around as fast as FreeBSD when writing raw. Why?

Dunno much about that.  Where's the test result?

> How can I improve the speed of Linux,

As other people have pointed out, use /dev/sdX, not the raw devices.  If you 
use raw, you end up writing each chunk synchronously.

-- 
tejun


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2007-02-15 16:03 UTC | newest]

Thread overview: 33+ messages
2007-02-12 14:02 SATA-performance: Linux vs. FreeBSD Martin A. Fink
2007-02-12 17:04 ` Andi Kleen
2007-02-12 16:27   ` Martin A. Fink
2007-02-12 18:41     ` Andi Kleen
2007-02-12 17:56       ` Martin A. Fink
2007-02-12 18:17         ` Ray Lee
2007-02-12 19:08         ` Alan
2007-02-12 20:34           ` Nigel Cunningham
2007-02-13  9:34           ` Martin A. Fink
2007-02-13 11:25             ` Alan
2007-02-13 12:32               ` Martin A. Fink
2007-02-13 14:47                 ` Theodore Tso
2007-02-13 15:03                   ` Alan
2007-02-13 17:12               ` Jeff Garzik
2007-02-12 23:31         ` Matthias Schniedermeyer
2007-02-13  9:25           ` Martin A. Fink
2007-02-13 10:08             ` Arjan van de Ven
2007-02-13 11:18               ` Andi Kleen
2007-02-13 10:25                 ` Arjan van de Ven
2007-02-13 11:27               ` Alan
2007-02-13 11:59                 ` Jörn Engel
2007-02-13 19:54               ` Jeffrey Hundstad
2007-02-13 10:16             ` Matthias Schniedermeyer
2007-02-13 10:29               ` Martin A. Fink
2007-02-13 12:04                 ` Jörn Engel
2007-02-13 12:24                 ` Matthias Schniedermeyer
2007-02-13 12:49                   ` Martin A. Fink
2007-02-13 13:53                     ` Matthias Schniedermeyer
2007-02-12 16:37   ` Martin A. Fink
2007-02-12 18:19     ` Stefan Richter
2007-02-13 19:09     ` Jeff Carr
2007-02-12 17:42   ` Martin A. Fink
2007-02-15  5:48 ` Tejun Heo
