linux-kernel.vger.kernel.org archive mirror
* [POT] Which journalised filesystem uses Linus Torvalds ?
@ 2001-10-03 12:00 sebastien.cabaniols
  2001-10-03 12:39 ` [POT] Which journalised filesystem ? Rik van Riel
                   ` (7 more replies)
  0 siblings, 8 replies; 47+ messages in thread
From: sebastien.cabaniols @ 2001-10-03 12:00 UTC (permalink / raw)
  To: linux-kernel

Hello lkml,

With the availability of XFS, JFS, ext3 and ReiserFS I am a little
lost, and I don't know which one I should use for enterprise-class
servers.

In terms of integration into the kernel, functionality, stability
and performance, which one is the best for enterprise-class servers?

I guess the beginning of the answer is: it depends on what you are
doing.

So, what do you think if

I want a database server
or
a supercomputer (HPC use)
or
a Linux KDE/GNOME desktop

Thanks for your help, links and experience.


Sebastien CABANIOLS





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
@ 2001-10-03 12:39 ` Rik van Riel
  2001-10-03 12:54   ` Dave Jones
  2001-10-03 14:33 ` [POT] Which journalised filesystem uses Linus Torvalds ? Dave Cinege
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Rik van Riel @ 2001-10-03 12:39 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel

On Wed, 3 Oct 2001, sebastien.cabaniols wrote:

> With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> and I don't know which one I should use for entreprise class servers.

Personally I like ext3 a lot.  I've been using it for almost a
year now and it has never given me trouble.  In the theoretical
case where it would give me trouble, I'd have the very well
tested e2fsck utility to rescue me (ext2 and ext3 have the same
on-disk layout).

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/  (volunteers needed)

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:39 ` [POT] Which journalised filesystem ? Rik van Riel
@ 2001-10-03 12:54   ` Dave Jones
  2001-10-03 13:00     ` Billy Harvey
                       ` (6 more replies)
  0 siblings, 7 replies; 47+ messages in thread
From: Dave Jones @ 2001-10-03 12:54 UTC (permalink / raw)
  To: Rik van Riel; +Cc: sebastien.cabaniols, linux-kernel

On Wed, 3 Oct 2001, Rik van Riel wrote:

> Personally I like ext3 a lot.  I've been using it for almost a
> year now and it has never given me trouble.

I've similar experiences with ext3, except for one bad instance
recently when I put it on my laptop. Lots of asserts were triggered,
and on reboot it couldn't find the journal, the superblock,
or the backup superblocks. I spent a few hours trying to get data
back, and eventually gave up and reformatted as ext2.

Alan mentioned this was something to do with the IBM hard disk
having strange write-cache properties that confuse ext3.
I'm not sure if this has been fixed or not yet, but it's enough
to make me think twice about trying it on the vaio for a while.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
@ 2001-10-03 13:00     ` Billy Harvey
  2001-10-04 22:14       ` Alan Cox
  2001-10-03 13:01     ` Ragnar Kjørstad
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 47+ messages in thread
From: Billy Harvey @ 2001-10-03 13:00 UTC (permalink / raw)
  To: lk; +Cc: Dave Jones

On Wed, 2001-10-03 at 08:54, Dave Jones wrote:

> I've similar experiences with ext3, except for one bad instance
> recently when I put it on my laptop. Lots of asserts were triggered,
> and on reboot it couldn't find the journal, the superblock,
> or the backup superblocks. I spent a few hours trying to get data
> back, and eventually gave up and reformatted as ext2.
> 
> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.
> I'm not sure if this has been fixed or not yet, but its enough
> to make me think twice about trying it on the vaio for a while.
> 
> regards,
> 
> Dave.

I've been using ext3 on my ThinkPad (A20P) for about a month now with
nary the slightest problem.  I've even smoke tested it by shutting it
down in the middle of disk writes and it worked fine.

Billy




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
  2001-10-03 13:00     ` Billy Harvey
@ 2001-10-03 13:01     ` Ragnar Kjørstad
  2001-10-03 13:24       ` Dave Jones
  2001-10-03 17:51       ` Andrew Morton
  2001-10-03 15:34     ` André Dahlqvist
                       ` (4 subsequent siblings)
  6 siblings, 2 replies; 47+ messages in thread
From: Ragnar Kjørstad @ 2001-10-03 13:01 UTC (permalink / raw)
  To: Dave Jones; +Cc: Rik van Riel, sebastien.cabaniols, linux-kernel

On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:
> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.
> I'm not sure if this has been fixed or not yet, but its enough
> to make me think twice about trying it on the vaio for a while.

If a disk is doing write-back caching, it's likely to break all
journaling filesystems and anything else that relies on write ordering.


-- 
Ragnar Kjørstad
Big Storage

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 13:01     ` Ragnar Kjørstad
@ 2001-10-03 13:24       ` Dave Jones
  2001-10-03 17:51       ` Andrew Morton
  1 sibling, 0 replies; 47+ messages in thread
From: Dave Jones @ 2001-10-03 13:24 UTC (permalink / raw)
  To: Ragnar Kjørstad; +Cc: Rik van Riel, sebastien.cabaniols, linux-kernel

On Wed, 3 Oct 2001, Ragnar Kjørstad wrote:

> If a disk is doing write-back caching, it's likely to break all
> journaling filesystem and anything else that relies on write ordering.

Yup, I know this *now* :-)
My point is that I had no idea the drive was doing write-caching.

hdparm only offers an option to set it to on/off, not to query it.
Just disabling it in a boot up script *might* be enough to make this
safe again, but I've not looked at the hdparm & IDE code, so this
is just a theory.
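
On kernels much newer than the 2.4 series discussed here, the cache mode can usually be read back rather than guessed. The following is a minimal sketch, assuming the running kernel exposes `/sys/block/<device>/queue/write_cache` (an assumption about the system; that attribute did not exist in 2001). The helper names and the device name "sda" are hypothetical, and the value is the kernel's view of the drive, not a guarantee of what the firmware does.

```python
from pathlib import Path

def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of the drive cache, e.g. 'write back' or
    'write through', or None if the attribute is not present."""
    attr = Path(sysfs_root) / device / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        return None

def ordering_at_risk(mode):
    """Write-back caching is the case this thread worries about."""
    return mode == "write back"

if __name__ == "__main__":
    mode = write_cache_mode("sda")  # "sda" is a placeholder device name
    print(mode, ordering_at_risk(mode))
```

If the mode comes back as "write back", disabling the cache from a boot script (as suggested above) is the cautious choice.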

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
  2001-10-03 12:39 ` [POT] Which journalised filesystem ? Rik van Riel
@ 2001-10-03 14:33 ` Dave Cinege
  2001-10-03 14:48   ` Sean Hunter
  2001-10-03 16:54 ` Fabbione
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 47+ messages in thread
From: Dave Cinege @ 2001-10-03 14:33 UTC (permalink / raw)
  To: sebastien.cabaniols, linux-kernel

On Wednesday 03 October 2001 8:00, sebastien.cabaniols wrote:
> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.

I use Reiserfs on everything now, including a 13 drive Fiber Channel 
SAN with 3 hosts and multiple levels of Software RAID between them.

It is as fast as ext2, and in some cases much faster (e.g. rm of 10K+ files
in ~2 seconds). FYI, I get 70MB/s in Bonnie on 6 7200rpm drives in RAID 0
(64k blocks).

Keeping up with the 'best' reiserfs patch set can be a little bit of a
chore. (However it looks like we're coming to the end of that with 2.4.10)

Never used ext3. From what I did read about it, it didn't excite me.
I've yet to see a mature enough version of the others to actually use, and
considering Reiserfs, I don't see a reason to try them.

Dave

-- 
The time is now 22:19 (Totalitarian)  -  http://www.ccops.org/clock.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 14:33 ` [POT] Which journalised filesystem uses Linus Torvalds ? Dave Cinege
@ 2001-10-03 14:48   ` Sean Hunter
  0 siblings, 0 replies; 47+ messages in thread
From: Sean Hunter @ 2001-10-03 14:48 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel

I use ext3 on a couple of servers and a couple of laptops.  I think which fs is
best for you will depend enormously on the intended use of the machines and
your own expectations.  A mail and dns server that I operate running ext3 has
been very happy since conversion, and has definitely benefitted.

I feel I get the benefit of no more fscks and fast operations on "-o
sync"-mounted filesystems without (IMO) exposing the box to immature code that
you might see in less conservative "experimental" filesystem options.

I personally feel more comfortable with the stability and robustness criteria
of the ext3 developers than some others.  If you want a very fast filesystem or
one that handles very large numbers of files very well, your choice may well be
different from mine.

I quite like my filesystems to be boring. :)

Sean


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
  2001-10-03 13:00     ` Billy Harvey
  2001-10-03 13:01     ` Ragnar Kjørstad
@ 2001-10-03 15:34     ` André Dahlqvist
  2001-10-04 21:25       ` Alan Cox
  2001-10-03 17:03     ` Matthias Andree
                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 47+ messages in thread
From: André Dahlqvist @ 2001-10-03 15:34 UTC (permalink / raw)
  To: linux-kernel

Dave Jones <davej@suse.de> wrote:

> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.

Which IBM harddrive(s) does this? How can one check if it does?
-- 

André Dahlqvist <andre.dahlqvist@telia.com>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
  2001-10-03 12:39 ` [POT] Which journalised filesystem ? Rik van Riel
  2001-10-03 14:33 ` [POT] Which journalised filesystem uses Linus Torvalds ? Dave Cinege
@ 2001-10-03 16:54 ` Fabbione
  2001-10-03 17:52 ` Bernd Eckenfels
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Fabbione @ 2001-10-03 16:54 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel

Hi Sebastien,

I had the possibility to poke around with ext3 and reiserfs, but ended
up converting all my machines to ext3 for various reasons.

First of all, with ext3 you don't need to re-format your partitions: no
mkreiserfs or mke3fs, just a simple tune2fs. No need to back up all your
data, rebuild on top of the new fs, reinstall, and so on.

When I was testing reiserfs (at least a couple of months ago) I got very
bad performance, but I know that they have improved performance in the
2.4.10 release of the kernel, which unfortunately seems to have many
other problems.
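
The in-place conversion with tune2fs mentioned above can be sketched as follows. This is only an illustration under assumptions: `tune2fs -j` adds a journal to an existing ext2 filesystem, the filesystem is unmounted during the conversion, and the device and mount point are placeholders. The helper name is hypothetical, and the code only builds the commands without running them.

```python
import shlex

def ext2_to_ext3_commands(device, mount_point):
    """Return the commands for an in-place ext2 -> ext3 conversion.
    Nothing is executed; the caller decides whether to run them."""
    return [
        ["tune2fs", "-j", device],                     # add a journal to the existing ext2 fs
        ["mount", "-t", "ext3", device, mount_point],  # mount it with journaling in use
    ]

# Placeholder device and mount point, printed for review only.
for argv in ext2_to_ext3_commands("/dev/hdXN", "/mnt/data"):
    print(shlex.join(argv))
```

The matching /etc/fstab entry would also need its type changed from ext2 to ext3 for the change to survive a reboot.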

Fabbione

"sebastien.cabaniols" wrote:
> 
> Hello lkml,
> 
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.

-- 
Debian GNU/Linux Unstable Kernel 2.4.9
fabbione on irc.atdot.it #coredump #kchat | fabbione@fabbione.net

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
                       ` (2 preceding siblings ...)
  2001-10-03 15:34     ` André Dahlqvist
@ 2001-10-03 17:03     ` Matthias Andree
  2001-10-03 17:36     ` Stephen C. Tweedie
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 47+ messages in thread
From: Matthias Andree @ 2001-10-03 17:03 UTC (permalink / raw)
  To: linux-kernel

On Wed, 03 Oct 2001, Dave Jones wrote:

> Alan mentioned this was something to do with the IBM hard disk
> having strange write-cache properties that confuse ext3.

hdparm -W0 /dev/hda is your friend.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
                       ` (3 preceding siblings ...)
  2001-10-03 17:03     ` Matthias Andree
@ 2001-10-03 17:36     ` Stephen C. Tweedie
  2001-10-03 17:41       ` Dave Jones
  2001-10-04 21:09       ` Alan Cox
  2001-10-03 17:40     ` Sujal Shah
  2001-10-03 17:41     ` Xavier Bestel
  6 siblings, 2 replies; 47+ messages in thread
From: Stephen C. Tweedie @ 2001-10-03 17:36 UTC (permalink / raw)
  To: Dave Jones
  Cc: Rik van Riel, sebastien.cabaniols, linux-kernel, Stephen Tweedie

Hi,

On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:

> > Personally I like ext3 a lot.  I've been using it for almost a
> > year now and it has never given me trouble.
> 
> I've similar experiences with ext3, except for one bad instance
> recently when I put it on my laptop. Lots of asserts were triggered,
> and on reboot it couldn't find the journal, the superblock,
> or the backup superblocks. I spent a few hours trying to get data
> back, and eventually gave up and reformatted as ext2.

Which laptop?  I've seen several reports of disk corruption with
recent kernels on certain laptops.

Cheers,
 Stephen

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
                       ` (4 preceding siblings ...)
  2001-10-03 17:36     ` Stephen C. Tweedie
@ 2001-10-03 17:40     ` Sujal Shah
  2001-10-03 19:13       ` Erik Mouw
  2001-10-03 17:41     ` Xavier Bestel
  6 siblings, 1 reply; 47+ messages in thread
From: Sujal Shah @ 2001-10-03 17:40 UTC (permalink / raw)
  To: linux-kernel


On Wed, 2001-10-03 at 13:03, Matthias Andree wrote:
> On Wed, 03 Oct 2001, Dave Jones wrote:
> 
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
> 
> hdparm -W0 /dev/hda is your friend.

Dumb question: when would you want it to be -W1?

I mean, I can imagine maybe media recording or something where you might
*really* want the performance increase...  but generally speaking, I
want my data to be there in case things blow up.

does anyone know what the performance increase is?

Sujal
-- 
---- Sujal Shah ---- PSC Labs (Progress Software) ---- 

Now Playing: Ministry Of Sound - York - The Awakening


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 17:36     ` Stephen C. Tweedie
@ 2001-10-03 17:41       ` Dave Jones
  2001-10-04 21:09       ` Alan Cox
  1 sibling, 0 replies; 47+ messages in thread
From: Dave Jones @ 2001-10-03 17:41 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: Linux Kernel Mailing List

On Wed, 3 Oct 2001, Stephen C. Tweedie wrote:

> Which laptop?  I've seen several reports of disk corruption with
> recent kernels on certain laptops.

Sony Vaio Z600TEK
Hard disk is an IBM-DJSA-220

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 12:54   ` Dave Jones
                       ` (5 preceding siblings ...)
  2001-10-03 17:40     ` Sujal Shah
@ 2001-10-03 17:41     ` Xavier Bestel
  2001-10-03 17:53       ` Matthias Andree
  6 siblings, 1 reply; 47+ messages in thread
From: Xavier Bestel @ 2001-10-03 17:41 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux Kernel Mailing List

On Wed, 03-10-2001 at 19:03, Matthias Andree wrote:
> On Wed, 03 Oct 2001, Dave Jones wrote:
> 
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
> 
> hdparm -W0 /dev/hda is your friend.

Unfortunately I think IDE drives don't honor this setting - write-cache
is always on.

	Xav


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 13:01     ` Ragnar Kjørstad
  2001-10-03 13:24       ` Dave Jones
@ 2001-10-03 17:51       ` Andrew Morton
  1 sibling, 0 replies; 47+ messages in thread
From: Andrew Morton @ 2001-10-03 17:51 UTC (permalink / raw)
  To: Ragnar Kjørstad
  Cc: Dave Jones, Rik van Riel, sebastien.cabaniols, linux-kernel

Ragnar Kjørstad wrote:
> 
> On Wed, Oct 03, 2001 at 02:54:17PM +0200, Dave Jones wrote:
> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
> > I'm not sure if this has been fixed or not yet, but its enough
> > to make me think twice about trying it on the vaio for a while.
> 
> If a disk is doing write-back caching, it's likely to break all
> journaling filesystem and anything else that relies on write ordering.

In theory, disk write caching can defeat ext3's ordering requirements.

However I have never observed this in practice, nor have I seen
any report of it happening.

Think about it: ext3 writes a chunk of blocks, waits on them,
then writes a single commit block and waits on that.  The "chunk"
of blocks are very probably contiguous on disk.  The commit block
will most probably be at the very next LBA after the "chunk".

The only way in which the drive can cause corruption is for it to write the
commit block before the "chunk", and for you to lose power [*] within
that time window.  Unless some serious block remapping has occurred
at the physical level, I really can't see any reason why the disk
should choose to flush those blocks in the wrong order.  Nor do I see why
the disk should leave a large time window between flushing the commit
block and then flushing the "chunk".

So....  I wouldn't be too fussed about it, personally.  
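
The argument above can be illustrated with a small model. It is only a sketch under stated assumptions: a transaction is a list of journal blocks followed by one commit block, the drive's write-back cache may flush them in any order, and power can fail after any number of blocks has reached the platter. The function names are hypothetical and this is not ext3 code.

```python
COMMIT = "commit"

def replay_is_unsafe(flush_order, blocks_on_disk):
    """True if a power cut after `blocks_on_disk` flushed blocks leaves the
    commit block on disk while some journal block is still missing."""
    persisted = set(flush_order[:blocks_on_disk])
    journal_blocks = set(flush_order) - {COMMIT}
    return COMMIT in persisted and not journal_blocks <= persisted

def unsafe_windows(flush_order):
    """Count the power-cut points at which recovery would trust a
    transaction whose blocks are not all on disk."""
    return sum(replay_is_unsafe(flush_order, n)
               for n in range(len(flush_order) + 1))

chunk = ["j0", "j1", "j2"]

# In-order flush (chunk first, commit last): no unsafe window.
print(unsafe_windows(chunk + [COMMIT]))   # 0

# Commit flushed first: unsafe until the whole chunk is on disk.
print(unsafe_windows([COMMIT] + chunk))   # 3
```

In this model the danger exists only when the commit block is flushed ahead of the chunk, which matches the narrow window described above.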




[*] I think it has to be a power outage - a kernel crash won't be
enough - the disk should still flush its write cache.  I'm not sure
if hitting the front-panel reset button would prevent a disk from
flushing its cache?

-

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
                   ` (2 preceding siblings ...)
  2001-10-03 16:54 ` Fabbione
@ 2001-10-03 17:52 ` Bernd Eckenfels
  2001-10-03 18:01 ` Luigi Genoni
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Bernd Eckenfels @ 2001-10-03 17:52 UTC (permalink / raw)
  To: linux-kernel

In article <GKMPCZ$IZh2dKhbICnp0WDXKHB6iO7OKoHwqOxmqj9XfriOC7PjHiIDA6bHi6xrImT@laposte.net> you wrote:
> With the availability of XFS,JFS,ext3 and ReiserFS I am a 
> little
> lost and I don't know which one I should use for entreprise 
> class
> servers.

In former versions of ReiserFS you had weak support for fsck. And since a
lot of bugs and heavy load triggered this problem regularly, it was not
a wise idea to use Reiser. Things are reported to have improved, but I do
not have any first-hand experience since then.

Personally I think XFS is a very mature journaling file system. A bit
annoying is that the CVS tree from SGI is hard to track. I have reports
from heavily loaded servers (e.g. a news spool) that it performs very well.

ext3 is the alternative because of its compatibility with ext2. But I am
not sure if this is good or bad, since it has not improved on some of the
performance issues of the ext2 structure, afaik.

I have no experience with JFS; IBM seems to have missed an opportunity to
get large community support.

GFS as a general-purpose filesystem may need some more tweaking, but its
cluster properties are great for enterprise systems.

Greetings
Bernd

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 17:41     ` Xavier Bestel
@ 2001-10-03 17:53       ` Matthias Andree
  0 siblings, 0 replies; 47+ messages in thread
From: Matthias Andree @ 2001-10-03 17:53 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Wed, 03 Oct 2001, Xavier Bestel wrote:

> le mer 03-10-2001 at 19:03 Matthias Andree a écrit :
> > On Wed, 03 Oct 2001, Dave Jones wrote:
> > 
> > > Alan mentioned this was something to do with the IBM hard disk
> > > having strange write-cache properties that confuse ext3.
> > 
> > hdparm -W0 /dev/hda is your friend.
> 
> Unfortunately I think IDE drives don't honor this setting - write-cache
> is always on.

It's meant for IDE drives, and the write cache has been in the feature 
register for ages. Just try it, you'll notice if it fails.

BTW, it works as expected on my DJNA, DPTA, DTLA drives, and I know you
can turn the cache of DARA drives off as well.

-- 
Matthias Andree

"Those who give up essential liberties for temporary safety deserve
neither liberty nor safety." - Benjamin Franklin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
                   ` (3 preceding siblings ...)
  2001-10-03 17:52 ` Bernd Eckenfels
@ 2001-10-03 18:01 ` Luigi Genoni
  2001-10-04  5:42 ` Andrew Ip
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 47+ messages in thread
From: Luigi Genoni @ 2001-10-03 18:01 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel

I would bet that Linus is using ext2 :).

Apart from this, everyone will give you different suggestions.

Basically ext3 is simply ext2 with a journal. It can journal data as
well, but that mode is slower.

reiserFS is really interesting and is the most space-efficient, thanks to
its B*Tree and advanced hash techniques, but it currently journals only
meta-data. The real point is that reiserFS does a tree traversal every time
it writes a 4k block, and then it puts one pointer at a time inside the
tree. So the tree is balanced on every 4k write, which is bad for very
large files.
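
To give a feel for the scale of that claim, here is a back-of-the-envelope sketch. It assumes, as stated above, one tree traversal per 4k block written; the function name is hypothetical and this says nothing about the actual ReiserFS code paths.

```python
import math

BLOCK_SIZE = 4096  # bytes, the 4k block mentioned above

def traversals_for_write(file_size_bytes, block_size=BLOCK_SIZE):
    """Tree traversals needed if each block written costs one traversal."""
    return math.ceil(file_size_bytes / block_size)

print(traversals_for_write(64 * 1024))   # 16 for a 64 KiB file
print(traversals_for_write(1024 ** 3))   # 262144 for a 1 GiB file
```

Under that assumption the cost grows linearly with file size, which is why very large files are the painful case.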

jfs should be quite stable. It is a very interesting technology, and
I know it very well from AIX (but the Linux one comes from OS/2).
It's very solid, quite fast, and can also journal data (??).
The way jfs manages free data block groups is very smart, although it is
not an extent-based FS (the leaf nodes are pieces of a bitmap instead of
extents).

xfs: I dislike the way they are inserting a kind of double VFS. I
understand that the Irix buffer cache was developed with some xfs features
in mind, and so they need this pagebuf module, but I dislike it. I also
dislike the concept of per-group quota, but this is just my taste.
Anyway, I have to admit that on very big files xfs is very efficient.
On Irix 6.4 I found it to be a little slow with small files.

That is just my opinion; I am waiting for reiserFS 4.

On Wed, 3 Oct 2001, sebastien.cabaniols wrote:

> Hello lkml,
>
> With the availability of XFS,JFS,ext3 and ReiserFS I am a
> little
> lost and I don't know which one I should use for entreprise
> class
> servers.
>
> In terms of intergration into the kernel, functionnalities,
> stability
> and performance which one is the best for entreprise class
> servers
>
> I guess the begining of the answer is: it depends... on what
> you are doing
>
> So, what do you think if
>
> I want a database server
reiserFS
> or
> a supercomputer (HPC use)
jfs / ext3
> or
> a Linux KDE/GNOME desktop
ext2 :)
>
Luigi


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 17:40     ` Sujal Shah
@ 2001-10-03 19:13       ` Erik Mouw
  2001-10-03 20:52         ` Mark Hahn
  0 siblings, 1 reply; 47+ messages in thread
From: Erik Mouw @ 2001-10-03 19:13 UTC (permalink / raw)
  To: Sujal Shah; +Cc: linux-kernel

On Wed, Oct 03, 2001 at 01:40:36PM -0400, Sujal Shah wrote:
> On Wed, 2001-10-03 at 13:03, Matthias Andree wrote:
> > hdparm -W0 /dev/hda is your friend.
> 
> Dumb question: when would you want it to be -W1?
> 
> I mean, I can imagine maybe media recording or something where you might
> *really* want the performance increase...  but generally speaking, I
> want my data to be there in case things blow up.

I've used it in the past for an SGI Octane that was (and still is) used
to do real time TV studio quality (CCIR-601 YUV422 data, about 20MB/s)
record/playback to four striped SCSI disks.

> does anyone know what the performance increase is?

It made the difference between "doesn't cut it" and "enough headroom".
IIRC it was something like 18MB/s without and 30MB/s with write
caching, but don't quote me on the exact numbers.


Erik

-- 
J.A.K. (Erik) Mouw, Information and Communication Theory Group, Department
of Electrical Engineering, Faculty of Information Technology and Systems,
Delft University of Technology, PO BOX 5031,  2600 GA Delft, The Netherlands
Phone: +31-15-2783635  Fax: +31-15-2781843  Email: J.A.K.Mouw@its.tudelft.nl
WWW: http://www-ict.its.tudelft.nl/~erik/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem ?
  2001-10-03 19:13       ` Erik Mouw
@ 2001-10-03 20:52         ` Mark Hahn
  2001-10-04 22:49           ` Bernd Eckenfels
  0 siblings, 1 reply; 47+ messages in thread
From: Mark Hahn @ 2001-10-03 20:52 UTC (permalink / raw)
  To: Erik Mouw; +Cc: Sujal Shah, linux-kernel

> IIRC it was something like 18MB/s without and 30MB/s with write

for a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
I just measured: 7 MB/s with -W0, vs 27 MB/s with -W1.
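
For comparison, the two sets of figures quoted in this subthread work out to quite different speedups. The sketch below only does the arithmetic on the numbers given (the SCSI figures were explicitly approximate); the function name is hypothetical.

```python
def speedup(with_cache_mb_s, without_cache_mb_s):
    """Ratio of throughput with the drive write cache on versus off."""
    return with_cache_mb_s / without_cache_mb_s

# (cache off, cache on) in MB/s, as quoted in the thread
reports = {
    "striped SCSI disks on an SGI Octane (approximate)": (18, 30),
    "Maxtor 60G 5400 RPM UDMA100, 2.4.10, ext2": (7, 27),
}

for name, (off, on) in reports.items():
    print(f"{name}: {speedup(on, off):.1f}x")
```

That is roughly 1.7x for the striped SCSI setup and roughly 3.9x for the single IDE disk.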


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
                   ` (4 preceding siblings ...)
  2001-10-03 18:01 ` Luigi Genoni
@ 2001-10-04  5:42 ` Andrew Ip
  2001-10-04  7:32 ` Constantin Loizides
  2001-10-04 16:30 ` Nathan Straz
  7 siblings, 0 replies; 47+ messages in thread
From: Andrew Ip @ 2001-10-04  5:42 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel


For those who are interested in trying out journaling filesystems, I have
made a kernel rpm which supports XFS, JFS, Ext3 and ReiserFS.  You can get it
at ftp://ftp.cwlinux.com/pub/downloads/journaling_fs/kernel.  Comments are
welcome.

-Andrew


-- 
Andrew Ip
Email:  aip@cwlinux.com
Tel:    (852) 2542 2046
Fax:    (852) 2542 2046
Mobile: (852) 9201 9866

Cwlinux Limited
18B Tower 1 Tern Centre,
237 Queen's Road Central,
Hong Kong.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
                   ` (5 preceding siblings ...)
  2001-10-04  5:42 ` Andrew Ip
@ 2001-10-04  7:32 ` Constantin Loizides
  2001-10-04 16:30 ` Nathan Straz
  7 siblings, 0 replies; 47+ messages in thread
From: Constantin Loizides @ 2001-10-04  7:32 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: kernel-list

Hallo Sebastien,


> In terms of intergration into the kernel, functionnalities,
> stability and performance which one is the best for entreprise class
> servers
 
You might want to take a look at

http://www.informatik.uni-frankfurt.de/~loizides/reiserfs/

where I try to address one of your criteria, namely performance,
and how performance behaves over time, e.g. when the file system
is heavily used.


The focus is on ReiserFS compared to Ext2, though I plan to set up
some tests with XFS and JFS soon (to get the results before the end
of October).



Constantin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
                   ` (6 preceding siblings ...)
  2001-10-04  7:32 ` Constantin Loizides
@ 2001-10-04 16:30 ` Nathan Straz
  2001-10-04 17:21   ` Hristo Grigorov
  7 siblings, 1 reply; 47+ messages in thread
From: Nathan Straz @ 2001-10-04 16:30 UTC (permalink / raw)
  To: sebastien.cabaniols; +Cc: linux-kernel

On Wed, Oct 03, 2001 at 02:00:35PM +0200, sebastien.cabaniols wrote:
> With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> and I don't know which one I should use for entreprise class servers.

I'd recommend reading:

       http://www.mandrakeforum.com/article.php?sid=1212&lang=en

It's an article in the Mandrake forums concerning ext3, JFS, XFS, and
ReiserFS, all of which are in Mandrake 8.1.


> In terms of intergration into the kernel, functionnalities, stability
> and performance which one is the best for entreprise class servers

For enterprise stuff, I would recommend XFS based on the tools it
provides.  XFS has a complete set of tools for dumping XFS, repairing a
broken file system (should it ever break), and debugging should you
find something wrong with it.  

-- 
Nate Straz                                              nstraz@sgi.com
sgi, inc                                           http://www.sgi.com/
Linux Test Project                                  http://ltp.sf.net/

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
  2001-10-04 16:30 ` Nathan Straz
@ 2001-10-04 17:21   ` Hristo Grigorov
  0 siblings, 0 replies; 47+ messages in thread
From: Hristo Grigorov @ 2001-10-04 17:21 UTC (permalink / raw)
  To: Nathan Straz, sebastien.cabaniols; +Cc: linux-kernel

Heh..

Choosing the best FS is like choosing the best Linux distribution or choosing
the best woman for the rest of your life, as you like.. :) 

Each FS implementation has its strengths and weaknesses. I read that article
and came to the opinion that every piece of software is more or less PnP 
(plug-n-pray). You know, all code has bugs and the worst of them are never
found :)

Hristo.

On Thursday 04 October 2001 19:30, Nathan Straz wrote:
> On Wed, Oct 03, 2001 at 02:00:35PM +0200, sebastien.cabaniols wrote:
> > With the availability of XFS,JFS,ext3 and ReiserFS I am a little lost
> > and I don't know which one I should use for entreprise class servers.
>
> I'd recommend reading:
>
>        http://www.mandrakeforum.com/article.php?sid=1212&lang=en
>
> It's an article in the Mandrake forums concerning ext3, JFS, XFS, and
> ReiserFS, all of which are in Mandrake 8.1.
>
> > In terms of intergration into the kernel, functionnalities, stability
> > and performance which one is the best for entreprise class servers
>
> For enterprise stuff, I would recommend XFS based on the tools it
> provides.  XFS has a complete set of tools for dumping XFS, repairing a
> broken file system (should it every break), and debugging should you
> find something wrong with it.



* Re: [POT] Which journalised filesystem ?
  2001-10-03 17:36     ` Stephen C. Tweedie
  2001-10-03 17:41       ` Dave Jones
@ 2001-10-04 21:09       ` Alan Cox
  2001-10-05 10:27         ` Stephen C. Tweedie
  1 sibling, 1 reply; 47+ messages in thread
From: Alan Cox @ 2001-10-04 21:09 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Dave Jones, Rik van Riel, sebastien.cabaniols, linux-kernel,
	Stephen Tweedie

> Which laptop?  I've seen several reports of disk corruption with
> recent kernels on certain laptops.

20Gbyte IBM 2.5" ones I suspect ? If so then we aren't the only OS

Alan

* Re: [POT] Which journalised filesystem ?
  2001-10-03 15:34     ` André Dahlqvist
@ 2001-10-04 21:25       ` Alan Cox
  2001-10-04 21:53         ` Alessandro Suardi
  0 siblings, 1 reply; 47+ messages in thread
From: Alan Cox @ 2001-10-04 21:25 UTC (permalink / raw)
  To: André Dahlqvist; +Cc: linux-kernel

> > Alan mentioned this was something to do with the IBM hard disk
> > having strange write-cache properties that confuse ext3.
> 
> Which IBM harddrive(s) does this? How can one check if it does?

It's not specifically IBM; there are two sets of things to watch out for:

-	Cache flush as a nop/unimplemented. This is legal in all but the
	most recent ATA specification. The spec has been tightened so that
	problem will go away in time.

-	Some IBM laptop drives appeared to fail to write back the cache on
	machine shutdown/suspend etc. The exact rights/wrongs/details on
	that one haven't been pinned down because the folks concerned
	swapped a couple of drives for different ones, saw the problem
	vanish and being a large organisation had the supplier replace the
	other fifty odd. 

Alan

* Re: [POT] Which journalised filesystem ?
  2001-10-04 21:25       ` Alan Cox
@ 2001-10-04 21:53         ` Alessandro Suardi
  0 siblings, 0 replies; 47+ messages in thread
From: Alessandro Suardi @ 2001-10-04 21:53 UTC (permalink / raw)
  To: Alan Cox; +Cc: André Dahlqvist, linux-kernel

Alan Cox wrote:
> 
> > > Alan mentioned this was something to do with the IBM hard disk
> > > having strange write-cache properties that confuse ext3.
> >
> > Which IBM harddrive(s) does this? How can one check if it does?
> 
> Its not specifically IBM, there are two sets of things to watch out for
> 
> -       Cache flush as a nop/unimplemented. This is legal in all but the
>         most recent ATA specification. The spec has been tightened so that
>         problem will go in time
> 
> -       Some IBM laptop drives appeared to fail to write back the cache on
>         machine shutdown/suspend etc. The exact rights/wrongs/details on
>         that one haven't been pinned down because the folks concerned
>         swapped a couple of drives for different ones, saw the problem
>         vanish and being a large organisation had the supplier replace the
>         other fifty odd.

[asuardi@dolphin asuardi]$ dmesg | grep hda
    ide0: BM-DMA at 0x0860-0x0867, BIOS settings: hda:DMA, hdb:pio
hda: IBM-DJSA-220, ATA DISK drive
hda: 39070080 sectors (20004 MB) w/1874KiB Cache, CHS=2432/255/63, UDMA(33)
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >

This one has been used in the last 4 months without any issue
 doing lots of shutdowns, suspends, kernel rebuilds etc. ;)
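
The figures in the dmesg output above can be cross-checked with a little
arithmetic. This is only a sanity-check sketch: the 512-byte sector size is
an assumption (it is not printed by the kernel here), and the helper name
capacity_mb is made up for illustration.

```python
# Cross-check the "hda:" line: sector count, CHS geometry and size in MB.
SECTOR_BYTES = 512  # assumption: classic ATA sector size, not stated in dmesg

sectors = 39070080                                  # "39070080 sectors"
cylinders, heads, sectors_per_track = 2432, 255, 63  # "CHS=2432/255/63"


def capacity_mb(n_sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in decimal megabytes (10**6 bytes), rounded to the nearest MB."""
    return round(n_sectors * sector_bytes / 1_000_000)


# The CHS geometry should describe exactly the reported number of sectors.
chs_sectors = cylinders * heads * sectors_per_track

print(chs_sectors == sectors)   # geometry and sector count agree
print(capacity_mb(sectors))     # matches the "(20004 MB)" in the log
```

So the drive is one of the 20 GB IBM 2.5" models discussed in this thread.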

--alessandro

 "this is no time to get cute, it's a mad dog's promenade
  so walk tall, or baby don't walk at all"
                (Bruce Springsteen, 'New York City Serenade')

* Re: [POT] Which journalised filesystem ?
  2001-10-04 22:14       ` Alan Cox
@ 2001-10-04 22:14         ` Dave Jones
  2001-10-04 22:24           ` Alan Cox
  0 siblings, 1 reply; 47+ messages in thread
From: Dave Jones @ 2001-10-04 22:14 UTC (permalink / raw)
  To: Alan Cox; +Cc: lk

On Thu, 4 Oct 2001, Alan Cox wrote:

> I have no recorded case of an ext3 crash that someone showed was even
> likely to have been disk caching stuff.

So the case I mentioned to you about 2 months ago was some 'quirk'
of the drive rather than its write cache ? (Yup, 20gb IBM).
I'm sure you mentioned write cache in relation to that, but I could
be wrong.

regards,

Dave.

-- 
| Dave Jones.        http://www.suse.de/~davej
| SuSE Labs


* Re: [POT] Which journalised filesystem ?
  2001-10-03 13:00     ` Billy Harvey
@ 2001-10-04 22:14       ` Alan Cox
  2001-10-04 22:14         ` Dave Jones
  0 siblings, 1 reply; 47+ messages in thread
From: Alan Cox @ 2001-10-04 22:14 UTC (permalink / raw)
  To: Billy Harvey; +Cc: lk, Dave Jones

> I've been using ext3 on my ThinkPad (A20P) for about a month now with
> nary the slightest problem.  I've even smoke tested it by shutting it
> down in the middle of disk writes and it worked fine.

I have no recorded case of an ext3 crash that someone showed was even 
likely to have been disk caching stuff. 

* Re: [POT] Which journalised filesystem ?
  2001-10-04 22:14         ` Dave Jones
@ 2001-10-04 22:24           ` Alan Cox
  0 siblings, 0 replies; 47+ messages in thread
From: Alan Cox @ 2001-10-04 22:24 UTC (permalink / raw)
  To: Dave Jones; +Cc: Alan Cox, lk

> So the case I mentioned to you about 2 months ago was some 'quirk'
> of the drive rather than its write cache ? (Yup, 20gb IBM).
> I'm sure you mentioned write cache in relation to that, but I could
> be wrong.

Write cache, yes: apparently it is not always written out on suspend/power off.

* Re: [POT] Which journalised filesystem ?
  2001-10-03 20:52         ` Mark Hahn
@ 2001-10-04 22:49           ` Bernd Eckenfels
  2001-10-04 23:27             ` Linus Torvalds
  0 siblings, 1 reply; 47+ messages in thread
From: Bernd Eckenfels @ 2001-10-04 22:49 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.10.10110031648250.20425-100000@coffee.psychology.mcmaster.ca> you wrote:
> for a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
> I just measured: 7 MBps with -W0, vs 27 MB/s with -W1.

How much data did you write to get those numbers? The drive cache is
most often so small that it can only cache a few blocks.

Greetings
Bernd

* Re: [POT] Which journalised filesystem ?
  2001-10-04 22:49           ` Bernd Eckenfels
@ 2001-10-04 23:27             ` Linus Torvalds
  2001-10-04 23:55               ` Rik van Riel
  2001-10-05  1:05               ` Mike Fedyk
  0 siblings, 2 replies; 47+ messages in thread
From: Linus Torvalds @ 2001-10-04 23:27 UTC (permalink / raw)
  To: linux-kernel

In article <E15pHJT-00041q-00@calista.inka.de>,
Bernd Eckenfels  <ecki@lina.inka.de> wrote:
>In article <Pine.LNX.4.10.10110031648250.20425-100000@coffee.psychology.mcmaster.ca> you wrote:
>> for a current Maxtor 60G 5400 RPM UDMA100 disk, 2.4.10, ext2,
>> I just measured: 7 MBps with -W0, vs 27 MB/s with -W1.
>
>how much data do you have written to get those numbers? The drive cache is
>is most often so small it only can cache a few blocks.

Actually, that's not the main win of writeback caching.

The main win is being able to write a whole track in one go, starting at
the _right_ position (where "right" is defined as "where the head
happens to be when it can start writing"). Along with making up for the
occasional seek for meta-data, and other "smooth out the writes so that
the platter keeps getting written to all the time" things.

Which can be a HUGE win, and which is why I personally think that any
disk that doesn't do write-back caching is a waste of good money.

We (as in Linux) should make sure that we explicitly tell the disk when
we need it to flush its disk buffers. We don't do that right, and
because of _our_ problems some people claim that writeback caching is
evil and bad.

		Linus

* Re: [POT] Which journalised filesystem ?
  2001-10-04 23:27             ` Linus Torvalds
@ 2001-10-04 23:55               ` Rik van Riel
  2001-10-05 14:57                 ` Alan Cox
  2001-10-05  1:05               ` Mike Fedyk
  1 sibling, 1 reply; 47+ messages in thread
From: Rik van Riel @ 2001-10-04 23:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Thu, 4 Oct 2001, Linus Torvalds wrote:

> We (as in Linux) should make sure that we explicitly tell the disk when
> we need it to flush its disk buffers. We don't do that right, and
> because of _our_ problems some people claim that writeback caching is
> evil and bad.

Does this even work right for IDE ?

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/  (volunteers needed)

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [POT] Which journalised filesystem ?
  2001-10-04 23:27             ` Linus Torvalds
  2001-10-04 23:55               ` Rik van Riel
@ 2001-10-05  1:05               ` Mike Fedyk
  1 sibling, 0 replies; 47+ messages in thread
From: Mike Fedyk @ 2001-10-05  1:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Thu, Oct 04, 2001 at 11:27:45PM +0000, Linus Torvalds wrote:
> We (as in Linux) should make sure that we explicitly tell the disk when
> we need it to flush its disk buffers. We don't do that right, and
> because of _our_ problems some people claim that writeback caching is
> evil and bad.
>

Actually, their claim is that most drives won't even *honor* the request to
sync to oxide.

Once the number of drives that support this goes up, then write cache is
safe to use...

Personally, I have a script that enables write cache, and sets the drive to
its highest dma level on boot...

Mike

* Re: [POT] Which journalised filesystem ?
  2001-10-04 21:09       ` Alan Cox
@ 2001-10-05 10:27         ` Stephen C. Tweedie
  0 siblings, 0 replies; 47+ messages in thread
From: Stephen C. Tweedie @ 2001-10-05 10:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephen C. Tweedie, Dave Jones, Rik van Riel,
	sebastien.cabaniols, linux-kernel

Hi,

On Thu, Oct 04, 2001 at 10:09:38PM +0100, Alan Cox wrote:
> > Which laptop?  I've seen several reports of disk corruption with
> > recent kernels on certain laptops.
> 
> 20Gbyte IBM 2.5" ones I suspect ? If so then we aren't the only OS

Yes, it was.

--Stephen

* Re: [POT] Which journalised filesystem ?
  2001-10-04 23:55               ` Rik van Riel
@ 2001-10-05 14:57                 ` Alan Cox
  2001-10-05 15:25                   ` Eric W. Biederman
  2001-10-10 17:29                   ` Stephen C. Tweedie
  0 siblings, 2 replies; 47+ messages in thread
From: Alan Cox @ 2001-10-05 14:57 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linus Torvalds, linux-kernel

> > We (as in Linux) should make sure that we explicitly tell the disk when
> > we need it to flush its disk buffers. We don't do that right, and
> > because of _our_ problems some people claim that writeback caching is
> > evil and bad.
> 
> Does this even work right for IDE ?

On current IDE drives it may be a NOP. Worse than that, it would totally ruin
high end raid performance. We need to pass write barriers. A good i2o card
might have 256MB of writeback cache that we want to avoid flushing, because
it is battery backed and can be ordered.

By all means have drivers fall back to cache writeback, but don't assume
that is the basic operation.

Indeed a smarter raid card can generally do

	"read"
	"read with readahead"
	"read with readahead and some readahead on card only"
	"read but don't cache"

	"write to cache"
	"write through cache"
	"write uncached"

Alan

* Re: [POT] Which journalised filesystem ?
  2001-10-05 14:57                 ` Alan Cox
@ 2001-10-05 15:25                   ` Eric W. Biederman
  2001-10-05 20:25                     ` Bernd Eckenfels
  2001-10-05 22:05                     ` Pavel Machek
  2001-10-10 17:29                   ` Stephen C. Tweedie
  1 sibling, 2 replies; 47+ messages in thread
From: Eric W. Biederman @ 2001-10-05 15:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: Rik van Riel, Linus Torvalds, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > we need it to flush its disk buffers. We don't do that right, and
> > > because of _our_ problems some people claim that writeback caching is
> > > evil and bad.
> > 
> > Does this even work right for IDE ?
> 
> Current IDE drives it may be a NOP. Worse than that it would totally ruin
> high end raid performance. We need to pass write barriers. A good i2o card
> might have 256Mb of writeback cache that we want to avoid flushing - because
> it is battery backed and can be ordered.

If the cache is small and is primarily a track cache (IDE) one trick that
we can do is to flood the cache with data so everything is forced out.

We can do this at mkfs time, (so even destructive tests are allowed)
and we can probe how to make this work for a particular drive.  And
then the kernel can just use the results of that probe. 

> By all means have drivers fall back to cache writeback, but don't assume
> that is the basic operation.

Definitely.  We want a write-barrier however we can get it.
 
> Indeed a smarter raid card can generally do

Eric

* Re: [POT] Which journalised filesystem ?
  2001-10-05 15:25                   ` Eric W. Biederman
@ 2001-10-05 20:25                     ` Bernd Eckenfels
  2001-10-05 23:41                       ` Miquel van Smoorenburg
  2001-10-06  8:32                       ` Tonu Samuel
  2001-10-05 22:05                     ` Pavel Machek
  1 sibling, 2 replies; 47+ messages in thread
From: Bernd Eckenfels @ 2001-10-05 20:25 UTC (permalink / raw)
  To: linux-kernel

In article <m1669uyuqy.fsf@frodo.biederman.org> you wrote:
> Definentily.  We want a write-barrier however we can get it.

Does that mean we can or we can't? Is there a flush write cache operation in
ATA? I assume there is one in SCSI?

Greetings
Bernd

* Re: [POT] Which journalised filesystem ?
  2001-10-05 15:25                   ` Eric W. Biederman
  2001-10-05 20:25                     ` Bernd Eckenfels
@ 2001-10-05 22:05                     ` Pavel Machek
  2001-10-07  0:51                       ` Eric W. Biederman
  1 sibling, 1 reply; 47+ messages in thread
From: Pavel Machek @ 2001-10-05 22:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alan Cox, Rik van Riel, Linus Torvalds, linux-kernel

Hi!

> > > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > > we need it to flush its disk buffers. We don't do that right, and
> > > > because of _our_ problems some people claim that writeback caching is
> > > > evil and bad.
> > > 
> > > Does this even work right for IDE ?
> > 
> > Current IDE drives it may be a NOP. Worse than that it would totally ruin
> > high end raid performance. We need to pass write barriers. A good i2o card
> > might have 256Mb of writeback cache that we want to avoid flushing - because
> > it is battery backed and can be ordered.
> 
> If the cache is small and is primarily a track cache (IDE) one trick that
> we can do is to flood the cache with data so everything is forced out.
> 
> We can do this at mkfs time, (so even destructive tests are allowed)
> and we can probe how to make this work for a particular drive.  And
> then the kernel can just use the results of that probe. 

How do you probe this without actually powering system down?

* Re: [POT] Which journalised filesystem ?
  2001-10-05 20:25                     ` Bernd Eckenfels
@ 2001-10-05 23:41                       ` Miquel van Smoorenburg
  2001-10-06  8:32                       ` Tonu Samuel
  1 sibling, 0 replies; 47+ messages in thread
From: Miquel van Smoorenburg @ 2001-10-05 23:41 UTC (permalink / raw)
  To: linux-kernel

In article <E15pbX5-0007do-00@calista.inka.de>,
Bernd Eckenfels  <ecki@lina.inka.de> wrote:
>In article <m1669uyuqy.fsf@frodo.biederman.org> you wrote:
>> Definentily.  We want a write-barrier however we can get it.
>
>Does that mean we can or we can't? Is there a flush write cache operation in
>ATA? I asume there is one in SCSI?

Well hdparm has a -W option with which you can turn on/off the
write cache. If that works (and it appears it does) you should be
able to turn write cache off, write *one* block so that the
cache gets flushed and turn it back on. I'm not sure how to
test this, though.

Mike.
-- 
Move sig.


* Re: [POT] Which journalised filesystem ?
  2001-10-05 20:25                     ` Bernd Eckenfels
  2001-10-05 23:41                       ` Miquel van Smoorenburg
@ 2001-10-06  8:32                       ` Tonu Samuel
  2001-10-06  9:16                         ` Miquel van Smoorenburg
  2001-10-06 16:42                         ` Bernd Eckenfels
  1 sibling, 2 replies; 47+ messages in thread
From: Tonu Samuel @ 2001-10-06  8:32 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

On Sat, 2001-10-06 at 01:41, Miquel van Smoorenburg wrote:
> >Does that mean we can or we can't? Is there a flush write cache operation in
> >ATA? I asume there is one in SCSI?
> 
> Well hdparm has a -W option with which you can turn on/off the
> write cache. If that works (and it appears it does) you should be
> able to turn write cache off, write *one* block so that the
> cache gets flushed and turn it back on. I'm not sure how to
> test this, though.

Doesn't hdparm -W0f do the work?

  Tõnu


* Re: [POT] Which journalised filesystem ?
  2001-10-06  8:32                       ` Tonu Samuel
@ 2001-10-06  9:16                         ` Miquel van Smoorenburg
  2001-10-06 16:42                         ` Bernd Eckenfels
  1 sibling, 0 replies; 47+ messages in thread
From: Miquel van Smoorenburg @ 2001-10-06  9:16 UTC (permalink / raw)
  To: linux-kernel

In article <1002357150.3083.20.camel@volk.internalnet>,
Tonu Samuel  <tonu@please.do.not.remove.this.spam.ee> wrote:
>On Sat, 2001-10-06 at 01:41, Miquel van Smoorenburg wrote:
>> >Does that mean we can or we can't? Is there a flush write cache operation in
>> >ATA? I asume there is one in SCSI?
>> 
>> Well hdparm has a -W option with which you can turn on/off the
>> write cache. If that works (and it appears it does) you should be
>> able to turn write cache off, write *one* block so that the
>> cache gets flushed and turn it back on. I'm not sure how to
>> test this, though.
>
>Doesn't hdparm -W0f do the work?

No, -f flushes the kernel's buffer cache, not the IDE disk write cache.

Mike.
-- 
Move sig.


* Re: [POT] Which journalised filesystem ?
  2001-10-06  8:32                       ` Tonu Samuel
  2001-10-06  9:16                         ` Miquel van Smoorenburg
@ 2001-10-06 16:42                         ` Bernd Eckenfels
  1 sibling, 0 replies; 47+ messages in thread
From: Bernd Eckenfels @ 2001-10-06 16:42 UTC (permalink / raw)
  To: linux-kernel

In article <1002357150.3083.20.camel@volk.internalnet> you wrote:
>> Well hdparm has a -W option with which you can turn on/off the
>> write cache. If that works (and it appears it does) you should be
>> able to turn write cache off, write *one* block so that the
>> cache gets flushed and turn it back on. I'm not sure how to
>> test this, though.

> Doesn't hdparm -W0f do the work?

We are talking about a write barrier. This means you write all the stuff which
can be written unordered (all data) and then you initiate the barrier, and
once that has finished, you write the commit block. That way you get
increased write performance and still have transaction-safe persistence.
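
The ordering described above (data first, then the barrier, then the commit
block) can be sketched from user space. This is only an illustration under
stated assumptions: os.fsync() stands in for the barrier, which is stronger
than a true barrier because it waits for completion, and, as discussed
elsewhere in this thread, a drive may still ignore the flush. The function
name commit_transaction and the b"COMMIT\n" record are made up for the
example.

```python
import os


def commit_transaction(path, data_blocks, commit_record=b"COMMIT\n"):
    """Append data blocks, force them out, and only then append the commit record."""
    with open(path, "ab") as journal:
        # Step 1: the data blocks; their order relative to each other is free.
        for block in data_blocks:
            journal.write(block)
        # Step 2: the "barrier" (here a full flush): nothing written after
        # this point may reach the disk before the data above.
        journal.flush()
        os.fsync(journal.fileno())
        # Step 3: the commit block, written only after the barrier finished.
        journal.write(commit_record)
        journal.flush()
        os.fsync(journal.fileno())
```

If the machine dies before step 3 completes, the commit record is missing
and recovery can discard the half-written transaction.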

Greetings
Bernd

* Re: [POT] Which journalised filesystem ?
  2001-10-05 22:05                     ` Pavel Machek
@ 2001-10-07  0:51                       ` Eric W. Biederman
  0 siblings, 0 replies; 47+ messages in thread
From: Eric W. Biederman @ 2001-10-07  0:51 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

Pavel Machek <pavel@Elf.ucw.cz> writes:

> Hi!
> 
> > > > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > > > we need it to flush its disk buffers. We don't do that right, and
> > > > > because of _our_ problems some people claim that writeback caching is
> > > > > evil and bad.
> > > > 
> > > > Does this even work right for IDE ?
> > > 
> > > Current IDE drives it may be a NOP. Worse than that it would totally ruin
> > > high end raid performance. We need to pass write barriers. A good i2o card
> > > might have 256Mb of writeback cache that we want to avoid flushing - because
> 
> > > it is battery backed and can be ordered.
> > 
> > If the cache is small and is primarily a track cache (IDE) one trick that
> > we can do is to flood the cache with data so everything is forced out.
> > 
> > We can do this at mkfs time, (so even destructive tests are allowed)
> > and we can probe how to make this work for a particular drive.  And
> > then the kernel can just use the results of that probe. 
> 
> How do you probe this without actually powering system down?

You can't be 100% certain.  But you can do timings.  And usually you can
infer what is happening in the caches from that.  For example if you
take timings with the cache enabled and disabled, and the speed is the
same, you can be fairly confident that the cache doesn't actually get disabled.

Having a final verification step where you ask the user to pull the plug
could add some extra confidence.  But even then weird cases of buggy
firmware could defeat you.
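
A rough sketch of the timing idea, for illustration only. It assumes the
write cache is toggled outside the script (e.g. with the hdparm -W0 / -W1
settings mentioned earlier in the thread), that the test file lives on the
drive being probed, and that a 1.5x slowdown is a meaningful threshold; the
function names and the threshold are made up, not measured.

```python
import os
import time


def time_sync_writes(path, count=50, block=b"\0" * 4096):
    """Average seconds per write when each write is followed by fsync()."""
    with open(path, "wb") as f:
        start = time.perf_counter()
        for _ in range(count):
            f.seek(0)              # rewrite the same block every time
            f.write(block)
            f.flush()
            os.fsync(f.fileno())   # ask the kernel to push it to the drive
        elapsed = time.perf_counter() - start
    return elapsed / count


def cache_disable_took_effect(t_cache_on, t_cache_off, min_ratio=1.5):
    """True if writes got clearly slower once the cache was (supposedly) off.

    If both timings are about the same, the drive probably ignored the
    request to disable its write cache.
    """
    return t_cache_off >= min_ratio * t_cache_on
```

Typical use would be: run time_sync_writes() once with the cache enabled,
once after disabling it, and feed both results to
cache_disable_took_effect(). As noted above, this gives confidence, not
proof.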

Eric






* Re: [POT] Which journalised filesystem ?
  2001-10-05 14:57                 ` Alan Cox
  2001-10-05 15:25                   ` Eric W. Biederman
@ 2001-10-10 17:29                   ` Stephen C. Tweedie
  1 sibling, 0 replies; 47+ messages in thread
From: Stephen C. Tweedie @ 2001-10-10 17:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Rik van Riel, Linus Torvalds, linux-kernel, Stephen Tweedie

Hi,

On Fri, Oct 05, 2001 at 03:57:49PM +0100, Alan Cox wrote:
> > > We (as in Linux) should make sure that we explicitly tell the disk when
> > > we need it to flush its disk buffers. We don't do that right, and
> > > because of _our_ problems some people claim that writeback caching is
> > > evil and bad.
> > 
> > Does this even work right for IDE ?
> 
> Current IDE drives it may be a NOP. Worse than that it would totally ruin
> high end raid performance. We need to pass write barriers. A good i2o card
> might have 256Mb of writeback cache that we want to avoid flushing - because
> it is battery backed and can be ordered.

The important thing is to flush to non-volatile storage: non-volatile
cache still qualifies.  The one thing we need to avoid is the data
lingering in volatile cache, and that's a different thing.

Sure, journaling filesystems can benefit from a write barrier, but at
some point that's not sufficient --- we really need to know, at a high
level, whether the data is permanently secured.  When your MTA
finishes its fsync(), it assumes that the mail spool file has been
securely stored and it can tell the sender to go ahead and delete the
upstream copy.  

A barrier is not sufficient there.  It's a useful primitive to have,
but not a substitute for a flush to permanent storage.
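
To make the MTA example concrete, here is a minimal sketch of "store, then
acknowledge". It assumes a Linux/POSIX system where fsync() on a directory
is allowed; the function name store_durably and the spool layout are
hypothetical. Whether the data really reaches permanent storage still
depends on the drive honouring the flush, which is the open question in
this thread.

```python
import os


def store_durably(spool_dir, name, message):
    """Write a message to the spool and return only after fsync() succeeded."""
    path = os.path.join(spool_dir, name)
    with open(path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())       # file contents and metadata
    # Also flush the directory so the new file name itself is on disk.
    dir_fd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return path                    # only now may the sender delete its copy
```

The caller acknowledges the mail only after store_durably() returns; an
exception from fsync() means the upstream copy must be kept.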

--Stephen

* Re: [POT] Which journalised filesystem uses Linus Torvalds ?
@ 2001-10-03 16:21 Roy Murphy
  0 siblings, 0 replies; 47+ messages in thread
From: Roy Murphy @ 2001-10-03 16:21 UTC (permalink / raw)
  To: linux-kernel

'Twas brillig when Sebastien Cabaniols scrobe:
>With the availability of XFS,JFS,ext3 and ReiserFS I am a 
>little lost and I don't know which one I should use for entreprise 
>class servers. 

Well, the Linus Torvalds filesystem (ltfs for short) is a highly developed,
version control filesystem, but it still has a few shortcomings.

When saving a file to ltfs, it sometimes suggests that you should do it a different
way.  The ltfs is very particular about how things should be done.

Often, when saving a file, it is dropped without any notification.  Experienced
users of the ltfs follow the mantra "submit early and submit often".  They repeatedly
resave their files hoping that one of them will be accepted into a "version"
that does get saved to disk.

Several forks of the ltfs (e.g. the Alan Cox filesystem -- acfs and the Andrea
Arcangeli filesystem -- aafs) are a little better about saving files, but each
of them has its own idea about which files are worthy of being saved.

While these advanced filesystems hold great promise for the future, they should
probably not be used in a production server due to these failings.  In fact,
one user of the acfs, Telsa Cox, reports that the acfs often doesn't work at
all before noon local time.

YMMV.
 

Thread overview: 47+ messages
2001-10-03 12:00 [POT] Which journalised filesystem uses Linus Torvalds ? sebastien.cabaniols
2001-10-03 12:39 ` [POT] Which journalised filesystem ? Rik van Riel
2001-10-03 12:54   ` Dave Jones
2001-10-03 13:00     ` Billy Harvey
2001-10-04 22:14       ` Alan Cox
2001-10-04 22:14         ` Dave Jones
2001-10-04 22:24           ` Alan Cox
2001-10-03 13:01     ` Ragnar Kjørstad
2001-10-03 13:24       ` Dave Jones
2001-10-03 17:51       ` Andrew Morton
2001-10-03 15:34     ` André Dahlqvist
2001-10-04 21:25       ` Alan Cox
2001-10-04 21:53         ` Alessandro Suardi
2001-10-03 17:03     ` Matthias Andree
2001-10-03 17:36     ` Stephen C. Tweedie
2001-10-03 17:41       ` Dave Jones
2001-10-04 21:09       ` Alan Cox
2001-10-05 10:27         ` Stephen C. Tweedie
2001-10-03 17:40     ` Sujal Shah
2001-10-03 19:13       ` Erik Mouw
2001-10-03 20:52         ` Mark Hahn
2001-10-04 22:49           ` Bernd Eckenfels
2001-10-04 23:27             ` Linus Torvalds
2001-10-04 23:55               ` Rik van Riel
2001-10-05 14:57                 ` Alan Cox
2001-10-05 15:25                   ` Eric W. Biederman
2001-10-05 20:25                     ` Bernd Eckenfels
2001-10-05 23:41                       ` Miquel van Smoorenburg
2001-10-06  8:32                       ` Tonu Samuel
2001-10-06  9:16                         ` Miquel van Smoorenburg
2001-10-06 16:42                         ` Bernd Eckenfels
2001-10-05 22:05                     ` Pavel Machek
2001-10-07  0:51                       ` Eric W. Biederman
2001-10-10 17:29                   ` Stephen C. Tweedie
2001-10-05  1:05               ` Mike Fedyk
2001-10-03 17:41     ` Xavier Bestel
2001-10-03 17:53       ` Matthias Andree
2001-10-03 14:33 ` [POT] Which journalised filesystem uses Linus Torvalds ? Dave Cinege
2001-10-03 14:48   ` Sean Hunter
2001-10-03 16:54 ` Fabbione
2001-10-03 17:52 ` Bernd Eckenfels
2001-10-03 18:01 ` Luigi Genoni
2001-10-04  5:42 ` Andrew Ip
2001-10-04  7:32 ` Constantin Loizides
2001-10-04 16:30 ` Nathan Straz
2001-10-04 17:21   ` Hristo Grigorov
2001-10-03 16:21 Roy Murphy
