* Questions about XFS
@ 2013-06-11  9:56 Steve Bergman
  2013-06-11 13:10 ` Emmanuel Florac
                   ` (4 more replies)
  0 siblings, 5 replies; 50+ messages in thread
From: Steve Bergman @ 2013-06-11  9:56 UTC (permalink / raw)
  To: linux-xfs

Hi all,

I have a few questions about XFS that didn't make the XFS FAQ. I'm trying to
get a feel for where I might want to use it on my servers (or at home). A
mix of ext3 & ext4 has worked well for me. But I'd like to get to know XFS a
bit better. The target OS would be RHEL6.

1. I don't have "lots and large". Why should I run XFS?

2. I don't have "lots and large". Why shouldn't I run XFS?

3. Why doesn't RHEL6 support XFS on root, when the XFS FAQ says XFS on root
is fine? Is there some issue I should be aware of?

4. From the time I write() a bit of data, what's the maximum time before the
data is actually committed to disk?

5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
issue for some common cases via the auto_da_alloc feature added in kernel
2.6.30. Does XFS have similar behavior? 

6. RHEL6 Anaconda sets a RAID10 chunk size of 512K by default. XFS complains
and sets its log stripe down to 32k. Should I accept Anaconda's default? It
knows I've requested XFS formatting before it sets the chunk size, after all.

8. Eric (and the XFS FAQ) have recommended just using the defaults for
mkfs.xfs and mount. But I've also heard Dave say "Increase logbsize and use
inode64; everybody does that, but we just haven't made it the default". I'm
guessing it doesn't matter if one doesn't have large and lots?

9. Is there something else I should have thought to ask?

Thanks for any insights,

Steve Bergman

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Questions about XFS
  2013-06-11  9:56 Questions about XFS Steve Bergman
@ 2013-06-11 13:10 ` Emmanuel Florac
  2013-06-11 13:35 ` Stefan Ring
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Emmanuel Florac @ 2013-06-11 13:10 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-xfs

On Tue, 11 Jun 2013 09:56:38 +0000 (UTC),
Steve Bergman <sbergman27@gmail.com> wrote:

> Hi all,
> 
> I have a few questions about XFS that didn't make the XFS FAQ. I'm
> trying to get a feel for where I might want to use it on my servers
> (or at home). A mix of ext3 & ext4 has worked well for me. But I'd
> like to get to know XFS a bit better. The target OS would be RHEL6.
> 
> 1. I don't have "lots and large". Why should I run XFS?

Because it performs well and reliably.

> 2. I don't have "lots and large". Why shouldn't I run XFS?

Because ext4 is more common and won't uncover unexpected bugs in badly
written applications.

> 3. Why doesn't RHEL6 support XFS on root, when the XFS FAQ says XFS
> on root is fine? Is there some issue I should be aware of?

Not that I know of. I've used XFS as root back in the Red Hat 7.x days,
13 years ago, and I used XFS as root on IRIX years before that.
Nowadays I use XFS extensively, but usually not as root.

> 4. From the time I write() a bit of data, what's the maximum time
> before the data is actually committed to disk?

On the distributions I'm using (Debian, Slackware), there is no
significant delay that I know of. Even extremely mistreated systems have
come through fine (pulling the plug while working shouldn't do any harm,
should it?).

> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length
> file issue for some common cases via the auto_da_alloc feature added
> in kernel 2.6.30. Does XFS have similar behavior? 

I don't know. I keep hearing of this "xfs bug" but never actually
encountered it, ever, though I've set up about 3000 servers with XFS
filesystems, many to work under very harsh conditions.

> 6. RHEL6 Anaconda sets a RAID10 chunk size of 512K by default XFS
> complains and sets its log stripe down to 32k. Should I accept
> Anaconda's default? It knows I've requested XFS formatting before it
> sets the chunk size, after all.

512k seems insanely large to me. Something like 64 or 256k seems more
common, and reasonable. BTW, 256k is a perfectly valid size for xfs log
stripe. My advice: unless you plan on working only with big files,
create a 64k stripe RAID-10. Your performance will be much better, and
XFS will be happy.
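For reference, the stripe geometry can also be given to mkfs.xfs explicitly rather than relying on autodetection. A minimal sketch, assuming a 4-disk RAID-10 with a 64k chunk; the device name is a placeholder and this needs root and xfsprogs:

```shell
# Align XFS to a 64k-chunk, 4-disk RAID-10 (2 data-bearing spindles).
# su = stripe unit (the RAID chunk size); sw = stripe width in data disks.
mkfs.xfs -d su=64k,sw=2 /dev/md0
```

With a 64k chunk, the log stripe unit fits within mkfs.xfs's limits, so no warning about dropping it to 32k should appear.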

> 8. Eric (and the XFS FAQ) have recommended just using the defaults for
> mkfs.xfs and mount. But I've also heard Dave say "Increase logbsize
> and use inode64; everybody does that, but we just haven't made it the
> default". I'm guessing it doesn't matter if one doesn't have large
> and lots?

Actually, inode64 is the default on recent kernels. Of course this doesn't
apply to RHEL, which for some reason ships only positively Jurassic
kernels :)
Increasing logbsize is probably unnecessary except for highly
performance-sensitive workloads; the current 32k default should be
enough.
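As a sketch of the options being discussed (the device, mount point, and buffer size are examples, not recommendations):

```shell
# inode64 allows inodes to be allocated anywhere on a large filesystem;
# logbsize raises the in-memory log buffer size from the 32k default.
mount -o inode64,logbsize=256k /dev/md0 /data
```

The same options can go in the fourth field of an /etc/fstab entry.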

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: Questions about XFS
  2013-06-11  9:56 Questions about XFS Steve Bergman
  2013-06-11 13:10 ` Emmanuel Florac
@ 2013-06-11 13:35 ` Stefan Ring
  2013-06-11 13:52 ` Ric Wheeler
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 50+ messages in thread
From: Stefan Ring @ 2013-06-11 13:35 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-xfs

> I have a few questions about XFS that didn't make the XFS FAQ. I'm trying to
> get a feel for where I might want to use it on my servers (or at home). A
> mix of ext3 & ext4 has worked well for me. But I'd like to get to know XFS a
> bit better. The target OS would be RHEL6.

That's what I'm using as well. Answers are obviously biased by my
limited experience. I'm not an XFS veteran and have only started using
it after delaylog was added.

> 1. I don't have "lots and large". Why should I run XFS?

For me, the main reason is this: on a multi-user system with a decent
amount of memory (let's say >= 32GB), a single user untarring a large
source tarball, checking out a large source repository etc. will
effectively stall an ext3/4 filesystem for minutes on end. This does
not happen with XFS. Maybe this has been solved with recent kernels,
but it used to be a major nuisance for several years with kernels in
the 2.6.25-2.6.30 range at least.

> 2. I don't have "lots and large". Why shouldn't I run XFS?

Cache/memory accounting, as well as the % wait time displayed in top, is "strange".

XFS performance is more sensitive to free-space fragmentation than
ext4's. For this reason, while I find it not overly problematic to fill
an ext filesystem to 95%, with XFS I would stay below 85%.


* Re: Questions about XFS
  2013-06-11  9:56 Questions about XFS Steve Bergman
  2013-06-11 13:10 ` Emmanuel Florac
  2013-06-11 13:35 ` Stefan Ring
@ 2013-06-11 13:52 ` Ric Wheeler
  2013-06-11 13:59 ` Ric Wheeler
  2013-06-11 19:35 ` Ben Myers
  4 siblings, 0 replies; 50+ messages in thread
From: Ric Wheeler @ 2013-06-11 13:52 UTC (permalink / raw)
  To: xfs

On 06/11/2013 05:56 AM, Steve Bergman wrote:
> 3. Why doesn't RHEL6 support XFS on root, when the XFS FAQ says XFS on root
> is fine? Is there some issue I should be aware of?

This is a vendor specific issue, not an upstream one, so let me put on my fedora 
and respond.

When Red Hat added XFS support several years back, we decided to roll it out in 
a very controlled way. This is not a comment on any limitation of XFS as root.

Note that in RHEL7, we will be supporting XFS without this restriction.

Thanks!

Ric


* Re: Questions about XFS
  2013-06-11  9:56 Questions about XFS Steve Bergman
                   ` (2 preceding siblings ...)
  2013-06-11 13:52 ` Ric Wheeler
@ 2013-06-11 13:59 ` Ric Wheeler
  2013-06-11 16:12   ` Steve Bergman
  2013-06-11 19:35 ` Ben Myers
  4 siblings, 1 reply; 50+ messages in thread
From: Ric Wheeler @ 2013-06-11 13:59 UTC (permalink / raw)
  To: xfs

On 06/11/2013 05:56 AM, Steve Bergman wrote:
> 4. From the time I write() a bit of data, what's the maximum time before the
> data is actually committed to disk?
>
> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
> issue for some common cases via the auto_da_alloc feature added in kernel
> 2.6.30. Does XFS have similar behavior?

I think that here you are talking more about ext3 than ext4.

The answer to both of these - even for ext4 or ext3 - is that unless your 
application and storage are all properly configured, you are effectively at risk 
indefinitely. Chris Mason did a study years ago where he was able to demonstrate 
that dirty data could get pinned in a disk cache effectively indefinitely.  Only 
an fsync() would push that out.

Applications need to use the data integrity hooks in order to have a reliable 
promise that application data is crash safe.  Jeff Moyer wrote up a really nice 
overview of this for lwn which you can find here:

http://lwn.net/Articles/457667

That said, if you have applications that do not do any of this, you can roll the 
dice and use a file system like ext3 that will periodically push data out of the 
page cache for you.

Note that without the barrier mount option, that is not sufficient to push data 
to platter; it just moves the data down the line to the next potentially 
volatile cache :)  Even then, for 4 out of every 5 seconds, your application is 
certain to lose data if the box crashes while it is writing. Lots of 
applications don't actually use the file system much (or write much), so ext3's 
sync behaviour helped mask poorly written applications pretty effectively for 
quite a while.

There really is no short cut to doing the job right - your applications need to 
use the correct calls and we all need to configure the file and storage stack 
correctly.

Thanks!

Ric


* Re: Questions about XFS
  2013-06-11 13:59 ` Ric Wheeler
@ 2013-06-11 16:12   ` Steve Bergman
  2013-06-11 17:19     ` Ric Wheeler
                       ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 16:12 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: xfs

In #5 I was specifically talking about ext4. After the 2009 brouhaha
over zero-length files in ext4 with delayed allocation turned on, Ted
merged some patches into vanilla kernel 2.6.30 which mitigated the
problem by recognizing certain common idioms and automatically
forcing an fsync. I'd heard that the XFS team modeled a set of XFS
patches on them.

Regarding #4, I have 12 years of experience with my workloads on ext3
and 3 years on ext4, and I know what I have observed. As a practical
matter, there are large differences between filesystem behaviors which
aren't up for debate, since I know my workloads' behavior in the real
world far better than anyone else possibly could. (In fact, I'm not
sure how anyone else could presume to know how my workloads and
filesystems interact.) But if I understand correctly, ext4 at default
settings journals metadata and commits it every 5s, while flushing data
every 30s. Ext3 journals metadata and commits it every 5 seconds, while
effectively flushing data, *immediately before the metadata*, every 5
seconds. So the window in which data and metadata are not in sync is
vanishingly small. Are you saying that with XFS there is no periodic
flushing mechanism at all? And that unless there's an
fsync/fdatasync/sync, or the memory needs to be reclaimed, data can
sit in the page cache forever?

One thing is puzzling me. Everyone is telling me that I must ensure
that fsync/fdatasync is used, even in environments where the concept
doesn't exist. So I've gone looking for good examples of how it is
used. Since RHEL6 has been shipping with ext4 as the default for over
2.5 years, I figured it would be a great place to find examples.
However, I've been unable to find examples of fsync or fdatasync being
used when running "strace -o file.out -f" on various system programs
which one would very much expect to use it. We talked about some Python
config utilities the other day. But now I've moved on to C and C++
code. E.g., "cupsd" copy/truncate/writes the config file
"/etc/cups/printers.conf" quite frequently, all day long. But there is
no sign whatsoever of any fsync or fdatasync when I grep the strace
output file for those strings case-insensitively. (And indeed, a
complex printers.conf file turned up zero-length on one of my RHEL6.4
boxes last week.)
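The kind of check described above can be reproduced on any program; a sketch, where the traced command and the output filename are arbitrary examples (strace is assumed to be installed):

```shell
# Trace a program (and any children it forks) into a log file.
strace -f -o trace.out cp /etc/hostname /tmp/hostname.copy
# Count durability-related syscalls; grep -c exits non-zero on zero
# matches, hence the fallback message.
grep -c -i -e 'fsync' -e 'fdatasync' trace.out \
    || echo "no fsync/fdatasync observed"
```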

So I figured that when rpm installs a new vmlinuz, builds a new
initramfs and puts it into place, and modifies grub.conf, surely
proper sync'ing must be done in this particularly critical case. But
while I do see rpm fsync'ing its own database files, it never
seems to fsync/fdatasync the critical system files it just installed
and/or modified. Surely, after over 2.5 years of Red Hat shipping
RHEL6 to customers, I must be mistaken in some way. Could you point me
to an example in RHEL6.4 where I can see clearly how fsync is being
properly used? In the meantime, I'll keep looking.


Thanks,
Steve



On Tue, Jun 11, 2013 at 8:59 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
> On 06/11/2013 05:56 AM, Steve Bergman wrote:
>>
>> 4. From the time I write() a bit of data, what's the maximum time before
>> the
>> data is actually committed to disk?
>>
>> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
>> issue for some common cases via the auto_da_alloc feature added in kernel
>> 2.6.30. Does XFS have similar behavior?
>
>
> I think that here you are talking more about ext3 than ext4.
>
> The answer to both of these - even for ext4 or ext3 - is that unless your
> application and storage is all properly configured, you are effectively at
> risk indefinitely. Chris Mason did a study years ago where he was able to
> demonstrate that dirty data could get pinned in a disk cache effectively
> indefinitely.  Only an fsync() would push that out.
>
> Applications need to use the data integrity hooks in order to have a
> reliable promise that application data is crash safe.  Jeff Moyer wrote up a
> really nice overview of this for lwn which you can find here:
>
> http://lwn.net/Articles/457667
>
> That said, if you have applications that do not do any of this, you can roll
> the dice and use a file system like ext3 that will periodically push data
> out of the page cache for you.
>
> Note that without the barrier mount option, that is not sufficient to push
> data to platter, just moves it down the line to the next potentially
> volatile cache :)  Even then, 4 out of every 5 seconds, your application
> will be certain to lose data if the box crashes while it is writing data.
> Lots of applications don't actually use the file system much (or write
> much), so ext3's sync behaviour helped mask poorly written applications
> pretty effectively for quite a while.
>
> There really is no short cut to doing the job right - your applications need
> to use the correct calls and we all need to configure the file and storage
> stack correctly.
>
> Thanks!
>
> Ric
>

* Re: Questions about XFS
  2013-06-11 16:12   ` Steve Bergman
@ 2013-06-11 17:19     ` Ric Wheeler
  2013-06-11 17:27       ` Stefan Ring
  2013-06-11 17:28     ` Eric Sandeen
  2013-06-12  8:26     ` Roger Oberholtzer
  2 siblings, 1 reply; 50+ messages in thread
From: Ric Wheeler @ 2013-06-11 17:19 UTC (permalink / raw)
  To: Steve Bergman; +Cc: xfs

On 06/11/2013 12:12 PM, Steve Bergman wrote:
> In #5 I was specifically talking about ext4. After the 2009 brouhaha
> over zero-length files in ext4 with delayed allocation turned on, Ted
> merged some patches into vanilla kernel 2,6,30 which mitigated the
> problem by recognizing certain common idioms and forcing automatically
> forcing an fsync. I'd heard the the XFS team modeled a set of XFS
> patches from them.
>
> Regarding #4, I have 12 years experience with my workloads on ext3 and
> 3 yrs on ext4 and know what I have observed. As a practical matter,
> there are large differences between filesystem behaviors which aren't
> up for debate since I know my workloads' behavior in the real world
> far better than anyone else possibly could. (In fact, I'm not sure how
> anyone else could presume to know how my workloads and filesystems
> interact.) But if I understand correctly, ext4 at default settings
> journals metadata and commits it every 5s, while flushing data every
> 30s. Ext3 journals metadata, and commits it every 5 seconds, while
> effectively flushing data, *immediately before the metadata*, every 5
> seconds. so the window in which data and metadata are not in sync is
> vanishingly small. Are you saying that with XFS there is no periodic
> flushing mechanism at all? And that unless there's an
> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> sit in the page cache forever?

I think that you are still missing the bigger point.

Periodic fsync() - done magically under the covers by the file system - does not 
provide any useful data integrity for any serious application.

Let's take a simple example - a database app that does say 30 transactions/sec.

In your example, you are extremely likely to lose up to just shy of 5 seconds of 
"committed" data - way over 100 transactions!  That can be *really* serious 
amounts of data and translate into large financial loss.

In a second example, let's say you are copying data to disk (say a movie) at a 
rate of 50 MB/second.  When the power cut hits at just the wrong time, you will 
have lost a large chunk of that data that has been "written" to disk (over 200MB).

You won't get any serious file system or storage person to go out on a limb on 
this kind of "it mostly kind of works" type of scenario. It just does not cut it 
in the enterprise world.

Hope this is helpful :)

Ric

>
> One thing is puzzling me. Everyone is telling me that I must ensure
> that fsync/fdatasync is used, even in environments where the concept
> doesn't exist. So I've gone to find good examples of how it it used.
> Since RHEL6 has been shipping with ext4 as the default for over 2.5
> years, I figured it would be a great place to find examples. However,
> I've been unable to find examples of fsync or fdatasync being used,
> when using "strace -o file.out -f" on various system programs which
> one would very much expect to use it. We talked about some Python
> config utilities the other day. But now I've moved on to C and C++
> code. e.g. "cupsd" copy/truncate/writes the config file
> "/etc/cups/printers.conf" quite frequently, all day long. But there is
> no sign whatsoever of any fsync or fdatasync when I grep the strace
> output file for those strings case insensitively. (And indeed, a
> complex printers.conf file turned up zero-length on one of my RHEL6.4
> boxes last week.)
>
> So I figured that when rpm installs a new vmlinuz, builds a new
> initramfs and puts it into place, and modifies grub.conf, that surely
> proper sync'ing must be done in this particularly critical case. But
> while I do see rpm fsync/fsync'ing its own database files, it never
> seems to fsync/fdatasync the critical system files it just installed
> and/or modified. Surely, after over 2 - 1/2 years of Red Hat shipping
> RHEL6 to customers, I must be mistaken in some way. Could you point me
> to an example in RHEL6.4 where I can see clearly how fsync is being
> properly used? In the mean time, I'll keep looking.
>
>
> Thanks,
> Steve
>
>
>
> On Tue, Jun 11, 2013 at 8:59 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>> On 06/11/2013 05:56 AM, Steve Bergman wrote:
>>> 4. From the time I write() a bit of data, what's the maximum time before
>>> the
>>> data is actually committed to disk?
>>>
>>> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
>>> issue for some common cases via the auto_da_alloc feature added in kernel
>>> 2.6.30. Does XFS have similar behavior?
>>
>> I think that here you are talking more about ext3 than ext4.
>>
>> The answer to both of these - even for ext4 or ext3 - is that unless your
>> application and storage is all properly configured, you are effectively at
>> risk indefinitely. Chris Mason did a study years ago where he was able to
>> demonstrate that dirty data could get pinned in a disk cache effectively
>> indefinitely.  Only an fsync() would push that out.
>>
>> Applications need to use the data integrity hooks in order to have a
>> reliable promise that application data is crash safe.  Jeff Moyer wrote up a
>> really nice overview of this for lwn which you can find here:
>>
>> http://lwn.net/Articles/457667
>>
>> That said, if you have applications that do not do any of this, you can roll
>> the dice and use a file system like ext3 that will periodically push data
>> out of the page cache for you.
>>
>> Note that without the barrier mount option, that is not sufficient to push
>> data to platter, just moves it down the line to the next potentially
>> volatile cache :)  Even then, 4 out of every 5 seconds, your application
>> will be certain to lose data if the box crashes while it is writing data.
>> Lots of applications don't actually use the file system much (or write
>> much), so ext3's sync behaviour helped mask poorly written applications
>> pretty effectively for quite a while.
>>
>> There really is no short cut to doing the job right - your applications need
>> to use the correct calls and we all need to configure the file and storage
>> stack correctly.
>>
>> Thanks!
>>
>> Ric
>>

* Re: Questions about XFS
  2013-06-11 17:19     ` Ric Wheeler
@ 2013-06-11 17:27       ` Stefan Ring
  2013-06-11 17:31         ` Ric Wheeler
  2013-06-11 17:59         ` Ben Myers
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Ring @ 2013-06-11 17:27 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Steve Bergman, Linux fs XFS

> Let's take a simple example - a database app that does say 30
> transactions/sec.
>
> In your example, you are extremely likely to lose up to just shy of 5
> seconds of "committed" data - way over 100 transactions!  That can be
> *really* serious amounts of data and translate into large financial loss.

Every piece of database software will do the flushing correctly.

> In a second example, let's say you are copying data to disk (say a movie) at
> a rate of 50 MB/second.  When the power cut hits at just the wrong time, you
> will have lost a large chunk of that data that has been "written" to disk
> (over 200MB).

But why would anyone care about that? I know that the system went down
while copying this large movie, so I'll just copy it again.


* Re: Questions about XFS
  2013-06-11 16:12   ` Steve Bergman
  2013-06-11 17:19     ` Ric Wheeler
@ 2013-06-11 17:28     ` Eric Sandeen
  2013-06-11 19:17       ` Steve Bergman
  2013-07-22 14:59       ` Steve Bergman
  2013-06-12  8:26     ` Roger Oberholtzer
  2 siblings, 2 replies; 50+ messages in thread
From: Eric Sandeen @ 2013-06-11 17:28 UTC (permalink / raw)
  To: Steve Bergman; +Cc: Ric Wheeler, xfs

On 6/11/13 11:12 AM, Steve Bergman wrote:
> In #5 I was specifically talking about ext4. After the 2009 brouhaha
> over zero-length files in ext4 with delayed allocation turned on, Ted
> merged some patches into vanilla kernel 2,6,30 which mitigated the
> problem by recognizing certain common idioms and forcing automatically
> forcing an fsync. I'd heard the the XFS team modeled a set of XFS
> patches from them.

Assuming we're talking about the same behaviors, XFS resolved this issue
in May 2007, in the 2.6.22 kernel, commit ba87ea6, over a year before
ext4 even had delayed allocation working. ext4 added the flush-on-close
heuristic in 2009, commit 7d8f9f7.

> Regarding #4, I have 12 years experience with my workloads on ext3 and
> 3 yrs on ext4 and know what I have observed. As a practical matter,
> there are large differences between filesystem behaviors which aren't
> up for debate since I know my workloads' behavior in the real world
> far better than anyone else possibly could. (In fact, I'm not sure how
> anyone else could presume to know how my workloads and filesystems
> interact.) But if I understand correctly, ext4 at default settings
> journals metadata and commits it every 5s, while flushing data every
> 30s. Ext3 journals metadata, and commits it every 5 seconds, while
> effectively flushing data, *immediately before the metadata*, every 5
> seconds. so the window in which data and metadata are not in sync is
> vanishingly small. Are you saying that with XFS there is no periodic
> flushing mechanism at all? And that unless there's an
> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> sit in the page cache forever?

No.

By and large, buffered IO in a filesystem is flushed out by the VM,
due to either age or memory pressure.  The filesystem then responds
to these requests from the VM, writing data as requested.

You can read all about it in
Documentation/sysctl/vm.txt, but see dirty_expire_centisecs and
dirty_writeback_centisecs - the flusher threads wake up every 5s and
push out data that has been dirty for more than 30s, by default.

ext3 is somewhat unique in data=ordered metadata logging driving
data flushing, IMHO.
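The knobs in question can be inspected directly; the values in the comments are common defaults, and a given kernel or distribution may differ:

```shell
# Flusher wakeup interval, in centiseconds (500 = wake every 5 seconds).
cat /proc/sys/vm/dirty_writeback_centisecs
# Age at which dirty data becomes eligible for writeback (3000 = 30 seconds).
cat /proc/sys/vm/dirty_expire_centisecs
```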

> One thing is puzzling me. Everyone is telling me that I must ensure
> that fsync/fdatasync is used, even in environments where the concept
> doesn't exist. So I've gone to find good examples of how it it used.
> Since RHEL6 has been shipping with ext4 as the default for over 2.5
> years, I figured it would be a great place to find examples. However,
> I've been unable to find examples of fsync or fdatasync being used,
> when using "strace -o file.out -f" on various system programs which
> one would very much expect to use it.

Whether or not an application *uses* fsync is orthogonal to whether
or not it's *needed* to ensure persistence.  Obviously you don't need
to fsync every IO as soon as it's issued.  And there are buggy
applications, yes.  It's up to the app to decide what needs to be
persistent and when.  See the "When Should You Fsync?" section
in the URL below.
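A minimal sketch of the write/fsync/rename idiom being discussed, in shell; the filename and contents are made up, and the file arguments to sync(1) require coreutils 8.24 or newer:

```shell
set -e
# 1. Write the new contents to a temporary file in the same directory.
printf 'DefaultPrinter office\n' > printers.conf.tmp
# 2. Force the temp file's data to stable storage (fsync under the hood).
sync printers.conf.tmp
# 3. Atomically replace the old file; readers see old or new, never empty.
mv printers.conf.tmp printers.conf
# 4. Flush the containing filesystem so the rename itself is durable.
sync -f printers.conf
```

In C, steps 2 and 4 correspond to fsync() on the file descriptor and on the parent directory, as the LWN article linked below spells out.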

> We talked about some Python
> config utilities the other day. But now I've moved on to C and C++
> code. e.g. "cupsd" copy/truncate/writes the config file
> "/etc/cups/printers.conf" quite frequently, all day long. But there is
> no sign whatsoever of any fsync or fdatasync when I grep the strace
> output file for those strings case insensitively. (And indeed, a
> complex printers.conf file turned up zero-length on one of my RHEL6.4
> boxes last week.)

I'd file a bug against cups, then.

> So I figured that when rpm installs a new vmlinuz, builds a new
> initramfs and puts it into place, and modifies grub.conf, that surely
> proper sync'ing must be done in this particularly critical case. But
> while I do see rpm fsync/fsync'ing its own database files, it never
> seems to fsync/fdatasync the critical system files it just installed
> and/or modified. Surely, after over 2 - 1/2 years of Red Hat shipping
> RHEL6 to customers, I must be mistaken in some way. Could you point me
> to an example in RHEL6.4 where I can see clearly how fsync is being
> properly used? In the mean time, I'll keep looking.

Database packages would get it right, I hope.

See also http://lwn.net/Articles/457667/, "Ensuring data reaches disk"

-Eric

> Thanks,
> Steve
> 
> 
> 
> On Tue, Jun 11, 2013 at 8:59 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>> On 06/11/2013 05:56 AM, Steve Bergman wrote:
>>>
>>> 4. From the time I write() a bit of data, what's the maximum time before
>>> the
>>> data is actually committed to disk?
>>>
>>> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
>>> issue for some common cases via the auto_da_alloc feature added in kernel
>>> 2.6.30. Does XFS have similar behavior?
>>
>>
>> I think that here you are talking more about ext3 than ext4.
>>
>> The answer to both of these - even for ext4 or ext3 - is that unless your
>> application and storage is all properly configured, you are effectively at
>> risk indefinitely. Chris Mason did a study years ago where he was able to
>> demonstrate that dirty data could get pinned in a disk cache effectively
>> indefinitely.  Only an fsync() would push that out.
>>
>> Applications need to use the data integrity hooks in order to have a
>> reliable promise that application data is crash safe.  Jeff Moyer wrote up a
>> really nice overview of this for lwn which you can find here:
>>
>> http://lwn.net/Articles/457667
>>
>> That said, if you have applications that do not do any of this, you can roll
>> the dice and use a file system like ext3 that will periodically push data
>> out of the page cache for you.
>>
>> Note that without the barrier mount option, that is not sufficient to push
>> data to platter, just moves it down the line to the next potentially
>> volatile cache :)  Even then, 4 out of every 5 seconds, your application
>> will be certain to lose data if the box crashes while it is writing data.
>> Lots of applications don't actually use the file system much (or write
>> much), so ext3's sync behaviour helped mask poorly written applications
>> pretty effectively for quite a while.
>>
>> There really is no short cut to doing the job right - your applications need
>> to use the correct calls and we all need to configure the file and storage
>> stack correctly.
>>
>> Thanks!
>>
>> Ric
>>

* Re: Questions about XFS
  2013-06-11 17:27       ` Stefan Ring
@ 2013-06-11 17:31         ` Ric Wheeler
  2013-06-11 17:41           ` Stefan Ring
  2013-06-11 19:30           ` Steve Bergman
  2013-06-11 17:59         ` Ben Myers
  1 sibling, 2 replies; 50+ messages in thread
From: Ric Wheeler @ 2013-06-11 17:31 UTC (permalink / raw)
  To: Stefan Ring; +Cc: Steve Bergman, Linux fs XFS

On 06/11/2013 01:27 PM, Stefan Ring wrote:
>> Let's take a simple example - a database app that does say 30
>> transactions/sec.
>>
>> In your example, you are extremely likely to lose up to just shy of 5
>> seconds of "committed" data - way over 100 transactions!  That can be
>> *really* serious amounts of data and translate into large financial loss.
> Every database software will do the flushing correctly.

Stefan, you are making my point - because every database will do the right
thing, it won't rely on ext3's magic every-5-second fsync :)

Ric

>
>> In a second example, let's say you are copying data to disk (say a movie) at
>> a rate of 50 MB/second.  When the power cut hits at just the wrong time, you
>> will have lost a large chunk of that data that has been "written" to disk
>> (over 200MB).
> But why would anyone care about that? I know that the system went down
> while copying this large movie, so I'll just copy it again.


* Re: Questions about XFS
  2013-06-11 17:31         ` Ric Wheeler
@ 2013-06-11 17:41           ` Stefan Ring
  2013-06-11 18:03             ` Eric Sandeen
  2013-06-11 19:30           ` Steve Bergman
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Ring @ 2013-06-11 17:41 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Steve Bergman, Linux fs XFS

>> Every database software will do the flushing correctly.
>
>
> Stefan, you are making my point because every database will do the right
> thing, it won't rely on ext3's magic every 5 second fsync :)

Ok, let us agree on this one then ;)

There is still kind of a dichotomy regarding the entire issue. While
in many cases no one will complain about a few seconds of lost data,
it is an entirely different beast if you lose an entire file that may
have been added to and maintained over months or years, just because
the system went down at an unfortunate moment. In my opinion, having
a consistent yet possibly somewhat outdated file is much more
important than having the most recent view in an inconsistent or
broken state.


* Re: Questions about XFS
  2013-06-11 17:27       ` Stefan Ring
  2013-06-11 17:31         ` Ric Wheeler
@ 2013-06-11 17:59         ` Ben Myers
  1 sibling, 0 replies; 50+ messages in thread
From: Ben Myers @ 2013-06-11 17:59 UTC (permalink / raw)
  To: Stefan Ring; +Cc: Steve Bergman, Ric Wheeler, Linux fs XFS

Hey Stefan,

On Tue, Jun 11, 2013 at 07:27:59PM +0200, Stefan Ring wrote:
> > In a second example, let's say you are copying data to disk (say a movie) at
> > a rate of 50 MB/second.  When the power cut hits at just the wrong time, you
> > will have lost a large chunk of that data that has been "written" to disk
> > (over 200MB).
> 
> But why would anyone care about that? I know that the system went down
> while copying this large movie, so I'll just copy it again.

You don't always have enough storage to keep that first copy around
indefinitely, so you want to have some guarantees about whether the 2nd
copy has made it to the platter before you allow the first one to be
overwritten.

e.g. You could have a set of remote closed circuit cameras with limited
local storage and want to transfer frames from them to a central
location without losing any.

-Ben


* Re: Questions about XFS
  2013-06-11 17:41           ` Stefan Ring
@ 2013-06-11 18:03             ` Eric Sandeen
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2013-06-11 18:03 UTC (permalink / raw)
  To: Stefan Ring; +Cc: Steve Bergman, Ric Wheeler, Linux fs XFS

On 6/11/13 12:41 PM, Stefan Ring wrote:
>>> Every database software will do the flushing correctly.
>>
>>
>> Stefan, you are making my point because every database will do the right
>> thing, it won't rely on ext3's magic every 5 second fsync :)
> 
> Ok, let us agree on this one then ;)
> 
> There is still kind of a dichotomy regarding the entire issue. While
> in many cases no one will complain about a few seconds of lost data,
> it is an entirely different beast if you lose an entire file that may
> have been added to and maintained over months or years, just because
> the system went down at an unfortunate moment. In my opinion, having
> a consistent yet possibly somewhat outdated file is much more
> important than having the most recent view in an inconsistent or
> broken state.

But the application needs to watch out for that too.  Appending to or
overwriting part of a file should never result in whole-file loss on a
crash, although it may leave the data inconsistent.  If you truncate
the file and rewrite it, there is more risk.

If you want atomic updates so that you get either-old-or-new-version,
that can be accomplished.

From the LWN article we keep linking to ;)

Similarly, if you encounter a system failure (such as power loss,
ENOSPC or an I/O error) while overwriting a file, it can result in the
loss of existing data. To avoid this problem, it is common practice
(and advisable) to write the updated data to a temporary file, ensure
that it is safe on stable storage, then rename the temporary file to
the original file name (thus replacing the contents). This ensures an
atomic update of the file, so that other readers get one copy of the
data or another. The following steps are required to perform this type
of update:

    create a new temp file (on the same file system!)
    write data to the temp file
    fsync() the temp file
    rename the temp file to the appropriate name
    fsync() the containing directory 
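In Python, those five steps look roughly like this (an illustrative
sketch; the file names are made up):

```python
import os, tempfile

def atomic_replace(path, data):
    # The five steps above: temp file on the same filesystem, write,
    # fsync the file, rename over the target, then fsync the directory
    # so the rename itself is durable.
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)      # step 1: same filesystem!
    try:
        os.write(fd, data)                 # step 2
        os.fsync(fd)                       # step 3
    finally:
        os.close(fd)
    os.rename(tmp, path)                   # step 4: atomic name switch
    dfd = os.open(d, os.O_RDONLY)
    try:
        os.fsync(dfd)                      # step 5
    finally:
        os.close(dfd)

root = tempfile.mkdtemp()
target = os.path.join(root, "settings.conf")
atomic_replace(target, b"old contents\n")
atomic_replace(target, b"new contents\n")  # readers see old or new, never partial
```

A reader that opens the file at any point sees one complete version or
the other, never a partially written one.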

-Eric



* Re: Questions about XFS
  2013-06-11 17:28     ` Eric Sandeen
@ 2013-06-11 19:17       ` Steve Bergman
  2013-06-11 21:47         ` Dave Chinner
  2013-07-22 14:59       ` Steve Bergman
  1 sibling, 1 reply; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 19:17 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Ric Wheeler, xfs

Eric,

Thank you for the straight answers to my questions, and for not trying
to argue with me about my real life experiences. I honestly appreciate
that.

All this information was very helpful. I've always been fuzzy about
the interaction between filesystem mount options, like commit=5, and
the vm tunables. So am I correct in thinking that
dirty_writeback_centisecs tells pdflush how often to wake up and look
around for data to flush? That the commit=5 mount parameter to extX
says "flush the journaled metadata to disk every 5 seconds, unless the
vm tells us to do it before that"? And if I set commit=45, I'd still
get 30-35 second metadata flushes due to dirty_expire_centisecs=3000
and dirty_writeback_centisecs=500? And that the actual data also gets
flushed at least every 30-35 seconds for the same reasons? And that
those vm parameters basically apply to XFS in the same way? (Except
XFS has some setting to flush metadata more frequently, I'd guess.)
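For what it's worth, those tunables can be read straight from /proc
(Linux only; values are in centiseconds, so 500 means every 5 seconds):

```python
# Read the writeback tunables discussed above from /proc/sys/vm.
def vm_tunable(name):
    with open("/proc/sys/vm/" + name) as f:
        return int(f.read())

wakeup = vm_tunable("dirty_writeback_centisecs")  # flusher wake-up interval
expire = vm_tunable("dirty_expire_centisecs")     # age at which dirty data must be written
print("flusher wakes every %.1fs; dirty data expires after %.1fs"
      % (wakeup / 100.0, expire / 100.0))
```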

I follow LWN, even though I rarely post anymore. I'd read that article
before. But I reviewed it again on (your?) recommendation, yesterday.
I think Ric recommended it to me, as well.

Now about ext3 behavior. (Sorry, I know this is an XFS list, but I've
been dying to have this clarified. And this does kind of relate to
XFS.) Not only does data get flushed more frequently. But it gets
careful treatment. It gets written out immediately before the
metadata. But both are (almost?) always written at almost exactly the
same time. Obviously, this comes with a significant performance price
tag for the whole system, which understandably upsets a lot of people.
But more than just the more frequent data updates, doesn't that
contribute substantially to ext3's pony-magic? The reason that I'm
particularly interested in this is that I don't really care that much
whether I lose 5 seconds of data or 30 seconds of data. That doesn't
make much difference. What I do care about is losing a whole day's
worth of data. And I have definitely observed a difference here
between ext3 & ext4. (And I'm assuming XFS would act very much like
ext4 in this context.) What happens with ext4, but *never* in my
experience with ext3 (though I acknowledge that it's *possible* and
just a matter of relative likelihood) is that I end up with C/ISAM
(not database!) files that were being appended to that are not just
missing data at the end, but are actually corrupted. Sometimes in such
a way that the rebuild utility cannot repair them and everything is
lost. This is not common. But I've seen it happen a couple of times
since I've been using ext4. Is the careful ordering, and the fact that
metadata is always written at the same time (and just after the data)
in ext3 likely to be responsible for that difference?

-Steve


* Re: Questions about XFS
  2013-06-11 17:31         ` Ric Wheeler
  2013-06-11 17:41           ` Stefan Ring
@ 2013-06-11 19:30           ` Steve Bergman
  2013-06-11 21:03             ` Dave Chinner
  1 sibling, 1 reply; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 19:30 UTC (permalink / raw)
  To: Ric Wheeler; +Cc: Stefan Ring, Linux fs XFS

Thanks. But I'm specifically *not* talking about database apps, I'm
talking about non-database applications written in languages that
don't even have a concept of fsync. I understand that PostgreSQL would
be perfectly fine on pretty much any unix filesystem short of tmpfs.
But PostgreSQL is just a small part of my mixed workloads. I wish
everything I had were PostgreSQL. But I have to deal with reality. My
only reasonable migration path for getting everything backed by a
database is to upgrade to the latest version of our software, replace
all the Linux workstations with Windows, replace the Linux servers
with Windows Server 2008 or later, and install Microsoft SQL Server at
all locations. I'm sticking with the current Cobol app which is quite
stable and reliable when running on ext3. Especially as I keep hearing
that the newer Windows-only version doesn't work particularly well.


-Steve

On Tue, Jun 11, 2013 at 12:31 PM, Ric Wheeler <rwheeler@redhat.com> wrote:
> On 06/11/2013 01:27 PM, Stefan Ring wrote:
>>>
>>> Let's take a simple example - a database app that does say 30
>>> transactions/sec.
>>>
>>> In your example, you are extremely likely to lose up to just shy of 5
>>> seconds of "committed" data - way over 100 transactions!  That can be
>>> *really* serious amounts of data and translate into large financial loss.
>>
>> Every database software will do the flushing correctly.
>
>
> Stefan, you are making my point because every database will do the right
> thing, it won't rely on ext3's magic every 5 second fsync :)
>
> Ric
>
>>
>>> In a second example, let's say you are copying data to disk (say a movie)
>>> at
>>> a rate of 50 MB/second.  When the power cut hits at just the wrong time,
>>> you
>>> will have lost a large chunk of that data that has been "written" to disk
>>> (over 200MB).
>>
>> But why would anyone care about that? I know that the system went down
>> while copying this large movie, so I'll just copy it again.
>
>


* Re: Questions about XFS
  2013-06-11  9:56 Questions about XFS Steve Bergman
                   ` (3 preceding siblings ...)
  2013-06-11 13:59 ` Ric Wheeler
@ 2013-06-11 19:35 ` Ben Myers
  2013-06-11 19:55   ` Steve Bergman
  4 siblings, 1 reply; 50+ messages in thread
From: Ben Myers @ 2013-06-11 19:35 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-xfs

Hey Steve,

On Tue, Jun 11, 2013 at 09:56:38AM +0000, Steve Bergman wrote:
> I have a few questions about XFS that didn't make the XFS FAQ. I'm trying to
> get a feel for where I might want to use it on my servers (or at home). A mix
> of ext3 & ext4 has worked well for me. But I'd like to get to know XFS a bit
> better. The target OS would be RHEL6.

"If it ain't broke don't fix it."

It's good that you look before you jump.

> 9. Is there something else I should have thought to ask?

* How do I back it up?

I'm always surprised at how many people neglect backups and then run into
trouble.  xfsdump is the preferred method.

* Where can I learn more?

http://www.xfs.org/index.php/XFS_Papers_and_Documentation

The user guide link is a good place to get started.

My personal favorite link on this page is to the original design
documents from '93 (at the bottom).  If you ever have a 'why the heck
did they do it this way' moment with XFS, this is the place to go.

Not the least of the arguments in favor of using XFS is that we have a
rich and active community.  You can also get professional support from a
variety of vendors.

Regards,
Ben


* Re: Questions about XFS
  2013-06-11 19:35 ` Ben Myers
@ 2013-06-11 19:55   ` Steve Bergman
  2013-06-11 20:08     ` Ben Myers
  2013-06-11 21:57     ` Matthias Schniedermeyer
  0 siblings, 2 replies; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 19:55 UTC (permalink / raw)
  To: Ben Myers; +Cc: linux-xfs

Hi Ben,

I'm setting up a new server right now, and I'm breaking it up into a
number of different LV's for flexibility. I've gone through a number
of possibilities, and I think it's going to look something like this:

/ - Ext4
/root - Ext4
/home - Ext4
/var - Ext4
/usr/local - Ext4
/usr/local/worktmp - XFS (Very intensive & time-consuming random
writes, here. But all files are temporary work files. XFS is
performing well here in my testing.)
/usr/local/data - Ext3 (Cobol doesn't know about fsync. I need
pony-magic in this LV.)

Backup on the /usr/local/worktmp is not critical, due to its nature.
But can't I just lvm2-snapshot an XFS LV and rsync that to USB drive
like I do everything else?

-Steve


* Re: Questions about XFS
  2013-06-11 19:55   ` Steve Bergman
@ 2013-06-11 20:08     ` Ben Myers
  2013-06-11 21:57     ` Matthias Schniedermeyer
  1 sibling, 0 replies; 50+ messages in thread
From: Ben Myers @ 2013-06-11 20:08 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-xfs

Hey Steve,

On Tue, Jun 11, 2013 at 02:55:09PM -0500, Steve Bergman wrote:
> I'm setting up a new server right now, and I'm breaking it up into a
> number of different LV's for flexibility. I've gone through a number
> of possibilities, and I think it's going to look something like this:
> 
> / - Ext4
> /root - Ext4
> /home - Ext4
> /var - Ext4
> /usr/local - Ext4
> /usr/local/worktmp - XFS (Very intensive & time-consuming random
> writes, here. But all files are temporary work files. XFS is
> performing well here in my testing.)

Good.

> /usr/local/data - Ext3 (Cobol doesn't know about fsync. I need
> pony-magic in this LV.)

You might be better served (with whatever filesystem you are using) by
looking into whether you can teach cobol about fsync.  I've never worked
with cobol so I don't know if it can be done.  Based upon a quick web
search I suspect that it may be possible.

> Backup on the /usr/local/worktmp is not critical, due to its nature.
> But can't I just lvm2-snapshot an XFS LV and rsync that to USB drive
> like I do everything else?

Snapshot and rsync should work fine with xfs.  

Regards,
	Ben


* Re: Questions about XFS
  2013-06-11 19:30           ` Steve Bergman
@ 2013-06-11 21:03             ` Dave Chinner
  2013-06-11 21:43               ` Steve Bergman
  0 siblings, 1 reply; 50+ messages in thread
From: Dave Chinner @ 2013-06-11 21:03 UTC (permalink / raw)
  To: Steve Bergman; +Cc: Stefan Ring, Ric Wheeler, Linux fs XFS

On Tue, Jun 11, 2013 at 02:30:36PM -0500, Steve Bergman wrote:
> Thanks. But I'm specifically *not* talking about database apps, I'm
> talking about non-database applications written in languages that
> don't even have a concept of fsync.

I get it. You want a pony, and you don't want to pay anything for
it.

Any language is fundamentally broken if it has no concept and/or
method for ensuring data integrity for data that is written through
it. If you have such a language and you need data integrity then
your only filesystem option for guaranteeing no data loss is
synchronous writes and metadata updates. 

XFS allows you to minimise the impact of such legacy languages and
applications to just the data sets those applications use through
the use of 'chattr -S' and the /proc/sys/fs/xfs/xfs_inherit_sync
sysctl....
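To illustrate what synchronous writes cost and buy, here is a sketch
at the syscall level using O_SYNC (not the chattr/sysctl mechanism
above; the file name is made up):

```python
import os, tempfile

# With O_SYNC, each write() returns only after the data (and the
# metadata needed to retrieve it) reaches stable storage: no fsync()
# required, but every write pays for a device flush.
path = os.path.join(tempfile.mkdtemp(), "ledger.dat")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o644)
try:
    os.write(fd, b"entry 1\n")   # durable once this call returns
finally:
    os.close(fd)
```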

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Questions about XFS
  2013-06-11 21:03             ` Dave Chinner
@ 2013-06-11 21:43               ` Steve Bergman
  0 siblings, 0 replies; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 21:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Stefan Ring, Ric Wheeler, Linux fs XFS

> I get it. You want a pony, and you don't want to pay anything for
> it.

Hi Dave,

Not at all. I don't mind incurring a performance penalty in that LV.
Ext3 performance is quite acceptable, with the exception of a single
maintenance operation which I perform periodically (which can be quite
slow, indeed, on the larger files. Very intensive random writes.)
Fortunately, the filesystem intensive part of that operation occurs in
its own work directory. The files in that directory are just temporary
work files. And at the end of the processing, the resulting files get
copied back to the main data directory tree.  XFS performs extremely
well for this operation. For about a dozen years, I've been using Ext3
for the whole thing, and the resiliency has been much more than just
adequate. (If I needed more, I'd mount ext3 data=journal; I can't
imagine mounting synchronously.) But for this new server, and probably
future ones, I'll be using Ext3 for the permanent data, and XFS for
the work directory. It makes a huge difference for that one operation,
and leverages the strengths of both filesystems to yield something
more appropriate for the workload than either one alone. As Ted is
fond of noting, one of Linux's greatest strengths is the variety of
filesystems it offers. And sometimes filesystem performance just
doesn't matter.

-Steve


* Re: Questions about XFS
  2013-06-11 19:17       ` Steve Bergman
@ 2013-06-11 21:47         ` Dave Chinner
  0 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2013-06-11 21:47 UTC (permalink / raw)
  To: Steve Bergman; +Cc: Ric Wheeler, Eric Sandeen, xfs

On Tue, Jun 11, 2013 at 02:17:16PM -0500, Steve Bergman wrote:
> Now about ext3 behavior. (Sorry, I know this is an XFS list, but I've
> been dying to have this clarified. And this does kind of relate to
> XFS.) Not only does data get flushed more frequently. But it gets
> careful treatment. It gets written out immediately before the
> metadata.

That's not careful treatment, that's just ordering. XFS has "careful
treatment", too, but it orders data and metadata differently.

XFS doesn't actually update the inode size until after
the data is written, i.e. the size is still zero until the data is
written. That's how XFS orders data vs metadata updates - metadata
that is only valid after data IO completes is not modified until
after the data IO completes...

Hence if you create a file, write some data and then crash before
the data is written, you'll get the create replayed but nothing else.
i.e. a zero length file. Often it is just the inode size transaction
that is missing from the log - the data is actually on disk and *can
be recovered* if you know what you are doing...

IOWs, the thing that causes people to complain about XFS and zero
length files is that XFS leaves *observable evidence* of data loss
behind, while ext3 and ext4 don't. Hence when people switch to XFS
they actually notice (maybe for the first time!) that running their
laptop battery dead leads to data loss and that their applications
are not behaving safely. They then shoot the messenger (i.e XFS)
for losing data as ext3/ext4 don't "lose data" as they don't leave
any observable evidence of partial file states...

Filesystems behave differently, but those observable differences in
behaviour don't mean one filesystem loses more data than another.
Some filesystems just make it easier to find evidence of such data
loss and hence make it easier to recover from, not to mention make
it obvious which applications write data unsafely and hence
shouldn't be trusted....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Questions about XFS
  2013-06-11 19:55   ` Steve Bergman
  2013-06-11 20:08     ` Ben Myers
@ 2013-06-11 21:57     ` Matthias Schniedermeyer
  2013-06-11 22:18       ` Steve Bergman
  1 sibling, 1 reply; 50+ messages in thread
From: Matthias Schniedermeyer @ 2013-06-11 21:57 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-xfs

On 11.06.2013 14:55, Steve Bergman wrote:
> Hi Ben,
> 
> /usr/local/data - Ext3 (Cobol doesn't know about fsync. I need
> pony-magic in this LV.)

You COULD do a little magic external of the program to expedite writing 
to stable storage.

- fsync_test.pl -
#!/usr/bin/perl

use File::Sync qw(fsync);

open my $fh, '<', 'testfile' or die ("Can't open: $!");
fsync($fh) or die ("Can't fsync: $!");
close $fh;
- fsync_test.pl -

- fsync_test.sh -
#!/bin/sh
rm -f testfile
echo -n sync: ; time sync
echo ; echo -n dd: ; time dd if=/dev/zero of=testfile bs=1M count=1k
echo ; echo -n fsync_test.pl: ; time ./fsync_test.pl
echo ; echo -n sync: ; time sync
- fsync_test.sh -

The result on my machine against a HDD:

- result -
sync:
real    0m0.048s
user    0m0.000s
sys     0m0.046s

dd:
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.317354 s, 3.4 GB/s
real    0m0.318s
user    0m0.001s
sys     0m0.316s

fsync_test.pl:
real    0m10.237s
user    0m0.012s
sys     0m0.133s

sync:
real    0m0.047s
user    0m0.000s
sys     0m0.045s
- result -

This tells me that a program can fsync a file it didn't write itself.

I chose 1 GB because it can be created fast enough not to trigger any
automatic writebacks (and with 32 GB of RAM it is well within the
default limits), yet it takes long enough to fsync to be easily
visible with simple 'time'ing.

With secret ingredient number 2, inotify, you COULD write a program
that issues fsync(s) on behalf of the program that can't do it itself.
Of course that can only expedite matters; it doesn't give you the same
consistency guarantees, as the original program isn't the one waiting
for the fsync(s) to complete and happily continues dirtying more data
in the meantime.


And there is also syncfs, which is relatively new (the man page says
it was introduced with kernel 2.6.39). 'syncfs' is a lighter version
of 'sync' that only syncs a single filesystem. So you could make a
program that does a syncfs every second or so (with the help of
inotify you could do it after something "interesting" was written),
without impacting other filesystems the way 'sync' does.
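A sketch of such a one-filesystem flusher in Python, assuming a Linux
glibc that exports syncfs (the os module doesn't wrap it, so ctypes is
used here):

```python
import ctypes, os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def syncfs(path):
    # Flush only the filesystem containing `path`; unlike sync(2),
    # other mounted filesystems are left alone.
    fd = os.open(path, os.O_RDONLY)
    try:
        if libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)

syncfs(".")   # e.g. call this every second, or from an inotify handler
```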




-- 

Matthias


* Re: Questions about XFS
  2013-06-11 21:57     ` Matthias Schniedermeyer
@ 2013-06-11 22:18       ` Steve Bergman
  0 siblings, 0 replies; 50+ messages in thread
From: Steve Bergman @ 2013-06-11 22:18 UTC (permalink / raw)
  To: Matthias Schniedermeyer; +Cc: linux-xfs

That's a very interesting idea, using inotify. But I'd prefer to keep
it a little simpler. Using ext3 for the main data tree and XFS for the
work file directory really seems perfect. Although it gives me an
option that was never feasible before. And that would be to mount ext3
data=journal. Without XFS to do the heavy lifting in the work dir,
that would make certain operations (very) unreasonably slow. But with
the hybrid fs configuration, it would only slow down EOD posting a
bit. I mounted data=journal at one site, for one day the other day,
and nobody complained. That would be the "correct" solution from a
data integrity standpoint. But I've always found ext3 & data=ordered
to be quite adequate for this application.

-Steve


* Re: Questions about XFS
  2013-06-11 16:12   ` Steve Bergman
  2013-06-11 17:19     ` Ric Wheeler
  2013-06-11 17:28     ` Eric Sandeen
@ 2013-06-12  8:26     ` Roger Oberholtzer
  2013-06-12 10:34       ` Ric Wheeler
                         ` (2 more replies)
  2 siblings, 3 replies; 50+ messages in thread
From: Roger Oberholtzer @ 2013-06-12  8:26 UTC (permalink / raw)
  To: xfs

On Tue, 2013-06-11 at 11:12 -0500, Steve Bergman wrote: 
> Are you saying that with XFS there is no periodic
> flushing mechanism at all? And that unless there's an
> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> sit in the page cache forever?

I read the later responses to this and they seemed to say that the data
in the page cache should be written to the disk periodically. I am not
meaning to hijack the thread. I just have a question directly related to
this point.

I have an application that is streaming data to an XFS disk at a
sustained 25 MB/sec. This is well below what the hardware supports. The
application does fopen/fwrite/fclose (no active flushing or syncing).

I see that as my application writes data (the only process writing the
only open file on the disk), the system cache grows and grows. Here is
the unusual part: periodically, writes take some number of seconds to
complete, rather than the typical <50 msecs. The increased time seems
to correspond to the increasing size of the page cache.

If I do:

echo 1 > /proc/sys/vm/drop_caches

while the application is running, then the writes do not occasionally
take longer. Until the cache grows again, and I do the echo again.

I am sure I must be misinterpreting what I see.

(on openSUSE 12.1. kernel 3.1.0)

--
Roger Oberholtzer


* Re: Questions about XFS
  2013-06-12  8:26     ` Roger Oberholtzer
@ 2013-06-12 10:34       ` Ric Wheeler
  2013-06-12 13:52         ` Roger Oberholtzer
  2013-06-12 12:12       ` Stan Hoeppner
  2013-06-13  0:48       ` Dave Chinner
  2 siblings, 1 reply; 50+ messages in thread
From: Ric Wheeler @ 2013-06-12 10:34 UTC (permalink / raw)
  To: xfs

On 06/12/2013 04:26 AM, Roger Oberholtzer wrote:
> On Tue, 2013-06-11 at 11:12 -0500, Steve Bergman wrote:
>> Are you saying that with XFS there is no periodic
>> flushing mechanism at all? And that unless there's an
>> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
>> sit in the page cache forever?
> I read the later responses to this and they seemed to say that the data
> in the page cache should be written to the disk periodically. I am not
> meaning to hijack the thread. I just have a question directly related to
> this point.

You most likely need to adjust some of the vm tunings to cause the vm to kick 
out pages more evenly. Not sure what the opensuse crowd would suggest tweaking.

>
> I have an application that is streaming data to an XFS disk at a
> sustained 25 MB/sec. This is well below what the hardware supports. The
> application does fopen/fwrite/fclose (no active flushing or syncing).

Sounds like this is more likely to be an application issue than a file system 
one. Can you push the IO write speed up with a simple "dd" test to a file?

Ric
>
> I see that as my application writes data (the only process writing the
> only open file on the disk), the system cache grows and grows. Here is
> the unusual part: periodically, writes take some number of seconds to
> complete, rather than the typical <50 msecs). The increased time seems
> to correspond to the increasing size of the page cache.
>
> If I do:
>
> echo 1 > /proc/sys/vm/drop_caches
>
> while the application is runnung, then the writes do not occasionally
> take longer. Until the cache grows again, and I do the echo again.
>
> I am sure I must be misinterpreting what I see.
>
> (on openSUSE 12.1. kernel 3.1.0)
>
> --
> Roger Oberholtzer
>


* Re: Questions about XFS
  2013-06-12  8:26     ` Roger Oberholtzer
  2013-06-12 10:34       ` Ric Wheeler
@ 2013-06-12 12:12       ` Stan Hoeppner
  2013-06-12 13:48         ` Roger Oberholtzer
  2013-06-13  0:48       ` Dave Chinner
  2 siblings, 1 reply; 50+ messages in thread
From: Stan Hoeppner @ 2013-06-12 12:12 UTC (permalink / raw)
  To: xfs

On 6/12/2013 3:26 AM, Roger Oberholtzer wrote:
...
> I have an application that is streaming data to an XFS disk at a
> sustained 25 MB/sec. This is well below what the hardware supports. The
> application does fopen/fwrite/fclose (no active flushing or syncing).

Buffered IO.

> I see that as my application writes data (the only process writing the
> only open file on the disk), the system cache grows and grows. Here is
> the unusual part: periodically, writes take some number of seconds to
> complete, rather than the typical <50 msecs). The increased time seems
> to correspond to the increasing size of the page cache.

Standard Linux buffered IO behavior.  Note this is not XFS specific.

> If I do:
> 
> echo 1 > /proc/sys/vm/drop_caches

Dumps the page cache forcing your buffered writes to disk.

> while the application is runnung, then the writes do not occasionally
> take longer. Until the cache grows again, and I do the echo again.

Which seems a bit laborious.

> I am sure I must be misinterpreting what I see.

Nope.  The Linux virtual memory system has behaved this way for quite
some time.  You can tweak how long IOs stay in cache.  See dirty_* at
https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Given the streaming nature you describe, have you looked at possibly
using O_DIRECT?

-- 
Stan


* Re: Questions about XFS
  2013-06-12 12:12       ` Stan Hoeppner
@ 2013-06-12 13:48         ` Roger Oberholtzer
  0 siblings, 0 replies; 50+ messages in thread
From: Roger Oberholtzer @ 2013-06-12 13:48 UTC (permalink / raw)
  To: xfs

On Wed, 2013-06-12 at 07:12 -0500, Stan Hoeppner wrote:
> On 6/12/2013 3:26 AM, Roger Oberholtzer wrote:
> ...
> > I have an application that is streaming data to an XFS disk at a
> > sustained 25 MB/sec. This is well below what the hardware supports. The
> > application does fopen/fwrite/fclose (no active flushing or syncing).
> 
> Buffered IO.
> 
> > I see that as my application writes data (the only process writing the
> > only open file on the disk), the system cache grows and grows. Here is
> > the unusual part: periodically, writes take some number of seconds to
> > complete, rather than the typical <50 msec. The increased time seems
> > to correspond to the increasing size of the page cache.
> 
> Standard Linux buffered IO behavior.  Note this is not XFS specific.

That is correct. But users of XFS, like any others, may experience this
and may have a solution. And it seemed related to the question in the
original post.

> > If I do:
> > 
> > echo 1 > /proc/sys/vm/drop_caches
> 
> Dumps the page cache forcing your buffered writes to disk.

The interesting thing is that when this is done and the 3 or 4 GB of
cache goes away, it happens rather quickly, as if the pages did not
contain data that must be written. But if that is the case, why the
increasingly long periodic write delays as the cache gets bigger?

> > while the application is running, then the writes do not occasionally
> > take longer. Until the cache grows again, and I do the echo again.
> 
> Which seems a bit laborious.
> 
> > I am sure I must be misinterpreting what I see.
> 
> Nope.  The Linux virtual memory system has behaved this way for quite
> some time.  You can tweak how long IOs stay in cache.  See dirty_* at
> https://www.kernel.org/doc/Documentation/sysctl/vm.txt

> 
> Given the streaming nature you describe, have you looked at possibly
> using O_DIRECT?

I would really like to avoid this if possible. The data is not in
uniform chunks, so it would need to be buffered in the app to make it
so. The system can obviously keep up with the data rate - as long as it
does not get greedy with all that RAM just sitting there...

I have been thinking that I may need to do an occasional
fflush/fdatasync to be sure the write cache stays reasonably small.


Yours sincerely,

Roger Oberholtzer

Ramböll RST / Systems

Office: Int +46 10-615 60 20
Mobile: Int +46 70-815 1696
roger.oberholtzer@ramboll.se
________________________________________

Ramböll Sverige AB
Krukmakargatan 21
P.O. Box 17009
SE-104 62 Stockholm, Sweden
www.rambollrst.se



* Re: Questions about XFS
  2013-06-12 10:34       ` Ric Wheeler
@ 2013-06-12 13:52         ` Roger Oberholtzer
  0 siblings, 0 replies; 50+ messages in thread
From: Roger Oberholtzer @ 2013-06-12 13:52 UTC (permalink / raw)
  To: xfs

On Wed, 2013-06-12 at 06:34 -0400, Ric Wheeler wrote:
> On 06/12/2013 04:26 AM, Roger Oberholtzer wrote:
> > On Tue, 2013-06-11 at 11:12 -0500, Steve Bergman wrote:
> >> Are you saying that with XFS there is no periodic
> >> flushing mechanism at all? And that unless there's an
> >> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> >> sit in the page cache forever?
> > I read the later responses to this and they seemed to say that the data
> > in the page cache should be written to the disk periodically. I am not
> > meaning to hijack the thread. I just have a question directly related to
> > this point.
> 
> You most likely need to adjust some of the vm tunings to cause the vm to kick 
> out pages more evenly. Not sure what the opensuse crowd would suggest tweaking.

In progress. 

> > I have an application that is streaming data to an XFS disk at a
> > sustained 25 MB/sec. This is well below what the hardware supports. The
> > application does fopen/fwrite/fclose (no active flushing or syncing).
> 
> Sounds like this is more likely to be an application issue than a file system 
> one. Can you push the IO write speed up with a simple "dd" test to a file?

I have a small test app that can either write at full speed (100 MB/Sec
as it turns out on the system in question) or at the app rate of 25
MB/Sec. When at full speed, the cache becomes bigger faster. But the
result is the same. Memory eventually seems to be all taken by the
cache. It is freed when I umount the file system or "echo 1
> /proc/sys/vm/drop_caches"


Yours sincerely,

Roger Oberholtzer

Ramböll RST / Systems

Office: Int +46 10-615 60 20
Mobile: Int +46 70-815 1696
roger.oberholtzer@ramboll.se
________________________________________

Ramböll Sverige AB
Krukmakargatan 21
P.O. Box 17009
SE-104 62 Stockholm, Sweden
www.rambollrst.se



* Re: Questions about XFS
  2013-06-12  8:26     ` Roger Oberholtzer
  2013-06-12 10:34       ` Ric Wheeler
  2013-06-12 12:12       ` Stan Hoeppner
@ 2013-06-13  0:48       ` Dave Chinner
  2 siblings, 0 replies; 50+ messages in thread
From: Dave Chinner @ 2013-06-13  0:48 UTC (permalink / raw)
  To: Roger Oberholtzer; +Cc: xfs

On Wed, Jun 12, 2013 at 10:26:51AM +0200, Roger Oberholtzer wrote:
> On Tue, 2013-06-11 at 11:12 -0500, Steve Bergman wrote: 
> > Are you saying that with XFS there is no periodic
> > flushing mechanism at all? And that unless there's an
> > fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
> > sit in the page cache forever?
> 
> I read the later responses to this and they seemed to say that the data
> in the page cache should be written to the disk periodically. I am not
> meaning to hijack the thread. I just have a question directly related to
> this point.
> 
> I have an application that is streaming data to an XFS disk at a
> sustained 25 MB/sec. This is well below what the hardware supports. The
> application does fopen/fwrite/fclose (no active flushing or syncing).

fadvise(DONTNEED)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Questions about XFS
  2013-06-11 17:28     ` Eric Sandeen
  2013-06-11 19:17       ` Steve Bergman
@ 2013-07-22 14:59       ` Steve Bergman
  2013-07-22 15:16         ` Steve Bergman
  1 sibling, 1 reply; 50+ messages in thread
From: Steve Bergman @ 2013-07-22 14:59 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Ric Wheeler, Linux fs XFS

"I'd file a bug against cups, then."

I have. Take a look at bug #984883 and see what I'm running up
against. Maybe you would have some suggestions as to what is the right
thing to do. And perhaps Jiri would listen to you. I'd appreciate any
aid in getting this very embarrassing RHEL6 bug fixed.

-Steve

On Tue, Jun 11, 2013 at 12:28 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 6/11/13 11:12 AM, Steve Bergman wrote:
>> In #5 I was specifically talking about ext4. After the 2009 brouhaha
>> over zero-length files in ext4 with delayed allocation turned on, Ted
>> merged some patches into vanilla kernel 2.6.30 which mitigated the
>> problem by recognizing certain common idioms and automatically
>> forcing an fsync. I'd heard that the XFS team modeled a set of XFS
>> patches from them.
>
> Assuming we're talking about the same behaviors, XFS resolved this issue
> in May 2007, in the 2.6.22 kernel, commit ba87ea6, over a year before
> ext4 even had delayed allocation working. ext4 added the flush-on-close
> heuristic in 2009, commit 7d8f9f7.
>
>> Regarding #4, I have 12 years experience with my workloads on ext3 and
>> 3 yrs on ext4 and know what I have observed. As a practical matter,
>> there are large differences between filesystem behaviors which aren't
>> up for debate since I know my workloads' behavior in the real world
>> far better than anyone else possibly could. (In fact, I'm not sure how
>> anyone else could presume to know how my workloads and filesystems
>> interact.) But if I understand correctly, ext4 at default settings
>> journals metadata and commits it every 5s, while flushing data every
>> 30s. Ext3 journals metadata, and commits it every 5 seconds, while
>> effectively flushing data, *immediately before the metadata*, every 5
>> seconds, so the window in which data and metadata are not in sync is
>> vanishingly small. Are you saying that with XFS there is no periodic
>> flushing mechanism at all? And that unless there's an
>> fsync/fdatasync/sync or the memory needs to be reclaimed, that it can
>> sit in the page cache forever?
>
> No.
>
> By and large, buffered IO in a filesystem is flushed out by the vm,
> due to either age or memory pressure.  The filesystem then responds
> to these requests by the VM, writing data as requested.
>
> You can read all about it in
> Documentation/sysctl/vm.txt but see dirty_expire_centisecs and
> dirty_writeback_centisecs - flushers wake up every 30s and push on
> data more than 5s old, by default.
>
> ext3 is somewhat unique in data=ordered metadata logging driving
> data flushing, IMHO.
>
>> One thing is puzzling me. Everyone is telling me that I must ensure
>> that fsync/fdatasync is used, even in environments where the concept
>> doesn't exist. So I've gone to find good examples of how it is used.
>> Since RHEL6 has been shipping with ext4 as the default for over 2.5
>> years, I figured it would be a great place to find examples. However,
>> I've been unable to find examples of fsync or fdatasync being used,
>> when using "strace -o file.out -f" on various system programs which
>> one would very much expect to use it.
>
> Whether or not an application *uses* fsync is orthogonal to whether
> or not it's *needed* to ensure persistence.  Obviously you don't need
> to fsync every IO as soon as its issued.  And there are buggy
> applications, yes.  It's up to the app to decide what needs to be
> persistent and when.  See the "When Should You Fsync?" section
> in the URL below.
>
>> We talked about some Python
>> config utilities the other day. But now I've moved on to C and C++
>> code. e.g. "cupsd" copy/truncate/writes the config file
>> "/etc/cups/printers.conf" quite frequently, all day long. But there is
>> no sign whatsoever of any fsync or fdatasync when I grep the strace
>> output file for those strings case insensitively. (And indeed, a
>> complex printers.conf file turned up zero-length on one of my RHEL6.4
>> boxes last week.)
>
> I'd file a bug against cups, then.
>
>> So I figured that when rpm installs a new vmlinuz, builds a new
>> initramfs and puts it into place, and modifies grub.conf, that surely
>> proper sync'ing must be done in this particularly critical case. But
>> while I do see rpm fsync/fsync'ing its own database files, it never
>> seems to fsync/fdatasync the critical system files it just installed
>> and/or modified. Surely, after over 2.5 years of Red Hat shipping
>> RHEL6 to customers, I must be mistaken in some way. Could you point me
>> to an example in RHEL6.4 where I can see clearly how fsync is being
>> properly used? In the mean time, I'll keep looking.
>
> database packages would get it right, I hope.
>
> See also http://lwn.net/Articles/457667/, "Ensuring data reaches disk"
>
> -Eric
>
>> Thanks,
>> Steve
>>
>>
>>
>> On Tue, Jun 11, 2013 at 8:59 AM, Ric Wheeler <rwheeler@redhat.com> wrote:
>>> On 06/11/2013 05:56 AM, Steve Bergman wrote:
>>>>
>>>> 4. From the time I write() a bit of data, what's the maximum time before
>>>> the
>>>> data is actually committed to disk?
>>>>
>>>> 5. Ext4 provides some automatic fsync'ing to avoid the zero-length file
>>>> issue for some common cases via the auto_da_alloc feature added in kernel
>>>> 2.6.30. Does XFS have similar behavior?
>>>
>>>
>>> I think that here you are talking more about ext3 than ext4.
>>>
>>> The answer to both of these - even for ext4 or ext3 - is that unless your
>>> application and storage is all properly configured, you are effectively at
>>> risk indefinitely. Chris Mason did a study years ago where he was able to
>>> demonstrate that dirty data could get pinned in a disk cache effectively
>>> indefinitely.  Only an fsync() would push that out.
>>>
>>> Applications need to use the data integrity hooks in order to have a
>>> reliable promise that application data is crash safe.  Jeff Moyer wrote up a
>>> really nice overview of this for lwn which you can find here:
>>>
>>> http://lwn.net/Articles/457667
>>>
>>> That said, if you have applications that do not do any of this, you can roll
>>> the dice and use a file system like ext3 that will periodically push data
>>> out of the page cache for you.
>>>
>>> Note that without the barrier mount option, that is not sufficient to push
>>> data to platter, just moves it down the line to the next potentially
>>> volatile cache :)  Even then, 4 out of every 5 seconds, your application
>>> will be certain to lose data if the box crashes while it is writing data.
>>> Lots of applications don't actually use the file system much (or write
>>> much), so ext3's sync behaviour helped mask poorly written applications
>>> pretty effectively for quite a while.
>>>
>>> There really is no short cut to doing the job right - your applications need
>>> to use the correct calls and we all need to configure the file and storage
>>> stack correctly.
>>>
>>> Thanks!
>>>
>>> Ric
>>>
>>
>


* Re: Questions about XFS
  2013-07-22 14:59       ` Steve Bergman
@ 2013-07-22 15:16         ` Steve Bergman
  0 siblings, 0 replies; 50+ messages in thread
From: Steve Bergman @ 2013-07-22 15:16 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Ric Wheeler, Linux fs XFS

Apologies to Eric, Ric, and the list for quoting that whole thread and
for the top-post. It's a silly default behavior, and Gmail doesn't
make obvious what it's doing before one hits 'send'.  You have to
remember to delete all that (hidden) text each time before hitting
send. At least I didn't post in HTML. (And yes, I'm about ready to
switch back to imap and Evolution or Thunderbird!)

-Steve


* Re: Questions about XFS
  2013-10-26 10:41     ` Stan Hoeppner
@ 2013-10-27  3:29       ` Eric Sandeen
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2013-10-27  3:29 UTC (permalink / raw)
  To: stan, xfs

On 10/26/13 5:41 AM, Stan Hoeppner wrote:
> On 10/25/2013 9:57 AM, Eric Sandeen wrote:
> 
>> allocator, but it doesn't have GRIO (Guaranteed Realtime I/O) like 
>> IRIX does.
> 
> Wasn't it called "Guaranteed-Rate I/O"?  

Yeah, you are right.  Brain fart.

> And required the Origin ccNUMA
> hardware including the HUB and XBow ASICs?  IIRC this had no real-time
> guarantee, but simply reserved X amount of bandwidth from the XBow
> through the HUB to the processor, and finally the kernel and process.
> Whether the attached disks could sustain the reserved bandwidth was
> another matter.

There were 2 versions, implemented in very different ways...

Anyway, if Harry would "prefer designing and implementing" it for Linux,
I'll just let him get started.  :)

-Eric


* Re: Questions about XFS
  2013-10-25 14:57   ` Eric Sandeen
  2013-10-25 16:24     ` harryxiyou
  2013-10-25 16:44     ` harryxiyou
@ 2013-10-26 10:41     ` Stan Hoeppner
  2013-10-27  3:29       ` Eric Sandeen
  2 siblings, 1 reply; 50+ messages in thread
From: Stan Hoeppner @ 2013-10-26 10:41 UTC (permalink / raw)
  To: xfs

On 10/25/2013 9:57 AM, Eric Sandeen wrote:

> allocator, but it doesn't have GRIO (Guaranteed Realtime I/O) like 
> IRIX does.

Wasn't it called "Guaranteed-Rate I/O"?  And required the Origin ccNUMA
hardware including the HUB and XBow ASICs?  IIRC this had no real-time
guarantee, but simply reserved X amount of bandwidth from the XBow
through the HUB to the processor, and finally the kernel and process.
Whether the attached disks could sustain the reserved bandwidth was
another matter.

-- 
Stan


* Re: Questions about XFS
  2013-10-25 14:57   ` Eric Sandeen
  2013-10-25 16:24     ` harryxiyou
@ 2013-10-25 16:44     ` harryxiyou
  2013-10-26 10:41     ` Stan Hoeppner
  2 siblings, 0 replies; 50+ messages in thread
From: harryxiyou @ 2013-10-25 16:44 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Fri, Oct 25, 2013 at 10:57 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 10/25/13 9:42 AM, Emmanuel Florac wrote:
>> On Fri, 25 Oct 2013 22:28:10 +0800, harryxiyou
>> <harryxiyou@gmail.com> wrote:
>>
>>> 1, How to install XFS on Linux OS, I should do like below, right?
>>>    a, Download latest Linux Kernel Source Codes.
>>>    b, Compile the Linux Kernel and select XFS feature when I do the
>>> configuration (make menuconfig).
>>
>> No, XFS is supported by all Linux distributions out of the box. To use
>> XFS, you may need to install the xfs-progs package.
>>
>>
>>> 2, Does XFS support real-time feature after I install it on Linux
>>> Kernel? And how to use real-time feature on Linux Kernel.
>>>
>>
>> Yes. What do you want to do with realtime XFS?
>>
>
> well, yes and no.  It supports the "realtime subvolume" which is
> not really technically "realtime."  It does have a more deterministic
> allocator, but it doesn't have GRIO (Guaranteed Realtime I/O) like
> IRIX does.
>

What about letting the Linux kernel support GRIO like IRIX does? I think
this would be a good feature for XFS. Actually, I would prefer to design
and implement this feature for XFS myself.

Any comments?



-- 
Thanks
Weiwei  Jia (Harry Wei)


* Re: Questions about XFS
  2013-10-25 14:57   ` Eric Sandeen
@ 2013-10-25 16:24     ` harryxiyou
  2013-10-25 16:44     ` harryxiyou
  2013-10-26 10:41     ` Stan Hoeppner
  2 siblings, 0 replies; 50+ messages in thread
From: harryxiyou @ 2013-10-25 16:24 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Fri, Oct 25, 2013 at 10:57 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> well, yes and no.  It supports the "realtime subvolume" which is
> not really technically "realtime."  It does have a more deterministic
> allocator, but it doesn't have GRIO (Guaranteed Realtime I/O) like
> IRIX does.

Hmmm... however, the mkfs.xfs manual tells me the following.

       -r realtime_section_options
              These options specify the location, size, and other parameters
              of the real-time section of the filesystem. The valid
              realtime_section_options are:

                   rtdev=device
                          This is used to specify the device which should
                          contain the real-time section of the filesystem.
                          The suboption value is the name of a block device.

                   extsize=value
                          This is used to specify the size of the blocks in
                          the real-time section of the filesystem. This value
                          must be a multiple of the filesystem block size.
                          The minimum allowed size is the filesystem block
                          size or 4 KiB (whichever is larger); the default
                          size is the stripe width for striped volumes or
                          64 KiB for non-striped volumes; the maximum allowed
                          size is 1 GiB. The real-time extent size should be
                          carefully chosen to match the parameters of the
                          physical media used.

                   size=value
                          This is used to specify the size of the real-time
                          section. This suboption is only needed if the
                          real-time section of the filesystem should occupy
                          less space than the size of the partition or
                          logical volume containing the section.
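For instance, I guess the suboptions combine like this (the device names
are just examples, the suboptions are comma-separated in a single -r, and
the data device comes last; the xfs_io step is my assumption and may need
checking):

```shell
# make the filesystem with a separate realtime device
mkfs.xfs -r rtdev=/dev/sda3,extsize=64k /dev/sda2

# the realtime device must also be named at mount time
mount -o rtdev=/dev/sda3 /dev/sda2 /mnt

# per-file opt-in: set the realtime flag while the file is still empty
xfs_io -f -c "chattr +r" /mnt/capture.dat
```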



After I run "mkfs.xfs -r rtdev=/dev/sda3 extsize=64K", I wonder what the
differences are between '/dev/sda3' and '/dev/sda2' ('/dev/sda2' being a
common XFS or ext2 filesystem). The differences are as follows, right?

1, I/O speed to the real-time '/dev/sda3' is faster?
2, The performance of '/dev/sda3' is better?

Anything else?



-- 
Thanks
Weiwei  Jia (Harry Wei)


* Re: Questions about XFS
  2013-10-25 16:13   ` harryxiyou
@ 2013-10-25 16:16     ` Eric Sandeen
  0 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2013-10-25 16:16 UTC (permalink / raw)
  To: harryxiyou, Emmanuel Florac; +Cc: xfs

On 10/25/13 11:13 AM, harryxiyou wrote:
> On Fri, Oct 25, 2013 at 10:42 PM, Emmanuel Florac
> <eflorac@intellique.com> wrote:
>> On Fri, 25 Oct 2013 22:28:10 +0800, harryxiyou
>> <harryxiyou@gmail.com> wrote:
>>
>>> 1, How to install XFS on Linux OS, I should do like below, right?
>>>    a, Download latest Linux Kernel Source Codes.
>>>    b, Compile the Linux Kernel and select XFS feature when I do the
>>> configuration (make menuconfig).
>>
>> No, XFS is supported by all Linux distributions out of the box. To use
>> XFS, you may need to install the xfs-progs package.
> 
> Yeah, I have seen it. After I install xfs-progs, I could run mkfs.xfs to
> format a device to run XFS.
> 
>>
>>
>>> 2, Does XFS support real-time feature after I install it on Linux
>>> Kernel? And how to use real-time feature on Linux Kernel.
>>>
>>
>> Yes. What do you want to do with realtime XFS?
>>
> 
> I just wonder how to realize real-time behavior on a common Linux kernel
> and common physical hardware. It's interesting, isn't it? ;-)

http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch04s09.html
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch05s03.html
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s11.html
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide//tmp/en-US/html/ch06s12.html

-Eric


* Re: Questions about XFS
  2013-10-25 14:42 ` Emmanuel Florac
  2013-10-25 14:57   ` Eric Sandeen
@ 2013-10-25 16:13   ` harryxiyou
  2013-10-25 16:16     ` Eric Sandeen
  1 sibling, 1 reply; 50+ messages in thread
From: harryxiyou @ 2013-10-25 16:13 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On Fri, Oct 25, 2013 at 10:42 PM, Emmanuel Florac
<eflorac@intellique.com> wrote:
> On Fri, 25 Oct 2013 22:28:10 +0800, harryxiyou
> <harryxiyou@gmail.com> wrote:
>
>> 1, How to install XFS on Linux OS, I should do like below, right?
>>    a, Download latest Linux Kernel Source Codes.
>>    b, Compile the Linux Kernel and select XFS feature when I do the
>> configuration (make menuconfig).
>
> No, XFS is supported by all Linux distributions out of the box. To use
> XFS, you may need to install the xfs-progs package.

Yeah, I have seen it. After I install xfs-progs, I can run mkfs.xfs to
format a device for XFS.

>
>
>> 2, Does XFS support real-time feature after I install it on Linux
>> Kernel? And how to use real-time feature on Linux Kernel.
>>
>
> Yes. What do you want to do with realtime XFS?
>

I just wonder how to realize real-time behavior on a common Linux kernel
and common physical hardware. It's interesting, isn't it? ;-)

Thanks for your reply.



-- 
Thanks
Weiwei  Jia (Harry Wei)


* Re: Questions about XFS
  2013-10-25 14:42 ` Emmanuel Florac
@ 2013-10-25 14:57   ` Eric Sandeen
  2013-10-25 16:24     ` harryxiyou
                       ` (2 more replies)
  2013-10-25 16:13   ` harryxiyou
  1 sibling, 3 replies; 50+ messages in thread
From: Eric Sandeen @ 2013-10-25 14:57 UTC (permalink / raw)
  To: Emmanuel Florac, harryxiyou; +Cc: xfs

On 10/25/13 9:42 AM, Emmanuel Florac wrote:
> On Fri, 25 Oct 2013 22:28:10 +0800, harryxiyou
> <harryxiyou@gmail.com> wrote:
> 
>> 1, How to install XFS on Linux OS, I should do like below, right?
>>    a, Download latest Linux Kernel Source Codes.
>>    b, Compile the Linux Kernel and select XFS feature when I do the
>> configuration (make menuconfig).
> 
> No, XFS is supported by all Linux distributions out of the box. To use
> XFS, you may need to install the xfs-progs package.
> 
>  
>> 2, Does XFS support real-time feature after I install it on Linux
>> Kernel? And how to use real-time feature on Linux Kernel.
>>
> 
> Yes. What do you want to do with realtime XFS?
> 

well, yes and no.  It supports the "realtime subvolume" which is
not really technically "realtime."  It does have a more deterministic
allocator, but it doesn't have GRIO (Guaranteed Realtime I/O) like 
IRIX does.

-Eric


* Re: Questions about XFS
  2013-10-25 14:28 harryxiyou
@ 2013-10-25 14:42 ` Emmanuel Florac
  2013-10-25 14:57   ` Eric Sandeen
  2013-10-25 16:13   ` harryxiyou
  0 siblings, 2 replies; 50+ messages in thread
From: Emmanuel Florac @ 2013-10-25 14:42 UTC (permalink / raw)
  To: harryxiyou; +Cc: xfs

On Fri, 25 Oct 2013 22:28:10 +0800, harryxiyou
<harryxiyou@gmail.com> wrote:

> 1, How to install XFS on Linux OS, I should do like below, right?
>    a, Download latest Linux Kernel Source Codes.
>    b, Compile the Linux Kernel and select XFS feature when I do the
> configuration (make menuconfig).

No, XFS is supported by all Linux distributions out of the box. To use
XFS, you may need to install the xfs-progs package.

 
> 2, Does XFS support real-time feature after I install it on Linux
> Kernel? And how to use real-time feature on Linux Kernel.
> 

Yes. What do you want to do with realtime XFS?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Questions about XFS
@ 2013-10-25 14:28 harryxiyou
  2013-10-25 14:42 ` Emmanuel Florac
  0 siblings, 1 reply; 50+ messages in thread
From: harryxiyou @ 2013-10-25 14:28 UTC (permalink / raw)
  To: xfs

Hi Folks,

I have some questions about XFS as follows.

1, To install XFS on a Linux OS, I should do as below, right?
   a, Download the latest Linux kernel source code.
   b, Compile the Linux kernel and select the XFS feature when I do the
configuration (make menuconfig).

2, Does XFS support the real-time feature after I install it on the Linux
kernel? And how do I use the real-time feature on the Linux kernel?

Could anyone please give me some suggestions? Thanks very much.

-- 
Thanks
Weiwei  Jia (Harry Wei)


* Re: Questions about XFS
  2007-03-16 10:36       ` Martin Steigerwald
@ 2007-03-17  0:47         ` Jason White
  0 siblings, 0 replies; 50+ messages in thread
From: Jason White @ 2007-03-17  0:47 UTC (permalink / raw)
  To: linux-xfs

On Fri, Mar 16, 2007 at 11:36:31AM +0100, Martin Steigerwald wrote:
 
> Since 2.6.17.7 and enabled write barriers I didn't lose metadata
> consistency on my laptop anymore, and I can tell you that it crashed a lot
> due to my experiments with whatnot (especially OSS radeon drivers and
> beryl ;-). I also had some classical power outages.

My laptop also supports write barriers, but I leave the battery in place in
case there's a power outage; effectively it's operating as a UPS.

This might be slightly off-topic, but in choosing a SATA drive for a desktop
machine, what features/standards compliance should one look for in order to
ensure that write barriers work? I know this involves flushing the drive
cache, but is this support mandatory in any of the applicable standards?


* Re: Questions about XFS
  2007-03-15  9:07     ` clflush
  2007-03-15 14:41       ` Geir A. Myrestrand
@ 2007-03-16 10:36       ` Martin Steigerwald
  2007-03-17  0:47         ` Jason White
  1 sibling, 1 reply; 50+ messages in thread
From: Martin Steigerwald @ 2007-03-16 10:36 UTC (permalink / raw)
  To: linux-xfs

On Thursday, 15 March 2007, clflush wrote:
> From what I know, and correct me if I'm wrong, XFS relies on the
> application side to do the right job, but real-world experience shows us
> that *a lot* of applications out there behave badly and cannot be
> trusted; hence if something happens, XFS cannot "correct" the problem,
> leaving you with headaches depending on how much data you
> lost/corrupted and its importance. IMHO, XFS *should* make some
> effort at assuring integrity to minimize the bad behavior of badly
> written applications out there.

Hello,

as Eric wrote in this thread recent versions of XFS do an effort on 
avoiding these zeros in files:

"On the other hand, there were some changes made to xfs to explicitly 
sync files on close, if they have been truncated, which should help this 
sort of problem.  Depending on what's in OpenSuSE 10.2, that change may 
or may not be in your code..."

> On the one hand you have the old Ext3 FS which doesn't perform very
> well in many areas but IMO is a lot safer to work on (doesn't lose
> data that easily compared to XFS - and I'm talking from experience here
> because I use both file systems and I lost much more on the XFS system
> than on the Ext3 one) and on the other hand you have this excellent XFS
> file system with its clean layout and awesome performance + fancy
> features like GRIO, extents, allocate on flush, real time volumes, etc.
> *but* it is not "safe" enough to work with if you have unreliable hardware
> and/or a lot of power outage issues - I've never lost data on Ext3
> during a power outage but have already lost data twice on XFS

Since 2.6.17.7, with write barriers enabled, I have not lost metadata 
consistency on my laptop anymore, and I can tell you that it crashed a lot 
due to my experiments with all sorts of things (especially the OSS radeon 
drivers and Beryl ;-). I also had some classic power outages. I usually do 
not put a battery into my laptop if it is not needed.

And with recent XFS I have not encountered any data loss at all. It might 
have been luck, but previously, after a crash or power outage, Akregator 
would sometimes tell me that the file with the newsfeed data was corrupted 
and a backup had been restored. I have not seen that dialog on my laptop 
in a long time.

That said, I would like to have more safety built into the filesystem 
itself, but current ext3 is too ancient a technology for me. Coming from 
the Amiga, a filesystem with a hard maximum number of inodes just doesn't 
fit my expectations (although the original Amiga FFS has a lot of 
shortcomings too ;-).

The real challenge is to implement safety without a serious loss of 
performance. You get more data safety in ext3, but less performance, and 
more performance in XFS, but potentially less data safety with badly 
written applications. Not every bit of additional performance in XFS 
comes from transferring responsibility for data safety to the 
application, but I believe there is a relationship between safety and 
performance.

Maybe wandering logs / a log-structured approach, as (partly) seen in 
Reiser4 and NetApp's WAFL, would be a good way to get more data safety 
without losing (much) performance. (Although in NetApp FAS systems, 
non-volatile RAM plays an important role, too.)

Regards,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-15  9:07     ` clflush
@ 2007-03-15 14:41       ` Geir A. Myrestrand
  2007-03-16 10:36       ` Martin Steigerwald
  1 sibling, 0 replies; 50+ messages in thread
From: Geir A. Myrestrand @ 2007-03-15 14:41 UTC (permalink / raw)
  To: xfs

clflush wrote:
> On the one hand you have the old Ext3 FS, which doesn't perform very well in 
> many areas but IMO is a lot safer to work with (it doesn't lose data as 
> easily; I'm talking from experience here, because I use both file systems 
> and I have lost much more on the XFS system than on the Ext3 one). On the 
> other hand you have this excellent XFS file system with its clean layout 
> and awesome performance, plus fancy features like GRIO, extents, 
> allocate-on-flush, real-time volumes, etc., *but* it is not "safe" enough 
> to work with if you have unreliable hardware and/or a lot of power 
> outages. I've never lost data on Ext3 during a power outage, but have 
> already lost data twice on XFS.

You should *always* use a UPS when you use XFS.
XFS does not prevent power outages [yet]...

> Just my $0.02

Save them for a UPS. ;-)

-- 

Geir A. Myrestrand

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-15  4:26   ` Taisuke Yamada
@ 2007-03-15  9:07     ` clflush
  2007-03-15 14:41       ` Geir A. Myrestrand
  2007-03-16 10:36       ` Martin Steigerwald
  0 siblings, 2 replies; 50+ messages in thread
From: clflush @ 2007-03-15  9:07 UTC (permalink / raw)
  To: Taisuke Yamada; +Cc: xfs

From what I know, and correct me if I'm wrong, XFS relies on the application 
side to do the right thing, but real-world experience shows us that *a lot* 
of applications out there behave badly and cannot be trusted. Hence, if 
something happens, XFS cannot "correct" the problem, leaving you with 
headaches depending on how much data you lost or corrupted and how important 
it was. IMHO, XFS *should* make some effort to assure integrity and minimize 
the impact of badly written applications out there. I know that XFS wasn't 
written for PC-class hardware in the first place, but most people do not 
read enough to understand XFS and use it on their desktops/laptops anyway, 
because, to be honest, Linux doesn't really have a good file system, and 
XFS, out of all the available file systems, is the best in terms of 
performance and scalability. 

On the one hand you have the old Ext3 FS, which doesn't perform very well in 
many areas but IMO is a lot safer to work with (it doesn't lose data as 
easily; I'm talking from experience here, because I use both file systems 
and I have lost much more on the XFS system than on the Ext3 one). On the 
other hand you have this excellent XFS file system with its clean layout 
and awesome performance, plus fancy features like GRIO, extents, 
allocate-on-flush, real-time volumes, etc., *but* it is not "safe" enough 
to work with if you have unreliable hardware and/or a lot of power outages. 
I've never lost data on Ext3 during a power outage, but have already lost 
data twice on XFS.

Just my $0.02


On Thursday 15 March 2007 05:26:18 you wrote:
> From an end-user's POV, this infamous XFS behavior is somewhat
> taken as a sign of XFS's inferiority compared to other filesystems.
> Even with "bad" applications (e.g. Firefox), this rarely happens
> on other filesystems, so regardless of what's in the FAQ, people
> logically conclude that the fault belongs to XFS anyway.
>
> So, what is the correct way to do IO?
> Is what Firefox (and the other bad apps) are doing so obvious(ly buggy)
> that it will be acknowledged as a bug once reported? Or is it simply
> a mismatch between application expectations and XFS behavior,
> requiring a non-(obvious|generic) fix?
>
> Although I'm not a filesystem developer, I'm pretty impressed with
> XFS and willing to file a report/patch against those "buggy" apps if
> the issue can be explained to other app developers.
>
> >> was to press the reset button on the computer. After the reboot, when I
> >> opened Firefox again, I noticed that all my bookmarks were gone. Those
> >> bookmarks were imported from my desktop machine a few days after I
> >> configured the new server.
> >
> > This is a firefox bug - I've seen it before (on my mother's machine).
> >
> > It's due to firefox not doing the correct thing with IO on the bookmarks
> > file.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-14 16:33 ` Stewart Smith
@ 2007-03-15  4:26   ` Taisuke Yamada
  2007-03-15  9:07     ` clflush
  0 siblings, 1 reply; 50+ messages in thread
From: Taisuke Yamada @ 2007-03-15  4:26 UTC (permalink / raw)
  To: xfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

From an end-user's POV, this infamous XFS behavior is somewhat
taken as a sign of XFS's inferiority compared to other filesystems.
Even with "bad" applications (e.g. Firefox), this rarely happens
on other filesystems, so regardless of what's in the FAQ, people
logically conclude that the fault belongs to XFS anyway.

So, what is the correct way to do IO?
Is what Firefox (and the other bad apps) are doing so obvious(ly buggy)
that it will be acknowledged as a bug once reported? Or is it simply
a mismatch between application expectations and XFS behavior,
requiring a non-(obvious|generic) fix?

Although I'm not a filesystem developer, I'm pretty impressed with
XFS and willing to file a report/patch against those "buggy" apps if
the issue can be explained to other app developers.

>> was to press the reset button on the computer. After the reboot, when I 
>> opened Firefox again, I noticed that all my bookmarks were gone. Those 
>> bookmarks were imported from my desktop machine a few days after I configured 
>> the new server.
> 
> This is a firefox bug - I've seen it before (on my mother's machine).
> 
> It's due to firefox not doing the correct thing with IO on the bookmarks
> file.

- --
Taisuke Yamada <tyamadajp@spam.rakugaki.org>, http://rakugaki.org/
2268 E9A2 D4F9 014E F11D  1DF7 DCA3 83BC 78E5 CD3A

Message to my public address may not be handled in a timely manner.
For a direct contact, please use my private address on my namecard.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFF+Mrq3KODvHjlzToRAu/vAKC8pky15WJwocHbWhbRx9f2H+c5aQCeIeYp
ZJPcSeawAIbZN80GXJz+kYg=
=oAY3
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-13 13:40 clflush
                   ` (2 preceding siblings ...)
  2007-03-13 15:55 ` Eric Sandeen
@ 2007-03-14 16:33 ` Stewart Smith
  2007-03-15  4:26   ` Taisuke Yamada
  3 siblings, 1 reply; 50+ messages in thread
From: Stewart Smith @ 2007-03-14 16:33 UTC (permalink / raw)
  To: clflush; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

On Tue, 2007-03-13 at 14:40 +0100, clflush wrote:
> I have a few simple questions regarding the XFS file system. I built a new 
> small server here (commodity hardware, x86-64) and I've installed 32-bit 
> openSUSE 10.2 on it. After the system was installed, configured and up and 
> running, it hung while I was browsing with Firefox. The only thing I could do 
> was to press the reset button on the computer. After the reboot, when I 
> opened Firefox again, I noticed that all my bookmarks were gone. Those 
> bookmarks were imported from my desktop machine a few days after I configured 
> the new server.

This is a firefox bug - I've seen it before (on my mother's machine).

It's due to firefox not doing the correct thing with IO on the bookmarks
file.

As mentioned in another mail you can restore from a backup that firefox
makes.

In a future release, firefox is going to be using sqlite for storing
these things, which should make these problems go away (I'm pretty sure
sqlite does all the right things)
-- 
Stewart Smith (stewart@flamingspork.com)
http://www.flamingspork.com/


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-13 13:40 clflush
  2007-03-13 15:36 ` Klaus Strebel
  2007-03-13 15:53 ` Stein M. Hugubakken
@ 2007-03-13 15:55 ` Eric Sandeen
  2007-03-14 16:33 ` Stewart Smith
  3 siblings, 0 replies; 50+ messages in thread
From: Eric Sandeen @ 2007-03-13 15:55 UTC (permalink / raw)
  To: clflush; +Cc: xfs

clflush wrote:
> Hi,
> 
> I have a few simple questions regarding the XFS file system. I built a new 
> small server here (commodity hardware, x86-64) and I've installed 32-bit 
> openSUSE 10.2 on it. After the system was installed, configured and up and 
> running, it hung while I was browsing with Firefox. The only thing I could do 
> was to press the reset button on the computer. After the reboot, when I 
> opened Firefox again, I noticed that all my bookmarks were gone. Those 
> bookmarks were imported from my desktop machine a few days after I configured 
> the new server.
> 
> All file systems on this new server are XFS because I heard good things about 
> it and it generally performs better in database operations compared to other 
> file systems available for Linux. However, I was pretty surprised that when I 
> had to reset the machine because it hung for some reason, all the bookmarks 
> in Firefox were gone, so now I have my doubts about the reliability and data 
> integrity of XFS. My older server, which also runs openSUSE 10.2 (32-bit) but 
> uses Ext3 as file system never had such issues and I had to reset it many 
> times because it was hanging for some reason.

sounds like you have several reliability problems ;-)

> Am I right to assume that XFS compared to Ext3 does not do a very good job 
> regarding data integrity? I know a little bit about file systems and I know 
> that most file systems depend on the application to do the right job 
> regarding the way it opens/locks/saves files, but in reality not all 
> applications are written in a safe way to guarantee this.
> 
> Basically, the two questions I have are:
> 
> - Why did I lose bookmarks on a machine running XFS, while on another one 
> which runs the same OS version but uses Ext3 as its file system it never 
> happened, no matter how many times I had to reset it?

see also http://oss.sgi.com/projects/xfs/faq.html#nulls

> - Are there any efforts currently made to increase the data integrity of XFS?

this is essentially a loss of buffered data in the VM, outside the realm 
of what xfs can realistically protect.  With ext3, you probably were 
losing your "latest" bookmarks as well, but were luckily(?) getting back 
whatever used to be on-disk.

On the other hand, there were some changes made to xfs to explicitly 
sync files on close, if they have been truncated, which should help this 
sort of problem.  Depending on what's in OpenSuSE 10.2, that change may 
or may not be in your code...

-Eric
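[The truncate-then-rewrite pattern Eric refers to can be sketched as follows; this is a hypothetical illustration of the general failure mode, not the code of any particular application. An explicit fsync() removes the window in which a crash leaves a zero-length file.]

```python
import os

def overwrite_unsafely(path: str, data: bytes) -> None:
    # Anti-pattern: mode "wb" implies O_TRUNC, so the on-disk file is
    # emptied immediately, while the new bytes may sit in the page
    # cache for many seconds before writeback. A crash or power loss
    # in that window leaves a zero-length file on disk.
    with open(path, "wb") as f:
        f.write(data)

def overwrite_safely(path: str, data: bytes) -> None:
    # Closing the window: force the new contents to stable storage
    # before the file is considered "saved".
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
```

The XFS change Eric mentions effectively does something like the second variant on behalf of applications that truncate and close without syncing.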

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-13 13:40 clflush
  2007-03-13 15:36 ` Klaus Strebel
@ 2007-03-13 15:53 ` Stein M. Hugubakken
  2007-03-13 15:55 ` Eric Sandeen
  2007-03-14 16:33 ` Stewart Smith
  3 siblings, 0 replies; 50+ messages in thread
From: Stein M. Hugubakken @ 2007-03-13 15:53 UTC (permalink / raw)
  To: xfs

clflush wrote:
> Basically, the two questions I have are:
> 
> - Why did I lose bookmarks on a machine running XFS, while on another one 
> which runs the same OS version but uses Ext3 as its file system it never 
> happened, no matter how many times I had to reset it?
> 
> - Are there any efforts currently made to increase the data integrity of XFS?
> 

Take a look at the FAQ:
http://oss.sgi.com/projects/xfs/faq.html#wcache

Regarding the lost bookmarks, you might find an old backup in 
~/.mozilla/firefox/<userprofile>/bookmarkbackups.

Regards
Stein

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Questions about XFS
  2007-03-13 13:40 clflush
@ 2007-03-13 15:36 ` Klaus Strebel
  2007-03-13 15:53 ` Stein M. Hugubakken
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: Klaus Strebel @ 2007-03-13 15:36 UTC (permalink / raw)
  To: clflush; +Cc: xfs

clflush wrote:
> Hi,
> 
> I have a few simple questions regarding the XFS file system. I built a new 
> small server here (commodity hardware, x86-64) and I've installed 32-bit 
> openSUSE 10.2 on it. After the system was installed, configured and up and 
> running, it hung while I was browsing with Firefox. The only thing I could do 
> was to press the reset button on the computer. After the reboot, when I 
> opened Firefox again, I noticed that all my bookmarks were gone. Those 
> bookmarks were imported from my desktop machine a few days after I configured 
> the new server.
> 
> All file systems on this new server are XFS because I heard good things about 
> it and it generally performs better in database operations compared to other 
> file systems available for Linux. However, I was pretty surprised that when I 
> had to reset the machine because it hung for some reason, all the bookmarks 
> in Firefox were gone, so now I have my doubts about the reliability and data 
> integrity of XFS. My older server, which also runs openSUSE 10.2 (32-bit) but 
> uses Ext3 as file system never had such issues and I had to reset it many 
> times because it was hanging for some reason.
> 
> Am I right to assume that XFS compared to Ext3 does not do a very good job 
> regarding data integrity? I know a little bit about file systems and I know 
> that most file systems depend on the application to do the right job 
> regarding the way it opens/locks/saves files, but in reality not all 
> applications are written in a safe way to guarantee this.
> 
> Basically, the two questions I have are:
> 
> - Why did I lose bookmarks on a machine running XFS, while on another one 
> which runs the same OS version but uses Ext3 as its file system it never 
> happened, no matter how many times I had to reset it?
> 
> - Are there any efforts currently made to increase the data integrity of XFS?
> 
> Regards
> 
> 
Hi,

Short and rude answer: 'Search the archives and FAQs'.

Simple short answer: no and no.

Longer answer: XFS only guarantees metadata integrity. If unwritten 
extents exist in memory, you will get them back empty on disk if you reset 
your box. You should consider using the 'Magic SysRq' hotkeys to do an 
emergency sync of your disks in cases like these before you reset your box.
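[The keyboard sequence Klaus suggests (Alt-SysRq-S) has a programmatic counterpart in sync(2); from a script, assuming a Python interpreter is still usable, the same flush can be requested before a deliberate reset:]

```python
import os

# Ask the kernel to write all dirty buffers out to disk; this issues
# the same flush request that the Magic SysRq "s" (emergency sync)
# key triggers from the console.
os.sync()
```

This only helps for dirty data still in the page cache; data an application never wrote out at all cannot be recovered this way.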

Ciao
Klaus

-- 
Mit freundlichen Grüssen / best regards

Klaus Strebel, Dipl.-Inform. (FH), mailto:klaus.strebel@gmx.net

/"\
\ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
/ \

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Questions about XFS
@ 2007-03-13 13:40 clflush
  2007-03-13 15:36 ` Klaus Strebel
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: clflush @ 2007-03-13 13:40 UTC (permalink / raw)
  To: xfs

Hi,

I have a few simple questions regarding the XFS file system. I built a new 
small server here (commodity hardware, x86-64) and I've installed 32-bit 
openSUSE 10.2 on it. After the system was installed, configured and up and 
running, it hung while I was browsing with Firefox. The only thing I could do 
was to press the reset button on the computer. After the reboot, when I 
opened Firefox again, I noticed that all my bookmarks were gone. Those 
bookmarks were imported from my desktop machine a few days after I configured 
the new server.

All file systems on this new server are XFS, because I heard good things 
about it and it generally performs better in database operations than the 
other file systems available for Linux. However, I was pretty surprised 
that when I had to reset the machine because it hung for some reason, all 
the bookmarks in Firefox were gone, so now I have my doubts about the 
reliability and data integrity of XFS. My older server, which also runs 
openSUSE 10.2 (32-bit) but uses Ext3 as its file system, never had such 
issues, and I had to reset it many times because it was hanging for some 
reason.

Am I right to assume that XFS compared to Ext3 does not do a very good job 
regarding data integrity? I know a little bit about file systems and I know 
that most file systems depend on the application to do the right job 
regarding the way it opens/locks/saves files, but in reality not all 
applications are written in a safe way to guarantee this.

Basically, the two questions I have are:

- Why did I lose bookmarks on a machine running XFS, while on another one 
which runs the same OS version but uses Ext3 as its file system it never 
happened, no matter how many times I had to reset it?

- Are there any efforts currently made to increase the data integrity of XFS?

Regards

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2013-10-27  3:29 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-11  9:56 Questions about XFS Steve Bergman
2013-06-11 13:10 ` Emmanuel Florac
2013-06-11 13:35 ` Stefan Ring
2013-06-11 13:52 ` Ric Wheeler
2013-06-11 13:59 ` Ric Wheeler
2013-06-11 16:12   ` Steve Bergman
2013-06-11 17:19     ` Ric Wheeler
2013-06-11 17:27       ` Stefan Ring
2013-06-11 17:31         ` Ric Wheeler
2013-06-11 17:41           ` Stefan Ring
2013-06-11 18:03             ` Eric Sandeen
2013-06-11 19:30           ` Steve Bergman
2013-06-11 21:03             ` Dave Chinner
2013-06-11 21:43               ` Steve Bergman
2013-06-11 17:59         ` Ben Myers
2013-06-11 17:28     ` Eric Sandeen
2013-06-11 19:17       ` Steve Bergman
2013-06-11 21:47         ` Dave Chinner
2013-07-22 14:59       ` Steve Bergman
2013-07-22 15:16         ` Steve Bergman
2013-06-12  8:26     ` Roger Oberholtzer
2013-06-12 10:34       ` Ric Wheeler
2013-06-12 13:52         ` Roger Oberholtzer
2013-06-12 12:12       ` Stan Hoeppner
2013-06-12 13:48         ` Roger Oberholtzer
2013-06-13  0:48       ` Dave Chinner
2013-06-11 19:35 ` Ben Myers
2013-06-11 19:55   ` Steve Bergman
2013-06-11 20:08     ` Ben Myers
2013-06-11 21:57     ` Matthias Schniedermeyer
2013-06-11 22:18       ` Steve Bergman
  -- strict thread matches above, loose matches on Subject: below --
2013-10-25 14:28 harryxiyou
2013-10-25 14:42 ` Emmanuel Florac
2013-10-25 14:57   ` Eric Sandeen
2013-10-25 16:24     ` harryxiyou
2013-10-25 16:44     ` harryxiyou
2013-10-26 10:41     ` Stan Hoeppner
2013-10-27  3:29       ` Eric Sandeen
2013-10-25 16:13   ` harryxiyou
2013-10-25 16:16     ` Eric Sandeen
2007-03-13 13:40 clflush
2007-03-13 15:36 ` Klaus Strebel
2007-03-13 15:53 ` Stein M. Hugubakken
2007-03-13 15:55 ` Eric Sandeen
2007-03-14 16:33 ` Stewart Smith
2007-03-15  4:26   ` Taisuke Yamada
2007-03-15  9:07     ` clflush
2007-03-15 14:41       ` Geir A. Myrestrand
2007-03-16 10:36       ` Martin Steigerwald
2007-03-17  0:47         ` Jason White
