* 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-06 14:28 ` Justin Piszcz
  0 siblings, 0 replies; 61+ messages in thread
From: Justin Piszcz @ 2008-12-06 14:28 UTC (permalink / raw)
  To: linux-raid, xfs; +Cc: Alan Piszcz

Someone should write a document about XFS and barrier support; if I recall,
in the past they never worked right on raid1 or raid5 devices, but it
appears they now work on RAID1, which slows down performance ~12 times!!

l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar 
0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+325minor)pagefaults 0swaps
l1:~#

l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+324minor)pagefaults 0swaps
l1:~#

Before:
/dev/md2        /               xfs     defaults,noatime  0       1

After:
/dev/md2        /               xfs     defaults,noatime,nobarrier,logbufs=8,logbsize=262144 0 1

There is some mention of it here:
http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent

But basically I believe it should be noted in the kernel logs, the FAQ, or
somewhere else, because simply upgrading the kernel, without changing fstab
or any other part of the system, can drop performance 12x now that the newer
kernels implement barriers.
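
A minimal way to toggle this on a live system without editing fstab, assuming
the kernel accepts the XFS barrier options on remount:

# mount -o remount,nobarrier /   # run without barriers
# mount -o remount,barrier /     # re-enable barriers afterwards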

Justin.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-06 14:28 ` Justin Piszcz
@ 2008-12-06 15:36 ` Eric Sandeen
  2008-12-06 20:35     ` Redeeman
  2008-12-13 12:54     ` Justin Piszcz
  -1 siblings, 2 replies; 61+ messages in thread
From: Eric Sandeen @ 2008-12-06 15:36 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, Alan Piszcz, xfs

Justin Piszcz wrote:
> Someone should write a document with XFS and barrier support, if I recall,
> in the past, they never worked right on raid1 or raid5 devices, but it
> appears now they they work on RAID1, which slows down performance ~12 times!!

What sort of document do you propose?  xfs will enable barriers on any
block device which will support them, and after:

deeb5912db12e8b7ccf3f4b1afaad60bc29abed9

[XFS] Disable queue flag test in barrier check.

xfs is able to determine, via a test IO, that md raid1 does pass
barriers through properly even though it doesn't set an ordered flag on
the queue.
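
A quick way to see what that check decided on a given system is to grep the
kernel log after mounting; when the trial write fails, XFS logs a "Disabling
barriers, ..." message (exact wording varies by kernel version):

# dmesg | grep -i barrier   # no "Disabling barriers" line means barriers stayed enabled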

> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar 
> 0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+325minor)pagefaults 0swaps
> l1:~#
> 
> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
> 0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+324minor)pagefaults 0swaps
> l1:~#
> 
> Before:
> /dev/md2        /               xfs     defaults,noatime  0       1
> 
> After:
> /dev/md2        /               xfs     defaults,noatime,nobarrier,logbufs=8,logbsize=262144 0 1

Well, if you're investigating barriers can you do a test with just the
barrier option change; though I expect you'll still find it to have a
substantial impact.

> There is some mention of it here:
> http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent
> 
> But basically I believe it should be noted in the kernel logs, FAQ or somewhere
> because just through the process of upgrading the kernel, not changing fstab
> or any other part of the system, performance can drop 12x just because the
> newer kernels implement barriers.

Perhaps:

printk(KERN_ALERT "XFS is now looking after your metadata very
carefully; if you prefer the old, fast, dangerous way, mount with -o
nobarrier\n");

:)

Really, this just gets xfs on md raid1 in line with how it behaves on
most other devices.

But I agree, some documentation/education is probably in order; if you
choose to disable write caches or you have faith in the battery backup
of your write cache, turning off barriers would be a good idea.  Justin,
it might be interesting to do some tests with:

barrier,   write cache enabled
nobarrier, write cache enabled
nobarrier, write cache disabled

a 12x hit does hurt though...  If you're really motivated, try the same
scenarios on ext3 and ext4 to see what the barrier hit is on those as well.
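
A rough sketch of one such matrix run (the md members /dev/sda and /dev/sdb
are hypothetical device names; barriers are toggled via the mount options
shown above):

# hdparm -W1 /dev/sda; hdparm -W1 /dev/sdb   # drive write caches on (-W0 turns them off)
# /usr/bin/time bash -c 'tar xf linux-2.6.27.7.tar; sync'   # same workload for each combination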

-Eric

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-06 14:28 ` Justin Piszcz
@ 2008-12-06 18:42 ` Peter Grandi
  -1 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2008-12-06 18:42 UTC (permalink / raw)
  To: Linux XFS, Linux RAID


> Someone should write a document with XFS and barrier support,
> if I recall, in the past, they never worked right on raid1 or
> raid5 devices, but it appears now they they work on RAID1,
> which slows down performance ~12 times!!

Of the many poorly understood, misleading posts to the XFS
and RAID mailing lists, this comparison is particularly bad:

> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar 
> 0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+325minor)pagefaults 0swaps
> l1:~#

> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
> 0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+324minor)pagefaults 0swaps
> l1:~#

In the first case 'linux-2.6.27.7.tar' is in effect being
extracted to volatile memory (depending on memory size, flusher
parameters, etc., which are gleefully unreported), in the second
to persistent disk; even worse, in this particular case it is a
fairly metadata-intensive test (25k inodes), and writing lots of
metadata to disk (twice, as in RAID1) rather than to memory is of
course going to be slow.

Comparing the two makes no sense and imparts no useful
information. It would be more interesting to see an analysis
with data and argument as to whether the metadata layout of XFS
is good or bad or how it could be improved; the issue here is
metadata policies, not barriers.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-06 15:36 ` Eric Sandeen
@ 2008-12-06 20:35     ` Redeeman
  2008-12-13 12:54     ` Justin Piszcz
  1 sibling, 0 replies; 61+ messages in thread
From: Redeeman @ 2008-12-06 20:35 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Justin Piszcz, linux-raid, xfs, Alan Piszcz

On Sat, 2008-12-06 at 09:36 -0600, Eric Sandeen wrote:
> Justin Piszcz wrote:
> > Someone should write a document with XFS and barrier support, if I recall,
> > in the past, they never worked right on raid1 or raid5 devices, but it
> > appears now they they work on RAID1, which slows down performance ~12 times!!
> 
> What sort of document do you propose?  xfs will enable barriers on any
> block device which will support them, and after:
> 
> deeb5912db12e8b7ccf3f4b1afaad60bc29abed9
> 
> [XFS] Disable queue flag test in barrier check.
> 
> xfs is able to determine, via a test IO, that md raid1 does pass
> barriers through properly even though it doesn't set an ordered flag on
> the queue.
> 
> > l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar 
> > 0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+325minor)pagefaults 0swaps
> > l1:~#
> > 
> > l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
> > 0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> > 0inputs+0outputs (0major+324minor)pagefaults 0swaps
> > l1:~#
> > 
> > Before:
> > /dev/md2        /               xfs     defaults,noatime  0       1
> > 
> > After:
> > /dev/md2        /               xfs     defaults,noatime,nobarrier,logbufs=8,logbsize=262144 0 1
> 
> Well, if you're investigating barriers can you do a test with just the
> barrier option change; though I expect you'll still find it to have a
> substantial impact.
> 
> > There is some mention of it here:
> > http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent
> > 
> > But basically I believe it should be noted in the kernel logs, FAQ or somewhere
> > because just through the process of upgrading the kernel, not changing fstab
> > or any other part of the system, performance can drop 12x just because the
> > newer kernels implement barriers.
> 
> Perhaps:
> 
> printk(KERN_ALERT "XFS is now looking after your metadata very
> carefully; if you prefer the old, fast, dangerous way, mount with -o
> nobarrier\n");
> 
> :)
> 
> Really, this just gets xfs on md raid1 in line with how it behaves on
> most other devices.
> 
> But I agree, some documentation/education is probably in order; if you
> choose to disable write caches or you have faith in the battery backup
> of your write cache, turning off barriers would be a good idea.  Justin,
> it might be interesting to do some tests with:
> 
> barrier,   write cache enabled
> nobarrier, write cache enabled
> nobarrier, write cache disabled
> 
> a 12x hit does hurt though...  If you're really motivated, try the same
> scenarios on ext3 and ext4 to see what the barrier hit is on those as well.
I have tested with ext3/xfs, and barriers have considerably more impact
on xfs than on ext3. This was a test from ~4 months ago; I do not have any
precise data anymore.


> 
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-06 14:28 ` Justin Piszcz
@ 2008-12-11  0:20   ` Bill Davidsen
  -1 siblings, 0 replies; 61+ messages in thread
From: Bill Davidsen @ 2008-12-11  0:20 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-raid, xfs, Alan Piszcz

Justin Piszcz wrote:
> Someone should write a document with XFS and barrier support, if I 
> recall,
> in the past, they never worked right on raid1 or raid5 devices, but it
> appears now they they work on RAID1, which slows down performance ~12 
> times!!
>
I would expect you, as an experienced tester, to have done this 
measurement more rigorously!
I don't think it means much if this is what you did.

> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
> 0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+325minor)pagefaults 0swaps
> l1:~#
>
> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
> 0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+324minor)pagefaults 0swaps
> l1:~#
>
Before doing any disk test you need to start by dropping caches, so that the
runs are reproducible. And in a timing test, you need to end with a sync for
the same reason.

So:
 echo 1 >/proc/sys/vm/drop_caches
 time bash -c "YOUR TEST; sync"

This will give you a fair shot at being able to reproduce the results, 
done on an otherwise unloaded system.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck 



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-11  0:20   ` Bill Davidsen
@ 2008-12-11  9:18     ` Justin Piszcz
  -1 siblings, 0 replies; 61+ messages in thread
From: Justin Piszcz @ 2008-12-11  9:18 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid, xfs, Alan Piszcz



On Wed, 10 Dec 2008, Bill Davidsen wrote:

> Justin Piszcz wrote:
>> Someone should write a document with XFS and barrier support, if I recall,
>> in the past, they never worked right on raid1 or raid5 devices, but it
>> appears now they they work on RAID1, which slows down performance ~12 
>> times!!
>> 
> I would expect you, as an experienced tester, to have done this measurement 
> more rigorously!
> I don't think it means much if this is what you did.
>
>> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
>> 0.15user 1.54system 0:13.18elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (0major+325minor)pagefaults 0swaps
>> l1:~#
>> 
>> l1:~# /usr/bin/time tar xf linux-2.6.27.7.tar
>> 0.14user 1.66system 2:39.68elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
>> 0inputs+0outputs (0major+324minor)pagefaults 0swaps
>> l1:~#
>> 
> Before doing any disk test you need to start by dropping cache, to be sure 
> the appropriate reproducible things happen. And in doing a timing test, you 
> need to end with a sync for the same reason.
>
> So:
> echo 1 >/proc/sys/vm/drop_caches
> time bash -c "YOUR TEST; sync"
>
> This will give you a fair shot at being able to reproduce the results, done 
> on an otherwise unloaded system.
>
> -- 
> Bill Davidsen <davidsen@tmr.com>
> "Woe unto the statesman who makes war without a reason that will still
> be valid when the war is over..." Otto von Bismark 
>

Roughly the same for non-barriers:
# bash -c '/usr/bin/time tar xf linux-2.6.27.7.tar' 
0.15user 1.51system 0:12.95elapsed 12%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (4major+320minor)pagefaults 0swaps

For barriers I cannot test that right now but it most likely will be around the
same as well.

Justin.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-11  9:18     ` Justin Piszcz
@ 2008-12-11  9:24       ` Justin Piszcz
  -1 siblings, 0 replies; 61+ messages in thread
From: Justin Piszcz @ 2008-12-11  9:24 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid, xfs, Alan Piszcz



On Thu, 11 Dec 2008, Justin Piszcz wrote:

>
>
> On Wed, 10 Dec 2008, Bill Davidsen wrote:
>
>> Justin Piszcz wrote:
>>> Someone should write a document with XFS and barrier support, if I recall,
>
>

Best not to do things in a rush; here is the correct benchmark:

# rm -rf linux*7
# sync
# echo 1 > /proc/sys/vm/drop_caches
# /usr/bin/time bash -c 'tar xf linux-2.6.27.7.tar; sync'
0.13user 1.62system 0:46.67elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (17major+955minor)pagefaults 0swaps

Justin.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-06 15:36 ` Eric Sandeen
@ 2008-12-13 12:54     ` Justin Piszcz
  2008-12-13 12:54     ` Justin Piszcz
  1 sibling, 0 replies; 61+ messages in thread
From: Justin Piszcz @ 2008-12-13 12:54 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-raid, xfs, Alan Piszcz



On Sat, 6 Dec 2008, Eric Sandeen wrote:

> Justin Piszcz wrote:
>> Someone should write a document with XFS and barrier support, if I recall,
>> in the past, they never worked right on raid1 or raid5 devices, but it
>> appears now they they work on RAID1, which slows down performance ~12 times!!
>
>> There is some mention of it here:
>> http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent
>>
>> But basically I believe it should be noted in the kernel logs, FAQ or somewhere
>> because just through the process of upgrading the kernel, not changing fstab
>> or any other part of the system, performance can drop 12x just because the
>> newer kernels implement barriers.
>
> Perhaps:
>
> printk(KERN_ALERT "XFS is now looking after your metadata very
> carefully; if you prefer the old, fast, dangerous way, mount with -o
> nobarrier\n");
>
> :)
>
> Really, this just gets xfs on md raid1 in line with how it behaves on
> most other devices.
>
> But I agree, some documentation/education is probably in order; if you
> choose to disable write caches or you have faith in the battery backup
> of your write cache, turning off barriers would be a good idea.  Justin,
> it might be interesting to do some tests with:
>
> barrier,   write cache enabled
> nobarrier, write cache enabled
> nobarrier, write cache disabled
>
> a 12x hit does hurt though...  If you're really motivated, try the same
> scenarios on ext3 and ext4 to see what the barrier hit is on those as well.
>
> -Eric
>

No, I have not forgotten about this; I have just been quite busy, and I will
test this now. As before, I did not use sync because I was in a hurry and did
not have the ability to test. I am using a different machine/hw type, but the
setup is the same: md/raid1, etc.

Since I will only be measuring barriers, per esandeen@ I have changed the mount
options from what I typically use to the defaults.

Here is the /etc/fstab entry:
/dev/md2        /               xfs     defaults        0       1

And the nobarrier entry:
/dev/md2        /               xfs     defaults,nobarrier        0       1

Stop cron and make sure nothing else is using the disk I/O, done:

# /etc/init.d/cron stop
Stopping periodic command scheduler: crond.

The benchmark:
# /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
# echo 1 > /proc/sys/vm/drop_caches # (between tests)

== The tests ==

  KEY:
  barriers = "b"
  write_cache = "w"

  SUMMARY:
   b=on,w=on: 1:19.53 elapsed @ 2% CPU [BENCH_1]
  b=on,w=off: 1:23.59 elapsed @ 2% CPU [BENCH_2]
  b=off,w=on: 0:21.35 elapsed @ 9% CPU [BENCH_3]
b=off,w=off: 0:42.90 elapsed @ 4% CPU [BENCH_4]

So how much barriers slow down the I/O depends on your settings.
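
Expressed as rough ratios from the elapsed times above: with the write cache
on, barriers cost about 79.5s / 21.4s = ~3.7x on this workload; with the
cache off, about 83.6s / 42.9s = ~1.9x; and nobarrier with the cache off is
still roughly 2x slower than nobarrier with it on.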

Scheduler used: CFQ.
[    0.168390] io scheduler cfq registered (default)

The raw details:

BENCH_1
# /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
0.16user 1.85system 1:19.53elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+970minor)pagefaults 0swaps

BENCH_2
(turn off write-cache)

# hdparm -W0 /dev/sda

/dev/sda:
  setting drive write-caching to 0 (off)
  write-caching =  0 (off)
# hdparm -W0 /dev/sdb

/dev/sdb:
  setting drive write-caching to 0 (off)
  write-caching =  0 (off)
#

# /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
0.16user 1.86system 1:23.59elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+953minor)pagefaults 0swaps

BENCH_3
(barriers=off; write_cache=on)
# /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
0.18user 1.86system 0:21.35elapsed 9%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+952minor)pagefaults 0swaps

BENCH_4
(turn off write-cache)

# hdparm -W0 /dev/sda

/dev/sda:
  setting drive write-caching to 0 (off)
  write-caching =  0 (off)
# hdparm -W0 /dev/sdb

/dev/sdb:
  setting drive write-caching to 0 (off)
  write-caching =  0 (off)
#

# /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
0.18user 1.76system 0:42.90elapsed 4%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+954minor)pagefaults 0swaps

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-13 12:54     ` Justin Piszcz
@ 2008-12-13 17:26       ` Martin Steigerwald
  -1 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-13 17:26 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-raid, Alan Piszcz, Eric Sandeen, xfs


On Saturday, 13 December 2008, Justin Piszcz wrote:
> On Sat, 6 Dec 2008, Eric Sandeen wrote:
> > Justin Piszcz wrote:
> >> Someone should write a document with XFS and barrier support, if I
> >> recall, in the past, they never worked right on raid1 or raid5
> >> devices, but it appears now they they work on RAID1, which slows
> >> down performance ~12 times!!
> >>
> >> There is some mention of it here:
> >> http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent
> >>
> >> But basically I believe it should be noted in the kernel logs, FAQ
> >> or somewhere because just through the process of upgrading the
> >> kernel, not changing fstab or any other part of the system,
> >> performance can drop 12x just because the newer kernels implement
> >> barriers.
> >
> > Perhaps:
> >
> > printk(KERN_ALERT "XFS is now looking after your metadata very
> > carefully; if you prefer the old, fast, dangerous way, mount with -o
> > nobarrier\n");
> >
> > :)
> >
> > Really, this just gets xfs on md raid1 in line with how it behaves on
> > most other devices.
> >
> > But I agree, some documentation/education is probably in order; if
> > you choose to disable write caches or you have faith in the battery
> > backup of your write cache, turning off barriers would be a good
> > idea.  Justin, it might be interesting to do some tests with:
> >
> > barrier,   write cache enabled
> > nobarrier, write cache enabled
> > nobarrier, write cache disabled
> >
> > a 12x hit does hurt though...  If you're really motivated, try the
> > same scenarios on ext3 and ext4 to see what the barrier hit is on
> > those as well.
> >
> > -Eric
>
> No, I have not forgotten about this I have just been quite busy, I will
> test this now, as before, I did not use sync because I was in a hurry
> and did not have the ability to test, I am using a different machine/hw
> type but the setup is the same, md/raid1 etc.
>
> Since I will only be measuring barriers, per esandeen@ I have changed
> the mount options from what I typically use to the defaults.

[...]

> The benchmark:
> # /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
> # echo 1 > /proc/sys/vm/drop_caches # (between tests)
>
> == The tests ==
>
>   KEY:
>   barriers = "b"
>   write_cache = "w"
>
>   SUMMARY:
>    b=on,w=on: 1:19.53 elapsed @ 2% CPU [BENCH_1]
>   b=on,w=off: 1:23.59 elapsed @ 2% CPU [BENCH_2]
>   b=off,w=on: 0:21.35 elapsed @ 9% CPU [BENCH_3]
> b=off,w=off: 0:42.90 elapsed @ 4% CPU [BENCH_4]

This is quite similar to what I got on my laptop without any RAID 
setup[1]. At least without barriers it was faster in all of my tar -xf 
linux-2.6.27.tar.bz2 and rm -rf linux-2.6.27 tests.

At the moment it appears to me that disabling the write cache may often give 
more performance than using barriers, and this doesn't match my expectation 
of write barriers as a feature that enhances performance. Right now a 
"nowcache" option, with that as the default, appears to make more sense than 
defaulting to barriers. But I think this needs more testing than just these 
simple high-metadata-load tests. Anyway, I am happy because I have a way to 
speed up XFS ;-).

[1] http://oss.sgi.com/archives/xfs/2008-12/msg00244.html

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-13 17:26       ` Martin Steigerwald
@ 2008-12-13 17:40         ` Eric Sandeen
  -1 siblings, 0 replies; 61+ messages in thread
From: Eric Sandeen @ 2008-12-13 17:40 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-xfs, linux-raid, Alan Piszcz

Martin Steigerwald wrote:

> At the moment it appears to me that disabling write cache may often give 
> more performance than using barriers. And this doesn't match my 
> expectation of write barriers as a feature that enhances performance. 

Why do you have that expectation?  I've never seen barriers advertised
as enhancing performance.  :)

I do wonder why barriers on, write cache off is so slow; I'd have
thought the barriers were a no-op.  Maybe I'm missing something.

> Right now a "nowcache" option and having this as default appears to make 
> more sense than defaulting to barriers. 

I don't think that turning off write cache is something the filesystem
can do; you have to take that as an administrative step on your block
devices.
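
(For SATA drives that usually means the hdparm -W0 calls shown earlier in
this thread; for SCSI/SAS disks, clearing the WCE bit with sdparm should be
the equivalent, e.g. "sdparm --clear WCE /dev/sdX" with a hypothetical
device name.)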

> But I think this needs more 
> testing than just those simple high meta data load tests. Anyway I am 
> happy cause I have a way to speed up XFS ;-).

My only hand-wavy concern is whether this has any adverse physical
effect on the drive (no cache == lots more head movement etc?) but then
barriers are constantly flushing/invalidating that cache, so it's
probably a wash.  And really, I have no idea.  :)

-Eric


^ permalink raw reply	[flat|nested] 61+ messages in thread

* RE: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-13 17:26       ` Martin Steigerwald
@ 2008-12-13 18:01         ` David Lethe
  -1 siblings, 0 replies; 61+ messages in thread
From: David Lethe @ 2008-12-13 18:01 UTC (permalink / raw)
  To: Martin Steigerwald, linux-xfs
  Cc: Justin Piszcz, Eric Sandeen, linux-raid, Alan Piszcz, xfs



> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Martin Steigerwald
> Sent: Saturday, December 13, 2008 11:26 AM
> To: linux-xfs@oss.sgi.com
> Cc: Justin Piszcz; Eric Sandeen; linux-raid@vger.kernel.org; Alan
> Piszcz; xfs@oss.sgi.com
> Subject: Re: 12x performance drop on md/linux+sw raid1 due to barriers
> [xfs]
> 
> On Saturday, 13 December 2008, Justin Piszcz wrote:
> > On Sat, 6 Dec 2008, Eric Sandeen wrote:
> > > Justin Piszcz wrote:
> > >> Someone should write a document with XFS and barrier support, if I
> > >> recall, in the past, they never worked right on raid1 or raid5
> > >> devices, but it appears now they they work on RAID1, which slows
> > >> down performance ~12 times!!
> > >>
> > >> There is some mention of it here:
> > >> http://oss.sgi.com/projects/xfs/faq.html#wcache_persistent
> > >>
> > >> But basically I believe it should be noted in the kernel logs, FAQ
> > >> or somewhere because just through the process of upgrading the
> > >> kernel, not changing fstab or any other part of the system,
> > >> performance can drop 12x just because the newer kernels implement
> > >> barriers.
> > >
> > > Perhaps:
> > >
> > > printk(KERN_ALERT "XFS is now looking after your metadata very
> > > carefully; if you prefer the old, fast, dangerous way, mount with -o
> > > nobarrier\n");
> > >
> > > :)
> > >
> > > Really, this just gets xfs on md raid1 in line with how it behaves
> > > on most other devices.
> > >
> > > But I agree, some documentation/education is probably in order; if
> > > you choose to disable write caches or you have faith in the battery
> > > backup of your write cache, turning off barriers would be a good
> > > idea.  Justin, it might be interesting to do some tests with:
> > >
> > > barrier,   write cache enabled
> > > nobarrier, write cache enabled
> > > nobarrier, write cache disabled
> > >
> > > a 12x hit does hurt though...  If you're really motivated, try the
> > > same scenarios on ext3 and ext4 to see what the barrier hit is on
> > > those as well.
> > >
> > > -Eric
> >
> > No, I have not forgotten about this I have just been quite busy, I
> > will test this now, as before, I did not use sync because I was in a
> > hurry and did not have the ability to test, I am using a different
> > machine/hw type but the setup is the same, md/raid1 etc.
> >
> > Since I will only be measuring barriers, per esandeen@ I have changed
> > the mount options from what I typically use to the defaults.
> 
> [...]
> 
> > The benchmark:
> > # /usr/bin/time bash -c 'tar xf linux-2.6.27.8.tar; sync'
> > # echo 1 > /proc/sys/vm/drop_caches # (between tests)
> >
> > == The tests ==
> >
> >   KEY:
> >   barriers = "b"
> >   write_cache = "w"
> >
> >   SUMMARY:
> >    b=on,w=on: 1:19.53 elapsed @ 2% CPU [BENCH_1]
> >   b=on,w=off: 1:23.59 elapsed @ 2% CPU [BENCH_2]
> >   b=off,w=on: 0:21.35 elapsed @ 9% CPU [BENCH_3]
> > b=off,w=off: 0:42.90 elapsed @ 4% CPU [BENCH_4]
> 
> This is quite similar to what I got on my laptop without any RAID
> setup[1]. At least without barriers it was faster in all of my tar -xf
> linux-2.6.27.tar.bz2 and rm -rf linux-2.6.27 tests.
> 
> At the moment it appears to me that disabling write cache may often
> give more performance than using barriers. And this doesn't match my
> expectation of write barriers as a feature that enhances performance.
> Right now a "nowcache" option and having this as default appears to
> make more sense than defaulting to barriers. But I think this needs
> more testing than just those simple high meta data load tests. Anyway I
> am happy cause I have a way to speed up XFS ;-).
> 
> [1] http://oss.sgi.com/archives/xfs/2008-12/msg00244.html
> 
> Ciao,
> --
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

Consider the case where the write cache is enabled and 128 blocks are
sitting in the write cache, waiting to be flushed.

If those 128 blocks are not needed again before it is time to flush,
then not only did you waste cycles copying those 128 blocks into the
cache, you also prevented those same 128 blocks from being used for
read cache, buffers, whatever. You also have the overhead of the cache
lookup, and no matter what, you still have to flush the cache
eventually. If you are doing extended writes, the cache will fill up
quickly, so it hurts you. Write cache is of greatest benefit in a
transactional environment, like a database, and can hurt performance
on benchmarks, rebuilds, etc., depending on whether or not the
extended operations can actually save a disk I/O by getting
information from the cache before it is time to flush the cache to
disk.

If you have SCSI, FC, or SAS disks, you can query the drive's cache
log pages (they are in vendor-specific fields for some drives) to see
how the cache is being utilized and determine its relative efficiency.
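
One way to pull those log pages, as a sketch with a hypothetical device name,
is sg_logs from sg3_utils:

# sg_logs -a /dev/sda   # dumps all supported log pages; cache stats appear on some drives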


David




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-13 17:40         ` Eric Sandeen
@ 2008-12-14  3:31           ` Redeeman
  -1 siblings, 0 replies; 61+ messages in thread
From: Redeeman @ 2008-12-14  3:31 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Martin Steigerwald, linux-xfs, linux-raid, Alan Piszcz

On Sat, 2008-12-13 at 11:40 -0600, Eric Sandeen wrote:
> Martin Steigerwald wrote:
> 
> > At the moment it appears to me that disabling write cache may often give 
> > more performance than using barriers. And this doesn't match my 
> > expectation of write barriers as a feature that enhances performance. 
> 
> Why do you have that expectation?  I've never seen barriers advertised
> as enhancing performance.  :)

My initial thought was that write barriers would enhance performance,
in that you could have the write cache on. So it's really more of an
expectation that wc on + barriers performs better than wc off :)

> 
> I do wonder why barriers on, write cache off is so slow; I'd have
> thought the barriers were a no-op.  Maybe I'm missing something.
> 
> > Right now a "nowcache" option and having this as default appears to make 
> > more sense than defaulting to barriers. 
> 
> I don't think that turning off write cache is something the filesystem
> can do; you have to take that as an administrative step on your block
> devices.
> 
> > But I think this needs more 
> > testing than just those simple high meta data load tests. Anyway I am 
> > happy cause I have a way to speed up XFS ;-).
> 
> My only hand-wavy concern is whether this has any adverse physical
> effect on the drive (no cache == lots more head movement etc?) but then
> barriers are constantly flushing/invalidating that cache, so it's
> probably a wash.  And really, I have no idea.  :)
> 
> -Eric
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14  3:31           ` Redeeman
@ 2008-12-14 14:02             ` Peter Grandi
  -1 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2008-12-14 14:02 UTC (permalink / raw)
  To: Linux XFS, Linux RAID

First of all, why are you people sending TWO copies to the XFS
mailing list? (to both linux-xfs@oss.sgi.com and xfs@oss.sgi.com).

>>> At the moment it appears to me that disabling write cache
>>> may often give more performance than using barriers. And
>>> this doesn't match my expectation of write barriers as a
>>> feature that enhances performance.

>> Why do you have that expectation?  I've never seen barriers
>> advertised as enhancing performance.  :)

This entire discussion is based on the usual misleading and
pointless avoidance of the substance, in particular because of a
stupid, shallow disregard for the particular nature of the
"benchmark" used.

Barriers can be used to create atomic storage transactions for
metadata or data. For data, they mean that 'fsync' does what it is
expected to do. It is up to the application to issue 'fsync' as
often or as rarely as appropriate.

For metadata, it is the file system code itself that uses
barriers to do something like 'fsync' for metadata updates, and
enforce POSIX or whatever guarantees.

The "benchmark" used involves 290MB of data in around 26k files
and directories, that is the average inode size is around 11KB.

That means that an inode is created and flushed to disk every
11KB written; a metadata write barrier happens every 11KB.
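That arithmetic is easy to check against an unpacked tree (just a sketch;
the directory name is an example):

  du -sk linux-2.6.27.7              # total size in KB, roughly 290MB
  find linux-2.6.27.7 | wc -l        # number of files and directories, ~26k
  # average size per created object, in KB (integer division, roughly 11)
  echo $(( $(du -sk linux-2.6.27.7 | cut -f1) / $(find linux-2.6.27.7 | wc -l) ))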

A synchronization every 11KB is a very high rate, and it will
(unless the disk host adapter or the disk controller is clever or
has battery-backed memory for queues) involve a lot of waiting for
the barrier to complete, and presumably break the smooth flow of
data to the disk with pauses.

Also, whether or not the host adapter or the controller write
caches are disabled, 290MB will fit inside most recent hosts' RAM
entirely, and even adding 'sync' at the end will not help much
towards a meaningful comparison.

> My initial thoughts were that write barriers would enhance
> performance, in that, you could have write cache on.

Well, that all depends on whether the write caches (in the host
adapter or the controller) are persistent and how frequently
barriers are issued.

If the write caches are not persistent (at least for a while),
the hard disk controller or the host adapter cannot have more
than one barrier completion request in flight at a time, and if
a barrier completion is requested every 11KB that will be pretty
constraining.

Barriers are much more useful when the host adapter or the disk
controller can cache multiple transactions and then execute them
in the order in which barriers have been issued, so that the
host can pipeline transactions down to the last stage in the
chain, instead of operating the last stages synchronously or
semi-synchronously.

But talking about barriers in the context of metadata, and for a
"benchmark" which has a metadata barrier every 11KB, and without
knowing whether the storage subsystem can queue multiple barrier
operations, seems pretty crass and meaningless, if not
misleading. A waste of time at best.

> So its really more of an expectation that wc+barriers on,
> performs better than wc+barriers off :)

This is of course a misstatement: perhaps you intended to write
that ''wc on+barriers on'' would perform better than ''wc off +
barriers off'.

As to this apparent anomaly, I am only mildly surprised, as
there are plenty of similar anomalies (why ever should one need a
very large block device readahead to get decent performance from
MD block devices?), due to ill-conceived schemes in all
sorts of stages of the storage chain, from the sometimes
comically misguided misdesigns in the Linux block cache or
elevators or storage drivers, to the often even worse
"optimizations" embedded in the firmware of host adapters and
hard disk controllers.

Consider for example (and also as a hint towards less futile and
meaningless "benchmarks") the 'no-fsync' option of 'star', the
reasons for its existence and for the Linux related advice:

  http://gd.tuwien.ac.at/utils/schilling/man/star.html

    «-no-fsync
          Do not call  fsync(2)  for  each  file  that  has  been
          extracted  from  the archive. Using -no-fsync may speed
          up extraction on operating systems with slow  file  I/O
          (such  as  Linux),  but includes the risk that star may
          not be able to detect extraction  problems  that  occur
          after  the  call to close(2).»

Now ask yourself if you know whether GNU tar does 'fsync' or not
(a rather interesting detail, and the reasons why may also be
interesting...).
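One way to answer that empirically is to count the sync-related system
calls GNU tar makes while extracting (just a sketch; assumes strace is
installed):

  # count fsync/fdatasync calls issued by tar during extraction
  strace -c -f -e trace=fsync,fdatasync tar xf linux-2.6.27.7.tar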
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14  3:31           ` Redeeman
@ 2008-12-14 17:49             ` Martin Steigerwald
  -1 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-14 17:49 UTC (permalink / raw)
  To: linux-xfs; +Cc: Redeeman, Eric Sandeen, linux-raid, Alan Piszcz, xfs

Am Sonntag 14 Dezember 2008 schrieb Redeeman:
> On Sat, 2008-12-13 at 11:40 -0600, Eric Sandeen wrote:
> > Martin Steigerwald wrote:
> > > At the moment it appears to me that disabling write cache may often
> > > give more performance than using barriers. And this doesn't match
> > > my expectation of write barriers as a feature that enhances
> > > performance.
> >
> > Why do you have that expectation?  I've never seen barriers
> > advertised as enhancing performance.  :)
>
> My initial thoughts were that write barriers would enhance performance,
> in that, you could have write cache on. So its really more of an
> expectation that wc+barriers on, performs better than wc+barriers off
> :)

Exactly that. My expectation, based on my technical understanding of the 
write barrier feature, ordered from most performant to least performant, is:

1) Write cache + no barrier, but NVRAM ;)
2) Write cache + barrier
3) No write cache, where it shouldn't matter whether barriers were enabled 
or not

With 1, write requests are unordered, thus meta data changes could be 
applied in place before landing in the journal for example, thus NVRAM 
is a must. With 2, write requests are unordered except for certain 
markers, the barriers, which say: anything before the barrier goes before 
it and anything after the barrier goes after it. This leaves room for 
optimizing the write requests before and after a barrier - either 
in-kernel by an IO scheduler or in firmware by NCQ, TCQ, FUA. And with 3, 
write requests would always be ordered... and if the filesystem places a 
marker - a sync in this case - any write requests that are in flight until 
then have to land on disk before the filesystem can proceed.
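For what it's worth, those configurations can be compared directly by
toggling the XFS barrier mount option and the drive write caches between
runs of the workload (just a sketch; device and mount point names are
examples, and the filesystem is unmounted between the mount invocations):

  # mount the filesystem with or without barriers (barriers are the default)
  mount -o barrier   /dev/md0 /mnt/test
  mount -o nobarrier /dev/md0 /mnt/test
  # turn the write cache of the underlying disks off or on
  hdparm -W0 /dev/sda /dev/sdb
  hdparm -W1 /dev/sda /dev/sdb
  # then time the metadata-heavy workload for each combination
  /usr/bin/time tar xf linux-2.6.27.tar -C /mnt/test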

From that understanding, which I explained in detail in my Linux-Magazin 
article[1] I always thought that write cache + barrier has to be faster 
than no write cache.

Well, I am ready to learn more. But for me, until now, that was the whole 
point of the effort with write barriers. It seems I completely misunderstood 
their purpose if that's not what they were meant for.

[1] Only in German; it had been translated to English but never published: 
http://www.linux-magazin.de/online_artikel/beschraenktes_schreiben

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 14:02             ` Peter Grandi
@ 2008-12-14 18:12               ` Martin Steigerwald
  -1 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-14 18:12 UTC (permalink / raw)
  To: xfs; +Cc: Linux RAID

Am Sonntag 14 Dezember 2008 schrieb Peter Grandi:
> First of all, why are you people sending TWO copies to the XFS
> mailing list? (to both linux-xfs@oss.sgi.com and xfs@oss.sgi.com).

I just took the CC as it seems to be the custom on the xfs mailing list to 
keep it. I stripped it this time.

> >>> At the moment it appears to me that disabling write cache
> >>> may often give more performance than using barriers. And
> >>> this doesn't match my expectation of write barriers as a
> >>> feature that enhances performance.
> >>
> >> Why do you have that expectation?  I've never seen barriers
> >> advertised as enhancing performance.  :)
>
> This entire discussion is based on the usual misleading and
> pointless avoidance of the substance, in particular because of
> stupid, shallow diregard for the particular nature of the
> "benchmark" used.
>
> Barriers can be used to create atomic storage transaction for
> metadata or data. For data, they mean that 'fsync' does what is
> expected to do. It is up to the application to issue 'fsync' as
> often or as rarely as appropriate.
>
> For metadata, it is the file system code itself that uses
> barriers to do something like 'fsync' for metadata updates, and
> enforce POSIX or whatever guarantees.
>
> The "benchmark" used involves 290MB of data in around 26k files
> and directories, that is the average inode size is around 11KB.
>
> That means that an inode is created and flushed to disk every
> 11KB written; a metadata write barrier happens every 11KB.
>
> A synchronization every 11KB is a very high rate, and it will
> (unless the disk host adapter or the disk controller are clever
> mor have battery backed memory for queues) involve a lot of
> waiting for the barrier to complete, and presumably break the
> smooth flow of data to the disk with pauses.

But - as far as I understood - the filesystem doesn't have to wait for 
barriers to complete, but could continue issuing IO requests happily. A 
barrier only means that any request prior to it has to land before it and 
any request after it has to land after it. It doesn't mean that the barrier 
has to land immediately and the filesystem has to wait for this.

At least that has always been the whole point of barriers for me. If that's 
not the case, I misunderstood the purpose of barriers to the maximum extent 
possible.

> Also whether or not the host adapter or the conroller write
> cache are disabled, 290MB will fit inside most recent hosts' RAM
> entirely, and even adding 'sync' at the end will not help that
> much as to helping with a meaningful comparison.

Okay, so dropping caches would be required. Got that in the meantime.
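For reference, dropping the page cache between runs (so the comparison is
not distorted by the whole tree still sitting in RAM) can be done like this
as root (just a sketch):

  sync                               # write out dirty pages first
  echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes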

> > My initial thoughts were that write barriers would enhance
> > performance, in that, you could have write cache on.
>
> Well, that all depends on whether the write caches (in the host
> adapter or the controller) are persistent and how frequently
> barriers are issued.
>
> If the write caches are not persistent (at least for a while),
> the hard disk controller or the host adapter cannot have more
> than one barrier completion request in flight at a time, and if
> a barrier completion is requested every 11KB that will be pretty
> constraining.

Hmmm, I didn't know that. How come? But the IO scheduler should be able to 
handle more than one barrier request at a time, shouldn't it? And even 
then, how can it be slower writing 11 KB at a time than writing every IO 
request one at a time - i.e. with the write cache *off*?

> Barriers are much more useful when the host adapter or the disk
> controller can cache multiple transactions and then execute them
> in the order in which barriers have been issued, so that the
> host can pipeline transactions down to the last stage in the
> chain, instead of operating the last stages synchronously or
> semi-synchronously.
>
> But talking about barriers in the context of metadata, and for a
> "benchmark" which has a metadata barrier every 11KB, and without
> knowing whether the storage subsystem can queue multiple barrier
> operations seems to be pretty crass and meangingless, if not
> misleading. A waste of time at best.

Hmmm, as far as I understood it, the IO scheduler would handle barrier 
requests itself if the device was not capable of queuing and ordering 
requests.

The only thing that occurs to me now is that with barriers off it has more 
freedom to reorder requests, and that might matter for this metadata 
intensive workload. With barriers it can only reorder 11 KB worth of 
requests at a time; without them it could reorder as much as it wants... 
but even then the filesystem would have to make sure that metadata changes 
land in the journal first and then in place. And this would involve a sync 
if no barrier request was possible.

So I still don't get why even that metadata intensive workload of tar -xf 
linux-2.6.27.tar.bz2 - or maybe better, bzip2 -d the tar beforehand - should 
be slower with barriers + write cache on than with no barriers and write 
cache off.

> > So its really more of an expectation that wc+barriers on,
> > performs better than wc+barriers off :)
>
> This is of course a misstatement: perhaps you intended to write
> that ''wc on+barriers on'' would perform better than ''wc off +
> barriers off'.
>
> As to this apparent anomaly, I am only mildly surprised, as
> there are plenty of similar anomalies (why ever should have a
> very large block device readahead to get decent performance from
> MD block devices?), due to poorly ill conceived schemes in all
> sorts of stages of the storage chain, from the sometimes
> comically misguided misdesigns in the Linux block cache or
> elevators or storage drivers, to the often even worse
> "optimizations" embedded in the firmware of host adapters and
> hard disk controllers.

Well, then that is something that could potentially be fixed!

> Consider for example (and also as a hint towards less futile and
> meaningless "benchmarks") the 'no-fsync' option of 'star', the
> reasons for its existence and for the Linux related advice:
>
>   http://gd.tuwien.ac.at/utils/schilling/man/star.html
>
>     «-no-fsync
>           Do not call  fsync(2)  for  each  file  that  has  been
>           extracted  from  the archive. Using -no-fsync may speed
>           up extraction on operating systems with slow  file  I/O
>           (such  as  Linux),  but includes the risk that star may
>           not be able to detect extraction  problems  that  occur
>           after  the  call to close(2).»
>
> Now ask yourself if you know whether GNU tar does 'fsync' or not
> (a rather interesting detail, and the reasons why may also be
> interesting...).

Talking about less futile benchmarks by pointing at the manpage of a tool 
from an author who is known as a Solaris advocate appears a bit futile in 
itself to me, especially as the author tends to chime in on any discussion 
mentioning his name and, at least in my experience, is very difficult to 
talk to in a constructive manner.

For me it's important to find out whether there is reason to look in more 
detail at how efficiently write barriers work on Linux. For that, as I 
mentioned already, testing just this simple workload would not be enough, 
and testing just on XFS wouldn't be either.

I think this is neither useless nor futile. The simplified benchmark IMHO 
has shown something that deserves further investigation. Nothing more, 
nothing less.

[1] http://oss.sgi.com/archives/xfs/2008-12/msg00244.html

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 14:02             ` Peter Grandi
@ 2008-12-14 18:35               ` Martin Steigerwald
  -1 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-14 18:35 UTC (permalink / raw)
  To: linux-xfs; +Cc: Linux RAID

Am Sonntag 14 Dezember 2008 schrieb Peter Grandi:
> First of all, why are you people sending TWO copies to the XFS
> mailing list? (to both linux-xfs@oss.sgi.com and xfs@oss.sgi.com).
[...]
> > So its really more of an expectation that wc+barriers on,
> > performs better than wc+barriers off :)
>
> This is of course a misstatement: perhaps you intended to write
> that ''wc on+barriers on'' would perform better than ''wc off +
> barriers off'.

I think Redeeman said exactly that ;-). Either both on or both off.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 18:12               ` Martin Steigerwald
@ 2008-12-14 22:02                 ` Peter Grandi
  -1 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2008-12-14 22:02 UTC (permalink / raw)
  To: Linux XFS, Linux RAID

[ ... ]

> But - as far as I understood - the filesystem doesn't have to
> wait for barriers to complete, but could continue issuing IO
> requests happily. A barrier only means, any request prior to
> that have to land before and any after it after it.

> It doesn't mean that the barrier has to land immediately and
> the filesystem has to wait for this. At least that always was
> the whole point of barriers for me. If thats not the case I
> misunderstood the purpose of barriers to the maximum extent
> possible.

Unfortunately that seems the case.

The purpose of barriers is to guarantee that relevant data is
known to be on persistent storage (kind of hardware 'fsync').

In effect write barrier means "tell me when relevant data is on
persistent storage", or less precisely "flush/sync writes now
and tell me when it is done". Properties as to ordering are just
a side effect.

That is, once the application (file system in the case of metadata,
user process in the case of data) knows that a barrier operation
is complete, it knows that all data involved in the barrier
operation are on persistent storage. In the case of serially
dependent transactions, applications do wait until the previous
transaction is completed before starting the next one (e.g.
creating potentially many files in the same directory, something
that 'tar' does).

  "all data involved" is usually all previous writes, but in
  more sophisticated cases it can be just specific writes.
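The cost of such serially dependent flushes is easy to get a feel for with
a small sketch that writes files of roughly the 11KB average discussed
earlier in this thread, forcing each one to persistent storage before the
next is started (paths are examples):

  # create 100 files of ~11KB each, flushing every file before the next one
  time sh -c 'for i in $(seq 1 100); do
      dd if=/dev/zero of=/mnt/test/f$i bs=11k count=1 conv=fsync 2>/dev/null
  done'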

At transaction end points (for a file system, metadata updates)
an application issues a write barrier and then waits for its
completion.

If the host adapter/disk controllers don't have persistent
storage, then completion (should) only happen when the data
involved is actually on disk; if they do have it, then multiple
barriers can be outstanding, if the host adapter/disk controller
does support multiple outstanding operations (e.g. thanks to
tagged queueing).

The best case is when the IO subsystem supports all of these:

* tagged queueing: multiple write barriers can be outstanding;

* fine granule (specific writes, not all writes) barriers: just
  metadata writes need to be flushed to persistent storage, not
  any intervening data writes too;

* the host adapter and/or disk controller have persistent
  caches: as long as those caches have space, barriers can
  complete immediately, without waiting a write to disk.

It just happens that typical contemporary PC IO subsystems (at
the hardware level, not the Linux level) have none of those
features, except sometimes for NCQ which is a reduced form of
TCQ, and apparently is not that useful.

Write barriers are also useful without persistent caches, if
there is proper tagged queueing and fine granularity.
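Whether a given disk actually offers any queueing to the kernel can be seen
from sysfs (just a sketch; the device name is an example):

  # effective queue depth used for the device; 1 means no NCQ/TCQ at all
  cat /sys/block/sda/device/queue_depth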

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-13 17:40         ` Eric Sandeen
@ 2008-12-14 23:36           ` Dave Chinner
  -1 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-14 23:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Martin Steigerwald, linux-xfs, linux-raid, Alan Piszcz

On Sat, Dec 13, 2008 at 11:40:11AM -0600, Eric Sandeen wrote:
> Martin Steigerwald wrote:
> 
> > At the moment it appears to me that disabling write cache may often give 
> > more performance than using barriers. And this doesn't match my 
> > expectation of write barriers as a feature that enhances performance. 
> 
> Why do you have that expectation?  I've never seen barriers advertised
> as enhancing performance.  :)
> 
> I do wonder why barriers on, write cache off is so slow; I'd have
> thought the barriers were a no-op.  Maybe I'm missing something.

Barriers still enforce ordering in this case, so it affects the
elevator algorithm....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 23:36           ` Dave Chinner
  (?)
@ 2008-12-14 23:55           ` Eric Sandeen
  -1 siblings, 0 replies; 61+ messages in thread
From: Eric Sandeen @ 2008-12-14 23:55 UTC (permalink / raw)
  To: Eric Sandeen, Martin Steigerwald, linux-xfs, Alan Piszcz

Dave Chinner wrote:
> On Sat, Dec 13, 2008 at 11:40:11AM -0600, Eric Sandeen wrote:
>> Martin Steigerwald wrote:
>>
>>> At the moment it appears to me that disabling write cache may often give 
>>> more performance than using barriers. And this doesn't match my 
>>> expectation of write barriers as a feature that enhances performance. 
>> Why do you have that expectation?  I've never seen barriers advertised
>> as enhancing performance.  :)
>>
>> I do wonder why barriers on, write cache off is so slow; I'd have
>> thought the barriers were a no-op.  Maybe I'm missing something.
> 
> Barriers still enforce ordering in this case, so it affects the
> elevator algorithm....

(taking linux-raid off because at this point it really has nothing to do
with the thread).

oh, er, so is nobarrier+nowritecache safe or not?  If the elevator can
reorder for us (even though the drive won't) then a journaling fs which
needs these ordering guarantees may still be in trouble?

Just when I think I have it all straight... :)

(ok, so now nobarrier+nowritecache+noop io scheduler might be an
interesting test).
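Something along these lines should do for that test (just a sketch; device
and mount point names are examples, and the scheduler is set per member
disk of the RAID1):

  # use the noop elevator on each member disk
  echo noop > /sys/block/sda/queue/scheduler
  echo noop > /sys/block/sdb/queue/scheduler
  # turn the drive write caches off
  hdparm -W0 /dev/sda /dev/sdb
  # mount without barriers and rerun the workload
  mount -o nobarrier /dev/md0 /mnt/test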

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 22:02                 ` Peter Grandi
  (?)
@ 2008-12-15 18:48                 ` Martin Steigerwald
  2008-12-15 22:50                   ` Peter Grandi
  -1 siblings, 1 reply; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-15 18:48 UTC (permalink / raw)
  To: linux-xfs

Am Sonntag 14 Dezember 2008 schrieb Peter Grandi:
> [ ... ]
> > But - as far as I understood - the filesystem doesn't have to
> > wait for barriers to complete, but could continue issuing IO
> > requests happily. A barrier only means, any request prior to
> > that have to land before and any after it after it.
> >
> > It doesn't mean that the barrier has to land immediately and
> > the filesystem has to wait for this. At least that always was
> > the whole point of barriers for me. If thats not the case I
> > misunderstood the purpose of barriers to the maximum extent
> > possible.
>
> Unfortunately that seems the case.
>
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
>
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.

Interesting to know. Thanks for the long explanation.

Unfortunately, in my understanding, none of this is reflected by

Documentation/block/barrier.txt

In particular, it says:

---------------------------------------------------------------------
I/O Barriers
============
Tejun Heo <htejun@gmail.com>, July 22 2005

I/O barrier requests are used to guarantee ordering around the barrier
requests.  Unless you're crazy enough to use disk drives for
implementing synchronization constructs (wow, sounds interesting...),
the ordering is meaningful only for write requests for things like
journal checkpoints.  All requests queued before a barrier request
must be finished (made it to the physical medium) before the barrier
request is started, and all requests queued after the barrier request
must be started only after the barrier request is finished (again,
made it to the physical medium)

In other words, I/O barrier requests have the following two properties.

1. Request ordering

Requests cannot pass the barrier request.  Preceding requests are
processed before the barrier and following requests after.

Depending on what features a drive supports, this can be done in one
of the following three ways.

i. For devices which have queue depth greater than 1 (TCQ devices) and
support ordered tags, block layer can just issue the barrier as an
ordered request and the lower level driver, controller and drive
itself are responsible for making sure that the ordering constraint is
met.  Most modern SCSI controllers/drives should support this.

NOTE: SCSI ordered tag isn't currently used due to limitation in the
      SCSI midlayer, see the following random notes section.

ii. For devices which have queue depth greater than 1 but don't
support ordered tags, block layer ensures that the requests preceding
a barrier request finishes before issuing the barrier request.  Also,
it defers requests following the barrier until the barrier request is
finished.  Older SCSI controllers/drives and SATA drives fall in this
category.

iii. Devices which have queue depth of 1.  This is a degenerate case
of ii.  Just keeping issue order suffices.  Ancient SCSI
controllers/drives and IDE drives are in this category.


2. Forced flushing to physical medium

Again, if you're not gonna do synchronization with disk drives (dang,
it sounds even more appealing now!), the reason you use I/O barriers
is mainly to protect filesystem integrity when power failure or some
other events abruptly stop the drive from operating and possibly make
the drive lose data in its cache.  So, I/O barriers need to guarantee
that requests actually get written to non-volatile medium in order.

There are four cases,

i. No write-back cache.  Keeping requests ordered is enough.

ii. Write-back cache but no flush operation.  There's no way to
guarantee physical-medium commit order.  This kind of devices can't to
I/O barriers.

iii. Write-back cache and flush operation but no FUA (forced unit
access).  We need two cache flushes - before and after the barrier
request.

iv. Write-back cache, flush operation and FUA.  We still need one
flush to make sure requests preceding a barrier are written to medium,
but post-barrier flush can be avoided by using FUA write on the
barrier itself.
---------------------------------------------------------------------

I do not see any mention of "tell me when it's finished" in that file. It 
just mentions that a cache flush has to be issued before the write 
barrier, and that the barrier shall then either be issued as a FUA (forced 
unit access) request or be followed by another cache flush after the 
barrier request. Nowhere is it written that this has to happen immediately. 
The documentation file is mainly about ordering requests, and about the 
fact that cache flushes may be used to enforce that regular requests cannot 
pass barrier requests.

Nor do I understand why the filesystem needs to know whether a barrier has 
been completed - it just needs to know whether the block device / driver 
can handle barrier requests. If the filesystem knows that requests are 
written under a certain ordering constraint, then it shouldn't matter when 
they are written. When they are written should be a choice of the user, 
depending on how much data she or he is willing to risk losing in case 
writing out the requests is suddenly interrupted.

Thus I think the mentioned documentation is at least misleading. If your 
description matches the actual implementation of write barriers, then I 
think it should be adapted and changed.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 22:02                 ` Peter Grandi
@ 2008-12-15 22:38                   ` Dave Chinner
  -1 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-15 22:38 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS, Linux RAID

On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> [ ... ]
> 
> > But - as far as I understood - the filesystem doesn't have to
> > wait for barriers to complete, but could continue issuing IO
> > requests happily. A barrier only means, any request prior to
> > that have to land before and any after it after it.
> 
> > It doesn't mean that the barrier has to land immediately and
> > the filesystem has to wait for this. At least that always was
> > the whole point of barriers for me. If thats not the case I
> > misunderstood the purpose of barriers to the maximum extent
> > possible.
> 
> Unfortunately that seems the case.
> 
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
> 
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.

No, that is incorrect.

Barriers provide strong ordering semantics.  I/Os issued before the
barrier must be completed before the barrier I/O, and I/Os issued
after the barrier write must not be started before the barrier write
completes. The elevators are not allowed to re-order I/Os around
barriers.

This is all documented in Documentation/block/barrier.txt. Please
read it because most of what you are saying appears to be based on
incorrect assumptions about what barriers do.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-15 22:38                   ` Dave Chinner
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-15 22:38 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID, Linux XFS

On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> [ ... ]
> 
> > But - as far as I understood - the filesystem doesn't have to
> > wait for barriers to complete, but could continue issuing IO
> > requests happily. A barrier only means, any request prior to
> > that have to land before and any after it after it.
> 
> > It doesn't mean that the barrier has to land immediately and
> > the filesystem has to wait for this. At least that always was
> > the whole point of barriers for me. If thats not the case I
> > misunderstood the purpose of barriers to the maximum extent
> > possible.
> 
> Unfortunately that seems the case.
> 
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
> 
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.

No, that is incorrect.

Barriers provide strong ordering semantics.  I/Os issued before the
barrier must be completed before the barrier I/O, and I/Os issued
after the barrier write must not be started before the barrier write
completes. The elevators are not allowed to re-order I/Os around
barriers.

This is all documented in Documentation/block/barrier.txt. Please
read it because most of what you are saying appears to be based on
incorrect assumptions about what barriers do.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-15 18:48                 ` Martin Steigerwald
@ 2008-12-15 22:50                   ` Peter Grandi
  2009-02-18 22:14                     ` Leon Woestenberg
  0 siblings, 1 reply; 61+ messages in thread
From: Peter Grandi @ 2008-12-15 22:50 UTC (permalink / raw)
  To: Linux XFS

[ ... ]

>> The purpose of barriers is to guarantee that relevant data is
>> known to be on persistent storage (kind of hardware 'fsync').
>> In effect write barrier means "tell me when relevant data is
>> on persistent storage", or less precisely "flush/sync writes
>> now and tell me when it is done". Properties as to ordering
>> are just a side effect.

> [ ... ] Unfortunately in my understanding none of this is
> reflected by Documentation/block/barrier.txt

But we are talking about XFS and barriers here. That described
just a (flawed, buggy) mechanism to implement those. Consider
for example:

  http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
  http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

In any case as to the kernel "barrier" mechanism, its
description is misleading because it heavily fixates on the
ordering issue, which is just a consequence, but yet mentions
the far more important "flush/sync" aspect.

Still, there is a lot of confusion about barrier support and
what it means at which level, as reflected in several online
discussions and the different behaviour of different kernel
versions.

> Especially this mentions:

  > [ ... ] All requests queued before a barrier request must be
  > finished (made it to the physical medium) before the barrier
  > request is started, and all requests queued after the
  > barrier request must be started only after the barrier
  > request is finished (again, made it to the physical medium)

This does say that the essential property is "made it to the
physical medium".

  > i. For devices which have queue depth greater than 1 (TCQ
  > devices) and support ordered tags, block layer can just
  > issue the barrier as an ordered request and the lower level
  > driver, controller and drive itself

Note that the terminology here is wrong: here "controller"
really means "host adapter", and "drive itself" actually means
"drive controller".

  > are responsible for making sure that the ordering constraint
  > is met.

This is subtly incorrect. The driver, host adapter and drive
controller should keep multiple barrier requests queued only if
their caches are persistent. But this seems corrected below in
the "Forced flushing to physical medium".

  > ii. For devices which have queue depth greater than 1 but
  > don't support ordered tags, block layer ensures that the
  > requests preceding a barrier request finishes before issuing
  > the barrier request.  Also, it defers requests following the
  > barrier until the barrier request is finished.  Older SCSI
  > controllers/drives and SATA drives fall in this category.

  > iii. Devices which have queue depth of 1.  This is a
  > degenerate case of ii. Just keeping issue order suffices.
  > Ancient SCSI controllers/drives and IDE drives are in this
  > category.

Both of these seem to match my discussion; here "requests" means
of course "write requests".

  > 2. Forced flushing to physical medium

  > Again, if you're not gonna do synchronization with disk
  > drives (dang, it sounds even more appealing now!), the
  > reason you use I/O barriers is mainly to protect filesystem
  > integrity when power failure or some other events abruptly
  > stop the drive from operating and possibly make the drive
                                      =======================
  > lose data in its cache.
    ======================

  > So, I/O barriers need to guarantee that requests actually
  > get written to non-volatile medium in order.

Here it is incorrect again: barriers need to guarantee both that
data gets written to non-volatile medium, and that this happens
in order, for serially dependent transactions.

  > There are four cases, [ ... ] We still need one flush to
  > make sure requests preceding a barrier are written to medium
  > [ ... ]

[ ... ]

> Nor do I understand why the filesystem needs to know whether a
> barrier has been completed - it just needs to know whether the
> block device / driver can handle barrier requests.

Perhaps you are thinking about an API like "issue barrier, wait
for barrier completion". But it can be instead "issue barrier,
this only returns when it is complete", or "issue barrier, any
subsequent write completes only when the barrier has been
executed" much to the same effect.  In the discussion of the
four cases above

> If the filesystem knows that requests are written with certain
> order constraint, then it shouldn't matter when they are written.

Ah it sure does, in two ways. Barriers are essentially a way to
implement 'fsync' or 'fdatasync', whether these are explicitly
issued by processes or implicitly by the file system code.

  > When should be a choice of the user on how much data she /
  > he risks to loose in case of a sudden interruption of
  > writing out requests.

Sure, and for *data* the user can issue 'fdatasync'/'msync' (or
the new 'sync_file_range'), and for metadata 'fsync'; or things
like implicit versions of these with filesystem options. But
once the 'fsync' or 'fdatasync' has been issued the file system
code must wait until the flush/sync implicit in the barrier is
complete.
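
To put the user-space half of that in concrete terms, here is a minimal
C sketch of the explicit route (the file name, record contents and error
handling are invented for illustration, not taken from this thread):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char rec[] = "transaction record\n";
    if (write(fd, rec, strlen(rec)) != (ssize_t)strlen(rec)) {
        perror("write"); return 1;
    }

    /* The data must be on stable storage before the transaction is
     * reported as done; fdatasync() covers the file data plus the
     * metadata needed to read it back. */
    if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }

    /* If the remaining inode metadata (timestamps, etc.) matters too: */
    if (fsync(fd) != 0) { perror("fsync"); return 1; }

    close(fd);
    return 0;
}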

Anyhow what the kernel does with 'fsync'/'fdatasync', what the
host adapter or drive controller do, has been controversial, and
depending on Linux kernel versions and host adapter/drive
controller firmware versions different things happen.

Let's say that given this mess it is *exceptionally difficult*
to create an IO subsystem with properly working write barriers
(unless one buys SGI kit of course :->).

A couple of relevant threads:

http://groups.google.com/group/linux.kernel/tree/browse_frm/thread/d343e51655b4ac7c
http://kerneltrap.org/mailarchive/linux-kernel/2008/2/26/987744

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-15 22:38                   ` Dave Chinner
@ 2008-12-16  9:39                     ` Martin Steigerwald
  -1 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-16  9:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner, Peter Grandi, Linux RAID, Linux XFS

Am Montag 15 Dezember 2008 schrieb Dave Chinner:
> On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> > [ ... ]
> >
> > > But - as far as I understood - the filesystem doesn't have to
> > > wait for barriers to complete, but could continue issuing IO
> > > requests happily. A barrier only means, any request prior to
> > > that have to land before and any after it after it.
> > >
> > > It doesn't mean that the barrier has to land immediately and
> > > the filesystem has to wait for this. At least that always was
> > > the whole point of barriers for me. If thats not the case I
> > > misunderstood the purpose of barriers to the maximum extent
> > > possible.
> >
> > Unfortunately that seems the case.
> >
> > The purpose of barriers is to guarantee that relevant data is
> > known to be on persistent storage (kind of hardware 'fsync').
> >
> > In effect write barrier means "tell me when relevant data is on
> > persistent storage", or less precisely "flush/sync writes now
> > and tell me when it is done". Properties as to ordering are just
> > a side effect.
>
> No, that is incorrect.
>
> Barriers provide strong ordering semantics.  I/Os issued before the
> barrier must be completed before the barrier I/O, and I/Os issued
> after the barrier write must not be started before the barrier write
> completes. The elevators are not allowed to re-order I/Os around
> barriers.
>
> This is all documented in Documentation/block/barrier.txt. Please
> read it because most of what you are saying appears to be based on
> incorrect assumptions about what barriers do.

Hmmm, so I am not completely off track it seems ;-).

What I still do not understand then is: How can write barriers + write 
cache be slower than no write barriers + no cache? I still would expect 
write barriers + write cache be in between no barriers + write cache and 
no barriers + no cache performance wise. And would see anything else as a 
regression basically.

This doesn't go into my brain yet and I thought I understood 
Documentation/block/barrier.txt well enough before writing my article.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-16  9:39                     ` Martin Steigerwald
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-16  9:39 UTC (permalink / raw)
  To: linux-xfs; +Cc: Linux RAID

Am Montag 15 Dezember 2008 schrieb Dave Chinner:
> On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> > [ ... ]
> >
> > > But - as far as I understood - the filesystem doesn't have to
> > > wait for barriers to complete, but could continue issuing IO
> > > requests happily. A barrier only means, any request prior to
> > > that have to land before and any after it after it.
> > >
> > > It doesn't mean that the barrier has to land immediately and
> > > the filesystem has to wait for this. At least that always was
> > > the whole point of barriers for me. If thats not the case I
> > > misunderstood the purpose of barriers to the maximum extent
> > > possible.
> >
> > Unfortunately that seems the case.
> >
> > The purpose of barriers is to guarantee that relevant data is
> > known to be on persistent storage (kind of hardware 'fsync').
> >
> > In effect write barrier means "tell me when relevant data is on
> > persistent storage", or less precisely "flush/sync writes now
> > and tell me when it is done". Properties as to ordering are just
> > a side effect.
>
> No, that is incorrect.
>
> Barriers provide strong ordering semantics.  I/Os issued before the
> barrier must be completed before the barrier I/O, and I/Os issued
> after the barrier write must not be started before the barrier write
> completes. The elevators are not allowed to re-order I/Os around
> barriers.
>
> This is all documented in Documentation/block/barrier.txt. Please
> read it because most of what you are saying appears to be based on
> incorrect assumptions about what barriers do.

Hmmm, so I am not completely off track it seems ;-).

What I still do not understand then is: How can write barriers + write 
cache be slower than no write barriers + no cache? I still would expect 
write barriers + write cache to be in between no barriers + write cache and 
no barriers + no cache performance-wise. And I would see anything else as a 
regression, basically.

This doesn't go into my brain yet and I thought I understood 
Documentation/block/barrier.txt well enough before writing my article.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-16  9:39                     ` Martin Steigerwald
  (?)
@ 2008-12-16 20:57                     ` Peter Grandi
  -1 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2008-12-16 20:57 UTC (permalink / raw)
  To: Linux RAID, Linux XFS

[ ... ]

>>>> It doesn't mean that the barrier has to land immediately and
>>>> the filesystem has to wait for this. At least that always was
>>>> the whole point of barriers for me. If thats not the case I
>>>> misunderstood the purpose of barriers to the maximum extent
>>>> possible.

>>> The purpose of barriers is to guarantee that relevant data is
>>> known to be on persistent storage (kind of hardware 'fsync').

>> Barriers provide strong ordering semantics. [ ... ]This is all
>> documented in Documentation/block/barrier.txt. Please read it
>> because most of what you are saying appears to be based on
>> incorrect assumptions about what barriers do.

No, it is based on the assumption that we are discussing the
"whole point of barriers" and "the purpose of barriers".

Those are the ability to do atomic, serially dependent transactions
*to stable storage*. Some people may be interested in integrity
only, with potentially unbounded data loss, but most people who
care about barriers are interested in reliable commit to stable
storage.

Then there are different types of barriers, from XFS barriers to
host adapter/drive controller barriers, and even the Linux block
layer "barrier" mechanism, which is arguably misdesigned, because
what it does is not what it should be doing to achieve "the whole
point" and "the purpose" of a barrier system, and achieving that
can be quite difficult.

This is somewhat controversial, and to further the understanding
of the whole point of barriers and their purpose I have provided
in a previous post a pointer to two very relevant discussion
threads, which to me seem pretty clear.

> Hmmm, so I am not completely off track it seems ;-).

Well, your description seems to be based on the actual properties
of the flawed implementation of barriers in current Linux, but not
about the "whole point" and "purpose" that should be served by such
a mechanism.

The documentation of barriers in the Linux kernel makes the mess
worse, because it does talk about committing to stable storage, but
then gives the impression that the point and purpose is indeed
ordering, which it should not be. That an ordering is imposed
should be a consequence of the commitment of serially dependent
transactions to stable storage in a consistent way, not a goal in
itself.

The discussion threads I mentioned previously show that the big
issue is indeed having a reliable mechanism to commit transactions
to stable storage, rather than provide just the transaction
dependency part of that mechanism.

Quite a few people think that just the transaction dependency property is too
weak a purpose or point for barriers. Which point or purpose is
precisely to offer the application (file system or user process
like a DBMS instance) the ability to definitely commit to stable
storage:

  > When should be a choice of the user on how much data she /
  > he risks to loose in case of a sudden interruption of
  > writing out requests.

Unfortunately as I have already remarked this area, which should be
crystal clear as it is important to people who need transaction
persistence guarantees, is messy, with various file systems or
DBMSes doing bad, dirty things because the point and purpose of
barriers has been misunderstood so often (arguably even by the
POSIX committee with 'fsync'/'fdatasync').

The rapid escalation of complexity of the levels and types of
nonpersistent caching in current storage subsystems is so bad that
reminding people that the whole point and purpose of barriers is to
provide stable storage commits rather than merely ordering seems
quite important to me.

The way Linux block layer barriers currently work, like other
aspects of that block layer (for example the absurd rationale
behind the plugging/unplugging mechanism), is so misguided that it
should not be confused with the whole point and purpose of barriers.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-16  9:39                     ` Martin Steigerwald
@ 2008-12-16 23:14                       ` Dave Chinner
  -1 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-16 23:14 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-xfs, Peter Grandi, Linux RAID

On Tue, Dec 16, 2008 at 10:39:07AM +0100, Martin Steigerwald wrote:
> Am Montag 15 Dezember 2008 schrieb Dave Chinner:
> > On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> > > The purpose of barriers is to guarantee that relevant data is
> > > known to be on persistent storage (kind of hardware 'fsync').
> > >
> > > In effect write barrier means "tell me when relevant data is on
> > > persistent storage", or less precisely "flush/sync writes now
> > > and tell me when it is done". Properties as to ordering are just
> > > a side effect.
> >
> > No, that is incorrect.
> >
> > Barriers provide strong ordering semantics.  I/Os issued before the
> > barrier must be completed before the barrier I/O, and I/Os issued
> > after the barrier write must not be started before the barrier write
> > completes. The elevators are not allowed to re-order I/Os around
> > barriers.
> >
> > This is all documented in Documentation/block/barrier.txt. Please
> > read it because most of what you are saying appears to be based on
> > incorrect assumptions about what barriers do.
> 
> Hmmm, so I am not completely off track it seems ;-).
> 
> What I still do not understand then is: How can write barriers + write 
> cache be slower than no write barriers + no cache?

Because frequent write barriers cause ordering constraints on I/O.
For example, in XFS log I/Os are sequential. With barriers enabled
they cannot be merged by the elevator, whereas without barriers
they can be merged and issued as a single I/O.

Further, if you have no barrier I/os queued in the elevator, sorting
and merging occurs across the entire queue of I/Os, not just the
I/Os that have been issued after the last barrier I/O.

Effectively the ordering constraints of barriers introduce more
seeks by reducing the efficiency of the elevator due to constraining
sorting and merging ranges.

In many cases, the ordering constraints impose a higher seek penalty
than the write cache can mitigate - the whole purpose of the barrier
IOs is to force the cache to be flushed - so write caching does not
improve performance when frequent barriers are issued. In this case,
barriers are the problem and hence turning off the cache and barriers
will result in higher performance.
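
One way to get a feel for that effect from user space, without touching
the barrier code at all, is to emulate "flush after every small write"
with fdatasync() and compare it with a single flush at the end. This is
only a rough sketch (file names, sizes and counts are made up, and
clock_gettime may need -lrt on older glibc), but the flush-per-write case
loses the benefit of the drive cache in much the same way frequent
barrier I/Os do:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define N    256
#define BLK  4096

static double run(const char *path, int sync_each)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    char buf[BLK];
    struct timespec t0, t1;
    int i;

    if (fd < 0) { perror("open"); return -1.0; }
    memset(buf, 'x', BLK);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < N; i++) {
        if (write(fd, buf, BLK) != BLK)
            perror("write");
        if (sync_each)
            fdatasync(fd);          /* force a cache flush every time */
    }
    if (!sync_each)
        fdatasync(fd);              /* one flush at the end */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    printf("flush per write: %.3f s\n", run("bench1.dat", 1));
    printf("single flush   : %.3f s\n", run("bench2.dat", 0));
    return 0;
}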



> I still would expect 
> write barriers + write cache be in between no barriers + write cache and 
> no barriers + no cache performance wise.

Depends entirely on the disk and the workload. Some disks are faster
with wcache and barriers (e.g. laptop drives), some are faster with
no wcache and no barriers (e.g. server drives)....

> And would see anything else as a 
> regression basically.

No, just your usual "pick the right hardware" problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-16 23:14                       ` Dave Chinner
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-16 23:14 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-xfs, Linux RAID

On Tue, Dec 16, 2008 at 10:39:07AM +0100, Martin Steigerwald wrote:
> Am Montag 15 Dezember 2008 schrieb Dave Chinner:
> > On Sun, Dec 14, 2008 at 10:02:05PM +0000, Peter Grandi wrote:
> > > The purpose of barriers is to guarantee that relevant data is
> > > known to be on persistent storage (kind of hardware 'fsync').
> > >
> > > In effect write barrier means "tell me when relevant data is on
> > > persistent storage", or less precisely "flush/sync writes now
> > > and tell me when it is done". Properties as to ordering are just
> > > a side effect.
> >
> > No, that is incorrect.
> >
> > Barriers provide strong ordering semantics.  I/Os issued before the
> > barrier must be completed before the barrier I/O, and I/Os issued
> > after the barrier write must not be started before the barrier write
> > completes. The elevators are not allowed to re-order I/Os around
> > barriers.
> >
> > This is all documented in Documentation/block/barrier.txt. Please
> > read it because most of what you are saying appears to be based on
> > incorrect assumptions about what barriers do.
> 
> Hmmm, so I am not completely off track it seems ;-).
> 
> What I still do not understand then is: How can write barriers + write 
> cache be slower than no write barriers + no cache?

Because frequent write barriers cause ordering constraints on I/O.
For example, in XFS log I/Os are sequential. With barriers enabled
they cannot be merged by the elevator, whereas without barriers
they can be merged and issued as a single I/O.

Further, if you have no barrier I/os queued in the elevator, sorting
and merging occurs across the entire queue of I/Os, not just the
I/Os that have been issued after the last barrier I/O.

Effectively the ordering constraints of barriers introduce more
seeks by reducing the efficiency of the elevator due to constraining
sorting and merging ranges.

In many cases, the ordering constraints impose a higher seek penalty
than the write cache can mitigate - the whole purpose of the barrier
IOs is to force the cache to be flushed - so write caching does not
improve performance when frequent barriers are issued. In this case,
barriers are the problem and hence turning off the cache and barriers
will result in higher performance.



> I still would expect 
> write barriers + write cache be in between no barriers + write cache and 
> no barriers + no cache performance wise.

Depends entirely on the disk and the workload. Some disks are faster
with wcache and barriers (e.g. laptop drives), some are faster with
no wcache and no barriers (e.g. server drives)....

> And would see anything else as a 
> regression basically.

No, just your usual "pick the right hardware" problem.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 22:02                 ` Peter Grandi
@ 2008-12-17 21:40                   ` Bill Davidsen
  -1 siblings, 0 replies; 61+ messages in thread
From: Bill Davidsen @ 2008-12-17 21:40 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS, Linux RAID

Peter Grandi wrote:
> Unfortunately that seems the case.
>
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
>
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.
>   

I don't get that sense from the barriers stuff in Documentation; in fact 
I think it's essentially a pure ordering thing. I don't even see that it 
has an effect of forcing the data to be written to the device, other 
than by preventing other writes until the drive writes everything. So we 
read the intended use differently.

What really bothers me is that there's no obvious need for barriers at 
the device level if the file system is just a bit smarter and does its 
own async io (like aio_*), because you can track writes outstanding on a 
per-fd basis, so instead of stopping the flow of data to the drive, you 
can just block a file descriptor and wait for the count of outstanding 
i/o to drop to zero. That provides the order semantics of barriers as 
far as I can see, having tirelessly thought about it for ten minutes or 
so. Oh, and did something very similar decades ago in a long-gone 
mainframe OS.
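
For what it's worth, a minimal sketch of that per-fd counting scheme using
POSIX AIO (aio_write/aio_suspend; link with -lrt on older glibc) might look
like the following. The file name and sizes are invented, and note that AIO
completion by itself says nothing about volatile drive caches, which is the
point of contention in this thread, so an explicit fsync() is kept at the
"barrier" point:

#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NWRITES 4
#define BLK     4096

int main(void)
{
    static char buf[NWRITES][BLK];
    struct aiocb cb[NWRITES];
    const struct aiocb *list[NWRITES];
    char d[BLK];
    int i, pending;

    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Issue several asynchronous writes; their on-disk order is unknown. */
    for (i = 0; i < NWRITES; i++) {
        memset(buf[i], 'A' + i, BLK);
        memset(&cb[i], 0, sizeof(cb[i]));
        cb[i].aio_fildes = fd;
        cb[i].aio_buf    = buf[i];
        cb[i].aio_nbytes = BLK;
        cb[i].aio_offset = (off_t)i * BLK;
        if (aio_write(&cb[i]) != 0) { perror("aio_write"); return 1; }
        list[i] = &cb[i];
    }

    /* "Barrier": block on this fd until the outstanding count is zero. */
    do {
        aio_suspend(list, NWRITES, NULL);  /* sleep until one completes */
        pending = 0;
        for (i = 0; i < NWRITES; i++)
            if (aio_error(&cb[i]) == EINPROGRESS)
                pending++;
    } while (pending > 0);

    /* Completion does not imply stable storage if the drive cache is
     * volatile, so force a flush before the dependent write. */
    if (fsync(fd) != 0) perror("fsync");

    /* Only now issue the write that must come "after the barrier". */
    memset(d, 'D', BLK);
    if (pwrite(fd, d, BLK, (off_t)NWRITES * BLK) != BLK)
        perror("pwrite");
    fsync(fd);
    close(fd);
    return 0;
}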

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-17 21:40                   ` Bill Davidsen
  0 siblings, 0 replies; 61+ messages in thread
From: Bill Davidsen @ 2008-12-17 21:40 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux RAID, Linux XFS

Peter Grandi wrote:
> Unfortunately that seems the case.
>
> The purpose of barriers is to guarantee that relevant data is
> known to be on persistent storage (kind of hardware 'fsync').
>
> In effect write barrier means "tell me when relevant data is on
> persistent storage", or less precisely "flush/sync writes now
> and tell me when it is done". Properties as to ordering are just
> a side effect.
>   

I don't get that sense from the barriers stuff in Documentation; in fact 
I think it's essentially a pure ordering thing. I don't even see that it 
has an effect of forcing the data to be written to the device, other 
than by preventing other writes until the drive writes everything. So we 
read the intended use differently.

What really bothers me is that there's no obvious need for barriers at 
the device level if the file system is just a bit smarter and does its 
own async io (like aio_*), because you can track writes outstanding on a 
per-fd basis, so instead of stopping the flow of data to the drive, you 
can just block a file descriptor and wait for the count of outstanding 
i/o to drop to zero. That provides the order semantics of barriers as 
far as I can see, having tirelessly thought about it for ten minutes or 
so. Oh, and did something very similar decades ago in a long-gone 
mainframe OS.

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-17 21:40                   ` Bill Davidsen
  (?)
@ 2008-12-18  8:20                   ` Leon Woestenberg
  2008-12-18 23:33                     ` Bill Davidsen
  2008-12-21 19:16                     ` Peter Grandi
  -1 siblings, 2 replies; 61+ messages in thread
From: Leon Woestenberg @ 2008-12-18  8:20 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID, Peter Grandi, Linux XFS

Hello all,

Bill Davidsen wrote:
> Peter Grandi wrote:
>   
>> Unfortunately that seems the case.
>>
>> The purpose of barriers is to guarantee that relevant data is
>> known to be on persistent storage (kind of hardware 'fsync').
>>
>> In effect write barrier means "tell me when relevant data is on
>> persistent storage", or less precisely "flush/sync writes now
>> and tell me when it is done". Properties as to ordering are just
>> a side effect.
>>   
>>     
>
> I don't get that sense from the barriers stuff in Documentation, in fact 
> I think it's essentially a pure ordering thing, I don't even see that it 
> has an effect of forcing the data to be written to the device, other 
> than by preventing other writes until the drive writes everything. So we 
> read the intended use differently.
>
> What really bothers me is that there's no obvious need for barriers at 
> the device level if the file system is just a bit smarter and does it's 
> own async io (like aio_*), because you can track writes outstanding on a 
> per-fd basis, so instead of stopping the flow of data to the drive, you 
> can just block a file descriptor and wait for the count of outstanding 
> i/o to drop to zero. That provides the order semantics of barriers as 
> far as I can see, having tirelessly thought about it for ten minutes or 
> so. Oh, and did something very similar decades ago in a long-gone 
> mainframe OS.
>   
Did that mainframe OS have re-ordering devices? If it did, you'd still 
need barriers all the way down:

The drive itself may still re-order writes, thus can cause corruption if 
halfway the power goes down.
 From my understanding, disabling write-caches simply forces the drive 
to operate in-order.

Barriers need to travel all the way down to the point where-after 
everything remains in-order.
Devices with write-cache enabled will still re-order, but not across 
barriers (which are implemented as
either a single cache flush with forced unit access, or a double cache 
flush around the barrier write).

Whether the data has made it to the drive platters is not really 
important from a barrier point of view, however,
iff part of the data made it to the platters, then we want to be sure it 
was in-order.

Because only in this way can we ensure that the data that is on the 
platters is consistent.

Regards,

Leon.




_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-17 21:40                   ` Bill Davidsen
@ 2008-12-18 22:26                     ` Dave Chinner
  -1 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-18 22:26 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Peter Grandi, Linux RAID, Linux XFS

On Wed, Dec 17, 2008 at 04:40:02PM -0500, Bill Davidsen wrote:
> What really bothers me is that there's no obvious need for
> barriers at the device level if the file system is just a bit
> smarter and does it's own async io (like aio_*), because you can
> track writes outstanding on a per-fd basis, so instead of stopping
> the flow of data to the drive, you can just block a file
> descriptor and wait for the count of outstanding i/o to drop to
> zero. That provides the order semantics of barriers as far as I
> can see, having tirelessly thought about it for ten minutes or so.

Well, you've pretty much described the algorithm XFS uses in its
transaction system - it's entirely asynchronous - and it's been
clear for many, many years that this model is broken when you have
devices with volatile write caches and internal re-ordering.  I/O
completion on such devices does not guarantee data is safe on stable
storage.

If the device does not commit writes to stable storage in the same
order they are signalled as complete (i.e. internal device
re-ordering occurred after completion), then the device violates
fundamental assumptions about I/O completion that the above model
relies on.

XFS uses barriers to guarantee that the devices don't lie about the
completion order of critical I/O, not that the I/Os are on stable
storage. The fact that this causes cache flushes to stable storage
is a result of the implementation of that guarantee of ordering. I'm
sure the linux barrier implementation could be smarter and faster
(for some hardware), but for an operation that is used to guarantee
integrity I'll take conservative and safe over smart and fast any
day of the week....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-18 22:26                     ` Dave Chinner
  0 siblings, 0 replies; 61+ messages in thread
From: Dave Chinner @ 2008-12-18 22:26 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux RAID, Peter Grandi, Linux XFS

On Wed, Dec 17, 2008 at 04:40:02PM -0500, Bill Davidsen wrote:
> What really bothers me is that there's no obvious need for
> barriers at the device level if the file system is just a bit
> smarter and does it's own async io (like aio_*), because you can
> track writes outstanding on a per-fd basis, so instead of stopping
> the flow of data to the drive, you can just block a file
> descriptor and wait for the count of outstanding i/o to drop to
> zero. That provides the order semantics of barriers as far as I
> can see, having tirelessly thought about it for ten minutes or so.

Well, you've pretty much described the algorithm XFS uses in its
transaction system - it's entirely asynchronous - and it's been
clear for many, many years that this model is broken when you have
devices with volatile write caches and internal re-ordering.  I/O
completion on such devices does not guarantee data is safe on stable
storage.

If the device does not commit writes to stable storage in the same
order they are signalled as complete (i.e. internal device
re-ordering occurred after completion), then the device violates
fundamental assumptions about I/O completion that the above model
relies on.

XFS uses barriers to guarantee that the devices don't lie about the
completion order of critical I/O, not that the I/Os are on stable
storage. The fact that this causes cache flushes to stable storage
is a result of the implementation of that guarantee of ordering. I'm
sure the linux barrier implementation could be smarter and faster
(for some hardware), but for an operation that is used to guarantee
integrity I'll take conservative and safe over smart and fast any
day of the week....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-18  8:20                   ` Leon Woestenberg
@ 2008-12-18 23:33                     ` Bill Davidsen
  2008-12-21 19:16                     ` Peter Grandi
  1 sibling, 0 replies; 61+ messages in thread
From: Bill Davidsen @ 2008-12-18 23:33 UTC (permalink / raw)
  To: Leon Woestenberg; +Cc: Linux RAID, Peter Grandi, Linux XFS

Leon Woestenberg wrote:
> Hello all,
>
> Bill Davidsen wrote:
>> Peter Grandi wrote:
>>   
>>> Unfortunately that seems the case.
>>>
>>> The purpose of barriers is to guarantee that relevant data is
>>> known to be on persistent storage (kind of hardware 'fsync').
>>>
>>> In effect write barrier means "tell me when relevant data is on
>>> persistent storage", or less precisely "flush/sync writes now
>>> and tell me when it is done". Properties as to ordering are just
>>> a side effect.
>>>   
>>>     
>>
>> I don't get that sense from the barriers stuff in Documentation, in fact 
>> I think it's essentially a pure ordering thing, I don't even see that it 
>> has an effect of forcing the data to be written to the device, other 
>> than by preventing other writes until the drive writes everything. So we 
>> read the intended use differently.
>>
>> What really bothers me is that there's no obvious need for barriers at 
>> the device level if the file system is just a bit smarter and does it's 
>> own async io (like aio_*), because you can track writes outstanding on a 
>> per-fd basis, so instead of stopping the flow of data to the drive, you 
>> can just block a file descriptor and wait for the count of outstanding 
>> i/o to drop to zero. That provides the order semantics of barriers as 
>> far as I can see, having tirelessly thought about it for ten minutes or 
>> so. Oh, and did something very similar decades ago in a long-gone 
>> mainframe OS.
>>   
> Did that mainframe OS have re-ordering devices? If it did, you'ld 
> still need barriers all the way down:
>
Why? As long as you can tell when all the writes before the barrier are 
physically on the drive (this is on a per fd basis, remember) you don't 
care about the order of physical writes, you serialize either one fd, or 
one thread, or one application, but you don't have to kill performance 
for the rest of the system to the drive. So you can fsync() one fd or 
several, then write another thread. Or you can wait until the 
outstanding write count for a whole process reaches zero. And the 
application satisfies the needs, not the kernel, which reduces impact on 
other applications.
> The drive itself may still re-order writes, thus can cause corruption 
> if halfway the power goes down.
> >From my understanding, disabling write-caches simply forces the drive 
> to operate in-order.
>
If your ordering logic is 'write A, B, and C, then barrier, then write D' 
I don't see that the physical order of A, B, or C matters, as long as 
they are all complete before you write D. That's what I see in the 
barrier description, let previous writes finish.
> Barriers need to travel all the way down to the point where-after 
> everything remains in-order.
> Devices with write-cache enabled will still re-order, but not across 
> barriers (which are implemented as
> either a single cache flush with forced unit access, or a double cache 
> flush around the barrier write).
>
> Whether the data has made it to the drive platters is not really 
> important from a barrier point of view, however,
> iff part of the data made it to the platters, then we want to be sure 
> it was in-order.
>
And you could use a barrier after every write (some DB setups do fsync() 
after each). Perhaps you mean parts like the journal entry before a 
change is made, then the change, then the journal entry for transaction 
complete?
> Because only in this way can we ensure that the data that is on the 
> platters is consistent.

I think we mean the same thing, but I'm not totally sure. As long as 
logical operations are completed in order, the physical writes don't 
matter, because a journal rollback would reset things to consistent anyway.
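
Spelled out as user-space code, that journalled-update ordering is the
classic write-ahead pattern. A minimal sketch follows (file names and
record formats are invented; each fsync() stands in for a barrier/flush
point):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_all(int fd, const char *s)
{
    size_t len = strlen(s);
    return (write(fd, s, len) == (ssize_t)len) ? 0 : -1;
}

int main(void)
{
    int jfd = open("journal", O_WRONLY | O_CREAT | O_APPEND, 0644);
    int dfd = open("data",    O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (jfd < 0 || dfd < 0) { perror("open"); return 1; }

    /* 1. Intent record: must be stable before the change itself. */
    if (write_all(jfd, "BEGIN txn 1: update block 42\n") != 0) return 1;
    fsync(jfd);

    /* 2. The change. */
    if (write_all(dfd, "block 42: new contents\n") != 0) return 1;
    fsync(dfd);

    /* 3. Commit record: only after this may recovery treat the
     *    transaction as complete; anything earlier is rolled back. */
    if (write_all(jfd, "COMMIT txn 1\n") != 0) return 1;
    fsync(jfd);

    close(jfd);
    close(dfd);
    return 0;
}

As long as each step is stable before the next begins, the physical order
of writes within a step does not matter, which is the point being made
above.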

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismark 





_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-14 18:12               ` Martin Steigerwald
  (?)
  (?)
@ 2008-12-20 14:06               ` Peter Grandi
  -1 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2008-12-20 14:06 UTC (permalink / raw)
  To: Linux XFS, Linux RAID

[ ... ]

> But - as far as I understood - the filesystem doesn't have to
> wait for barriers to complete, but could continue issuing IO
> requests happily. A barrier only means, any request prior to
> that have to land before and any after it after it.

> It doesn't mean that the barrier has to land immediately and
> the filesystem has to wait for this. At least that always was
> the whole point of barriers for me. If thats not the case I
> misunderstood the purpose of barriers to the maximum extent
> possible.

Unfortunately that seems the case.

The purpose of barriers is to guarantee that relevant data is
known to be on persistent storage (kind of hardware 'fsync').

In effect write barrier means "tell me when relevant data is on
persistent storage", or less precisely "flush/sync writes now
and tell me when it is done". Properties as to ordering are just
a side effect.

That is, the application (file system in the case of metadata,
user process in the case of data) knows that a barrier operation
is complete, it knows that all data involved in the barrier
operation are on persistent storage. In case of serially
dependent transactions, applications do wait until the previous
transaction is completed before starting the next one (e.g.
creating potentially many files in the same directory, something
that 'tar' does).

  "all data involved" is usually all previous writes, but in
  more sophisticated cases it can be just specific writes.

An application at transaction end points (for a file
system, metadata updates) issues a write barrier and then waits
for its completion.

If the host adapter/disk controllers don't have persistent
storage, then completion (should) only happen when the data
involved is actually on disk; if they do have it, then multiple
barriers can be outstanding, if the host adapter/disk controller
does support multiple outstanding operations (e.g. thanks to
tagged queueing).

The best case is when the IO subsystem supports all of these:

* tagged queueing: multiple write barriers can be outstanding;

* fine granule (specific writes, not all writes) barriers: just
  metadata writes need to be flushed to persistent storage, not
  any intervening data writes too;

* the host adapter and/or disk controller have persistent
  caches: as long as those caches have space, barriers can
  complete immediately, without waiting a write to disk.

It just happens that typical contemporary PC IO subsystems (at
the hardware level, not the Linux level) have none of those
features, except sometimes for NCQ which is a reduced form of
TCQ, and apparently is not that useful.

Write barriers are also useful without persistent caches, if
there is proper tagged queueing and fine granularity.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-18  8:20                   ` Leon Woestenberg
  2008-12-18 23:33                     ` Bill Davidsen
@ 2008-12-21 19:16                     ` Peter Grandi
  2008-12-22 13:19                         ` Leon Woestenberg
  1 sibling, 1 reply; 61+ messages in thread
From: Peter Grandi @ 2008-12-21 19:16 UTC (permalink / raw)
  To: Linux RAID, Linux XFS

[ ... ]

>> What really bothers me is that there's no obvious need for
>> barriers at the device level if the file system is just a bit
>> smarter and does it's own async io (like aio_*), because you
>> can track writes outstanding on a per-fd basis,

> The drive itself may still re-order writes, thus can cause
> corruption if halfway the power goes down. [ ... ] Barriers need
> to travel all the way down to the point where-after everything
> remains in-order. [ ... ] Whether the data has made it to the
> drive platters is not really important from a barrier point of
> view, however, iff part of the data made it to the platters, then
> we want to be sure it was in-order. [ ... ]

But this discussion is backwards, as usual: the *purpose* of any
kind of barriers cannot be just to guarantee consistency, but also
stability, because ordered commits are not that useful without
commit to stable storage.

If barriers guarantee transaction stability, then consistency is
also a consequence of serial dependencies among transactions (and
as to that per-device barriers are a coarse and very underoptimal
design).

Anyhow, barriers for ordering only have been astutely patented
quite recently:

  http://www.freshpatents.com/Transforming-flush-queue-command-to-memory-barrier-command-in-disk-drive-dt20070719ptan20070168626.php

Amazing news from the patent office.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-21 19:16                     ` Peter Grandi
@ 2008-12-22 13:19                         ` Leon Woestenberg
  0 siblings, 0 replies; 61+ messages in thread
From: Leon Woestenberg @ 2008-12-22 13:19 UTC (permalink / raw)
  To: Peter Grandi, Linux RAID, Linux XFS

Hello,

On Sun, 21 Dec 2008 19:16:32 +0000, "Peter Grandi" <pg_mh@sabi.co.UK>
said:
 
> > The drive itself may still re-order writes, thus can cause
> > corruption if halfway the power goes down. [ ... ] Barriers need
> > to travel all the way down to the point where-after everything
> > remains in-order. [ ... ] Whether the data has made it to the
> > drive platters is not really important from a barrier point of
> > view, however, iff part of the data made it to the platters, then
> > we want to be sure it was in-order. [ ... ]
> 
> But this discussion is backwards, as usual: the *purpose* of any
> kind of barriers cannot be just to guarantee consistency, but also
> stability, because ordered commits are not that useful without
> commit to stable storage.
>
I do not see in what sense you mean "stability"? Stable as in BIBO or
non-volatile?

Barriers are time-related. Once data is on storage, there is no relation
with time.

So I do not see how barriers help to "stabilize" storage.

Ordered commits is a strong-enough condition to ensure consistency in
the sense that
atomic transactions either made it to the disk completely or not at all.

> If barriers guarantee transaction stability, then consistency is
> also a consequence of serial dependencies among transactions (and
> as to that per-device barriers are a coarse and very underoptimal
> design).
>
Of course, the higher level should ensure that between transactions, the
(meta)data is always consistent.

In filesystem design, we see that some FS's decide to split metadata and
data in this regard.

 
> Anyhow, barriers for ordering only have been astutely patented
> quite recently:
> 
>   http://www.freshpatents.com/Transforming-flush-queue-command-to-memory-barrier-command-in-disk-drive-dt20070719ptan20070168626.php
> 
> Amazing new from the patent office.y
> 
Grand. Another case of no prior art. :-)

Leon.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-22 13:19                         ` Leon Woestenberg
  0 siblings, 0 replies; 61+ messages in thread
From: Leon Woestenberg @ 2008-12-22 13:19 UTC (permalink / raw)
  To: Peter Grandi, Linux RAID, Linux XFS

Hello,

On Sun, 21 Dec 2008 19:16:32 +0000, "Peter Grandi" <pg_mh@sabi.co.UK>
said:
 
> > The drive itself may still re-order writes, thus can cause
> > corruption if halfway the power goes down. [ ... ] Barriers need
> > to travel all the way down to the point where-after everything
> > remains in-order. [ ... ] Whether the data has made it to the
> > drive platters is not really important from a barrier point of
> > view, however, iff part of the data made it to the platters, then
> > we want to be sure it was in-order. [ ... ]
> 
> But this discussion is backwards, as usual: the *purpose* of any
> kind of barriers cannot be just to guarantee consistency, but also
> stability, because ordered commits are not that useful without
> commit to stable storage.
>
I do not see in what sense you mean "stability"? Stable as in BIBO or
non-volatile?

Barriers are time-related. Once data is on storage, there is no relation
with time.

So I do not see how barriers help to "stabilize" storage.

Ordered commits is a strong-enough condition to ensure consistency in
the sense that
atomic transactions either made it to the disk completely or not at all.

> If barriers guarantee transaction stability, then consistency is
> also a consequence of serial dependencies among transactions (and
> as to that per-device barriers are a coarse and very underoptimal
> design).
>
Of course, the higher level should ensure that between transactions, the
(meta)data is always consistent.

In filesystem design, we see that some FS's decide to split metadata and
data in this regard.

 
> Anyhow, barriers for ordering only have been astutely patented
> quite recently:
> 
>   http://www.freshpatents.com/Transforming-flush-queue-command-to-memory-barrier-command-in-disk-drive-dt20070719ptan20070168626.php
> 
> Amazing new from the patent office.y
> 
Grand. Another case of no prior art. :-)

Leon.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2008-12-15 22:50                   ` Peter Grandi
@ 2009-02-18 22:14                     ` Leon Woestenberg
  2009-02-18 22:24                       ` Eric Sandeen
                                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Leon Woestenberg @ 2009-02-18 22:14 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux XFS

Hello,

On 15 dec 2008, at 23:50, Peter Grandi wrote:

> [ ... ]
>
>>> The purpose of barriers is to guarantee that relevant data is
>>> known to be on persistent storage (kind of hardware 'fsync').
>>>
>
>> [ ... ] Unfortunately in my understanding none of this is
>> reflected by Documentation/block/barrier.txt
>
> But we are talking about XFS and barriers here. That described
> just a (flawed, buggy) mechanism to implement those. Consider
> for example:
>
>  http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>  http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
>
> In any case as to the kernel "barrier" mechanism, its
> description is misleading because it heavily fixates on the
> ordering issue, which is just a consequence, but yet mentions
> the far more important "flush/sync" aspect.
>
> Still, there is a lot of confusion about barrier support and
> what it means at which level, as reflected in several online
> discussions and the different behaviour of different kernel
> versions.
>
The semantics of a barrier are whatever semantics we ascribe to it.  
So we can continue to be confused about it.

I strongly disagree on the ordering issue being a side effect.

Correct ordering can be proven to be enough to provide transactional  
correctness, enough to ensure that filesystems can not get corrupted  
on power down.

Using barriers to guarantee that (all submitted) write requests  
(before the barrier) made it to the medium is a stronger predicate.

The Linux approach and documentation talks about the first type of  
semantics (which I rather like for them being strong enough and not  
more).

Regards,

Leon


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2009-02-18 22:14                     ` Leon Woestenberg
@ 2009-02-18 22:24                       ` Eric Sandeen
  2009-02-18 23:09                       ` Ralf Liebenow
  2009-02-20 19:19                       ` Peter Grandi
  2 siblings, 0 replies; 61+ messages in thread
From: Eric Sandeen @ 2009-02-18 22:24 UTC (permalink / raw)
  To: Leon Woestenberg; +Cc: Linux XFS, Peter Grandi

Leon Woestenberg wrote:
> Hello,
> 
> On 15 dec 2008, at 23:50, Peter Grandi wrote:
> 
>> [ ... ]
>>
>>>> The purpose of barriers is to guarantee that relevant data is
>>>> known to be on persistent storage (kind of hardware 'fsync').
>>>>
>>> [ ... ] Unfortunately in my understanding none of this is
>>> reflected by Documentation/block/barrier.txt
>> But we are talking about XFS and barriers here. That described
>> just a (flawed, buggy) mechanism to implement those. Consider
>> for example:
>>
>>  http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>>  http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
>>
>> In any case as to the kernel "barrier" mechanism, its
>> description is misleading because it heavily fixates on the
>> ordering issue, which is just a consequence, but yet mentions
>> the far more important "flush/sync" aspect.
>>
>> Still, there is a lot of confusion about barrier support and
>> what it means at which level, as reflected in several online
>> discussions and the different behaviour of different kernel
>> versions.
>>
> The semantics of a barrier are whatever semantics we describe to it.  
> So we can continue to be confused about it.
> 
> I strongly disagree on the ordering issue being a side effect.
> 
> Correct ordering can be proven to be enough to provide transactional  
> correctness, enough to ensure that filesystems can not get corrupted  
> on power down.
> 
> Using barriers to guarantee that (all submitted) write requests  
> (before the barrier) made it to the medium are a stronger predicate.
> 
> The Linux approach and documentation talks about the first type of  
> semantics (which I rather like for them being strong enough and not  
> more).

Agreed.  I'll have a look over those (wiki) faq entries and make sure
they're not confusing cache flushes with ordering requirements.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2009-02-18 22:14                     ` Leon Woestenberg
  2009-02-18 22:24                       ` Eric Sandeen
@ 2009-02-18 23:09                       ` Ralf Liebenow
  2009-02-18 23:19                         ` Eric Sandeen
  2009-02-20 19:19                       ` Peter Grandi
  2 siblings, 1 reply; 61+ messages in thread
From: Ralf Liebenow @ 2009-02-18 23:09 UTC (permalink / raw)
  To: xfs

Hello !

> Correct ordering can be proven to be enough to provide transactional
> correctness, enough to ensure that filesystems can not get corrupted
> on power down.

Please beware that caching RAID controllers which are not battery
backed and the harddisk (when write caching) may decide to 
re-order writes to the disk, so the ordering imposed by the 
operating system (filesystem driver) may not be retained. 
This is usually done by harddisks and
controllers to minimize seek times and that's what disk
command queueing is good for. So ordering can only be retained
if all external caching mechanisms and command queueing are
switched off. Otherwise you need to have something like fsync
points (barriers ?) to have consistent checkpoints you can
rollback to ...

So the answer has many variables: 
  do you have a persistent (battery backed) write cache ?

    Yes -> you can go with nobarriers if you can make sure
           that the harddisk cache is off, if the
           filesystem does proper write ordering.

    No  -> if you switch off the disk's cache, you _may_
           switch off barriers, when the filesystem driver
           uses properly placed write ordering

        -> if you have disk write caching on, you are on
           your own when power goes down and you don't
           use barriers ... you may be lucky or not ...
           But to make that clear: it's only a problem
           when power is failing ... it's not a problem
           when the machine crashes ... the disks will
           eventually write down their caches then.
           So if your system is somewhere connected with
           a redundant power supply and failsafe power
           supply systems (as this is the case for most
           data centers) you can probably live with
           disk write caching on and nobarriers, if the
           filesystem driver does order its writes
           properly ...

So I have one open question left: does xfs do proper
(transactional) ordering when barriers are off? I've been using
xfs for years now and had many machine crashes (not
power failures) without xfs getting corrupted (and that was
before 2.6.17 ... and therefore without barrier support).
So I assume it always does proper ordering and barrier
support is only making "fsynced" checkpoints in time.

Am I right ?

   Ralf

> Hello,
> 
> On 15 dec 2008, at 23:50, Peter Grandi wrote:
> 
> >[ ... ]
> >
> >>>The purpose of barriers is to guarantee that relevant data is
> >>>known to be on persistent storage (kind of hardware 'fsync').
> >>>
> >
> >>[ ... ] Unfortunately in my understanding none of this is
> >>reflected by Documentation/block/barrier.txt
> >
> >But we are talking about XFS and barriers here. That described
> >just a (flawed, buggy) mechanism to implement those. Consider
> >for example:
> >
> > http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
> > http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F
> >
> >In any case as to the kernel "barrier" mechanism, its
> >description is misleading because it heavily fixates on the
> >ordering issue, which is just a consequence, but yet mentions
> >the far more important "flush/sync" aspect.
> >
> >Still, there is a lot of confusion about barrier support and
> >what it means at which level, as reflected in several online
> >discussions and the different behaviour of different kernel
> >versions.
> >
> The semantics of a barrier are whatever semantics we describe to it.  
> So we can continue to be confused about it.
> 
> I strongly disagree on the ordering issue being a side effect.
> 
> Correct ordering can be proven to be enough to provide transactional  
> correctness, enough to ensure that filesystems can not get corrupted  
> on power down.
> 
> Using barriers to guarantee that (all submitted) write requests  
> (before the barrier) made it to the medium are a stronger predicate.
> 
> The Linux approach and documentation talks about the first type of  
> semantics (which I rather like for them being strong enough and not  
> more).
> 
> Regards,
> 
> Leon
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

-- 
theCode AG 
HRB 78053, Amtsgericht Charlottenbg
USt-IdNr.: DE204114808
Vorstand: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Aufsichtsratsvorsitzender: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin
fon +49 30 617 897-0  fax -10
ralf@theCo.de http://www.theCo.de

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2009-02-18 23:09                       ` Ralf Liebenow
@ 2009-02-18 23:19                         ` Eric Sandeen
  0 siblings, 0 replies; 61+ messages in thread
From: Eric Sandeen @ 2009-02-18 23:19 UTC (permalink / raw)
  To: ralf; +Cc: xfs

Ralf Liebenow wrote:
> Hello !
> 
>> Correct ordering can be proven to be enough to provide transactional
>> correctness, enough to ensure that filesystems can not get corrupted
>> on power down.
> 
> Please be aware that caching RAID controllers which are not battery
> backed, and hard disks with write caching enabled, may decide to
> re-order writes to the platters, so the ordering imposed by the
> operating system (the filesystem driver) may not be retained.
> Hard disks and controllers usually do this to minimize seek times,
> and that's what disk command queueing is for. So ordering can only
> be retained if all external caching mechanisms and command queueing
> are switched off. 

That's not necessarily true.

The only *requirement* for barriers is preservation of ordering.  It is
*implemented* today by cache flushing, because that's the best we can do
for now (as I understand it).

It is certainly possible that an IO could carry a flag which tells the
drive that it may not rearrange cache destaging across a barrier IO.
The drive could then cache at will, as long as the critical ordering is
maintained.

http://www.t13.org/Documents/UploadedDocuments/docs2007/e07174r0-Write_Barrier_Command_Proposal.doc

So even though barriers are implemented with cache flushes today, it
would be a mistake to rely on that implementation.  IOW, don't confuse
cache flushing with ordering requirements just because the ordering
problem is solved *today* with cache flushes.
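
To see how this plays out on a given kernel and device, the block-layer
trace tools show the requests as they are actually issued; a minimal
sketch, assuming blktrace/blkparse are installed and /dev/sdb is the
device being watched (both are just examples):

  # record block-layer traffic for ten seconds while a barrier-heavy
  # workload (e.g. untarring a kernel tree) runs on the device
  blktrace -d /dev/sdb -w 10 -o barrier-trace

  # decode the trace; barrier/flush requests show up as flags in the
  # RWBS column of the output
  blkparse -i barrier-trace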

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
  2009-02-18 22:14                     ` Leon Woestenberg
  2009-02-18 22:24                       ` Eric Sandeen
  2009-02-18 23:09                       ` Ralf Liebenow
@ 2009-02-20 19:19                       ` Peter Grandi
  2 siblings, 0 replies; 61+ messages in thread
From: Peter Grandi @ 2009-02-20 19:19 UTC (permalink / raw)
  To: Linux XFS

>>> The purpose of barriers is to guarantee that relevant data is known
>>> to be on persistent storage (kind of hardware 'fsync').

>>> [ ... ] Unfortunately in my understanding none of this is reflected
>>> by Documentation/block/barrier.txt

>> But we are talking about XFS and barriers here. That described just a
>> (flawed, buggy) mechanism to implement those. Consider for example:

>> http://www.xfs.org/index.php/XFS_FAQ#Write_barrier_support.
>> http://www.xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

>> In any case as to the kernel "barrier" mechanism, its description is
>> misleading because it heavily fixates on the ordering issue, which is
>> just a consequence, but yet mentions the far more important "flush/sync"
>> aspect.

>> Still, there is a lot of confusion about barrier support and what it
>> means at which level, as reflected in several online discussions and
>> the different behaviour of different kernel versions.

> The semantics of a barrier are whatever semantics we describe to it.
> So we can continue to be confused about it.

As Humpty Dumpty said, one can make anything mean anything.

But we are not discussing the *semantics* of barriers...

We are discussing, as the original poster said, their *purpose*. The
semantics are a formal property, and the purpose is a practical one.

There is no dispute that Linux/Posix barrier *semantics* do not require
any form of persistence at all, ever, only that *if* data is made
persistent, it be done in order.

The question is about the *purpose* of barriers, and that is to
implement timely, reliable transactions to persistent storage, and
ordering consistency is just a side effect of that.

But then if the *semantics* of Linux barriers do not support the
*purpose* of barriers, those semantics are buggy.

> [ ... ] Correct ordering can be proven to be enough to provide
> transactional correctness, enough to ensure that filesystems can not
> get corrupted on power down.

Indeed, and it can also be proven that not writing *anything* to disk is
enough to provide transactional correctness and correct ordering; and
filesystems that do not get written to cannot get corrupted ever.

By the same principle, whether one loses 1KiB or 10GiB or 1TiB of
pending transactions matters not at all to the semantics of Linux
barriers, because that's a violation of a stronger predicate:

> Using barriers to guarantee that (all submitted) write requests
> (before the barrier) made it to the medium are a stronger predicate.

Sure, and indeed, writing nothing guarantees transactional correctness,
and fully respects the semantics of Linux barriers. That's by far the
safest and most semantically correct solution.

There are however deluded fellows like those who use computers to record
real-world transactions who care about whether and when data is made
persistent (and usually as quickly as possible) and to whom consistency
is a side effect of completeness.

Fortunately such concerns are not significant because they require
excessively strong semantics:

> The Linux approach and documentation talks about the first type of
> semantics (which I rather like for them being strong enough and not
> more).

Precisely as the "Linux approach and documentation" do not guarantee
that anything will ever be written to disk, preserving transactional
correctness with the least possible effort. Why bother with stronger
predicates?

BTW, as the other links that I have provided show, the root cause of
this silliness is that POSIX 'fsync' does not guarantee persistency
either, only (practically useless on its own) ordering.

But that is a bug, not something for clever people to claim is righteous
not-stronger-than-necessary semantics.
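
To make the practical side concrete, here is a minimal sketch of forcing
a single write through the stack, assuming an XFS filesystem mounted at
/mnt/data (the path is just an example):

  # write one block and fsync it before dd exits; with barriers enabled
  # the fsync should also flush the drive's volatile write cache, while
  # with nobarrier and write caching on the data may still sit only in
  # the drive cache when this command returns
  dd if=/dev/zero of=/mnt/data/fsync-test bs=4k count=1 conv=fsync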

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 12x performance drop on md/linux+sw raid1 due to barriers [xfs]
@ 2008-12-14 18:33 ` Martin Steigerwald
  0 siblings, 0 replies; 61+ messages in thread
From: Martin Steigerwald @ 2008-12-14 18:33 UTC (permalink / raw)
  To: xfs; +Cc: Linux RAID

On Sunday, 14 December 2008, you wrote:
> On Sunday, 14 December 2008, Peter Grandi wrote:

> > But talking about barriers in the context of metadata, and for a
> > "benchmark" which has a metadata barrier every 11KB, and without
> > knowing whether the storage subsystem can queue multiple barrier
> > operations seems to be pretty crass and meaningless, if not
> > misleading. A waste of time at best.
>
> Hmmm, as far as I understood it, the IO scheduler would handle
> barrier requests itself if the device was not capable of queuing
> and ordering requests.
>
> The only thing that occurs to me now is that with barriers off it
> has more freedom to reorder requests, and that might matter for that
> metadata-intensive workload. With barriers it can only reorder 11 KB
> of requests. Without them it could reorder as much as it wants ... but
> even then the filesystem would have to make sure that metadata changes
> land in the journal first and only then in place. And this would
> involve a sync, if no barrier request were possible.

No, it doesn't. I do not think XFS or any other filesystem would be keen
to see the IO scheduler reorder a journal write after the corresponding
in-place metadata write. So either the filesystem uses sync ...

> So I still don't get why even that metadata-intensive workload of
> tar -xf linux-2.6.27.tar.bz2 - or, maybe better, bzip2 -d the tarball
> beforehand - should be slower with barriers + write cache on than with
> no barriers and write cache off.

... or it tells the scheduler that this journal write has to complete
before the later writes. That is what a barrier does - except that here
it cannot utilize any additional in-hardware / in-firmware support.

So why on earth can write cache off + barriers off be faster than write
cache on + barriers on in *any* workload? There must be some technical
detail that I am missing.
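
A minimal sketch of how the two configurations could be compared
directly, assuming a spare disk at /dev/sdb with an XFS filesystem on
/dev/sdb1 mounted at /mnt/test (all names are just examples, and the
tarball is decompressed beforehand so CPU time does not dominate):

  # configuration A: drive write cache off, barriers off
  hdparm -W0 /dev/sdb
  mount -o nobarrier /dev/sdb1 /mnt/test
  (cd /mnt/test && /usr/bin/time tar xf /tmp/linux-2.6.27.tar)
  umount /mnt/test

  # configuration B: drive write cache on, barriers on (the default)
  hdparm -W1 /dev/sdb
  mount /dev/sdb1 /mnt/test
  (cd /mnt/test && /usr/bin/time tar xf /tmp/linux-2.6.27.tar)
  umount /mnt/test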

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2009-02-22  9:58 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-06 14:28 12x performance drop on md/linux+sw raid1 due to barriers [xfs] Justin Piszcz
2008-12-06 14:28 ` Justin Piszcz
2008-12-06 15:36 ` Eric Sandeen
2008-12-06 20:35   ` Redeeman
2008-12-06 20:35     ` Redeeman
2008-12-13 12:54   ` Justin Piszcz
2008-12-13 12:54     ` Justin Piszcz
2008-12-13 17:26     ` Martin Steigerwald
2008-12-13 17:26       ` Martin Steigerwald
2008-12-13 17:40       ` Eric Sandeen
2008-12-13 17:40         ` Eric Sandeen
2008-12-14  3:31         ` Redeeman
2008-12-14  3:31           ` Redeeman
2008-12-14 14:02           ` Peter Grandi
2008-12-14 14:02             ` Peter Grandi
2008-12-14 18:12             ` Martin Steigerwald
2008-12-14 18:12               ` Martin Steigerwald
2008-12-14 22:02               ` Peter Grandi
2008-12-14 22:02                 ` Peter Grandi
2008-12-15 18:48                 ` Martin Steigerwald
2008-12-15 22:50                   ` Peter Grandi
2009-02-18 22:14                     ` Leon Woestenberg
2009-02-18 22:24                       ` Eric Sandeen
2009-02-18 23:09                       ` Ralf Liebenow
2009-02-18 23:19                         ` Eric Sandeen
2009-02-20 19:19                       ` Peter Grandi
2008-12-15 22:38                 ` Dave Chinner
2008-12-15 22:38                   ` Dave Chinner
2008-12-16  9:39                   ` Martin Steigerwald
2008-12-16  9:39                     ` Martin Steigerwald
2008-12-16 20:57                     ` Peter Grandi
2008-12-16 23:14                     ` Dave Chinner
2008-12-16 23:14                       ` Dave Chinner
2008-12-17 21:40                 ` Bill Davidsen
2008-12-17 21:40                   ` Bill Davidsen
2008-12-18  8:20                   ` Leon Woestenberg
2008-12-18 23:33                     ` Bill Davidsen
2008-12-21 19:16                     ` Peter Grandi
2008-12-22 13:19                       ` Leon Woestenberg
2008-12-22 13:19                         ` Leon Woestenberg
2008-12-18 22:26                   ` Dave Chinner
2008-12-18 22:26                     ` Dave Chinner
2008-12-20 14:06               ` Peter Grandi
2008-12-14 18:35             ` Martin Steigerwald
2008-12-14 18:35               ` Martin Steigerwald
2008-12-14 17:49           ` Martin Steigerwald
2008-12-14 17:49             ` Martin Steigerwald
2008-12-14 23:36         ` Dave Chinner
2008-12-14 23:36           ` Dave Chinner
2008-12-14 23:55           ` Eric Sandeen
2008-12-13 18:01       ` David Lethe
2008-12-13 18:01         ` David Lethe
2008-12-06 18:42 ` Peter Grandi
2008-12-11  0:20 ` Bill Davidsen
2008-12-11  0:20   ` Bill Davidsen
2008-12-11  9:18   ` Justin Piszcz
2008-12-11  9:18     ` Justin Piszcz
2008-12-11  9:24     ` Justin Piszcz
2008-12-11  9:24       ` Justin Piszcz
2008-12-14 18:33 Martin Steigerwald
2008-12-14 18:33 ` Martin Steigerwald

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.