* Bad performance with XFS + 2.6.38 / 2.6.39
@ 2011-12-11 12:45 Xupeng Yun
  2011-12-11 23:39 ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Xupeng Yun @ 2011-12-11 12:45 UTC (permalink / raw)
  To: XFS group


Hi,

I am using XFS + 2.6.29 on my MySQL servers, and they perform great.

I am testing XFS on SSDs these days. Because FITRIM support for XFS
first shipped with Linux kernel 2.6.38, I tested XFS + 2.6.38 and
XFS + 2.6.39, but to my surprise the performance of XFS with these two
kernel versions drops dramatically.
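
(For reference: FITRIM is the ioctl behind fstrim(8) from util-linux,
which batch-discards free space on a mounted filesystem -- a minimal
example, assuming a reasonably recent util-linux is installed:)

# fstrim -v /mnt/xfs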

Here are the results of my tests with fio; both tests were run on the
same hardware in the same testing environment (only the kernel version
differs).

====== XFS + 2.6.29 ======

# mount | grep /mnt/xfs
/dev/sdc1 on /mnt/xfs type xfs (rw,noatime,nodiratime,nobarrier,logbufs=8)
# fio --filename=/mnt/xfs/test --direct=1 --rw=randrw --bs=16k --size=50G
--numjobs=16 --runtime=120 --group_reporting --name=test --rwmixread=90
--thread --ioengine=psync
test: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
...
test: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
fio 1.58
Starting 16 threads
test: Laying out IO file(s) (1 file(s) / 51200MB)
Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [181.5M/21118K /s]
[11.4K/1289 iops] [eta 00m:00s]
test: (groupid=0, jobs=16): err= 0: pid=8446
read : io=21312MB, bw=181862KB/s, iops=11366 , runt=120001msec
clat (usec): min=80 , max=146337 , avg=1369.72, stdev=1026.26
lat (usec): min=81 , max=146338 , avg=1370.87, stdev=1026.27
bw (KB/s) : min= 6998, max=13600, per=6.26%, avg=11376.13, stdev=499.42
write: io=2369.4MB, bw=20218KB/s, iops=1263 , runt=120001msec
clat (usec): min=67 , max=145760 , avg=268.28, stdev=894.06
lat (usec): min=67 , max=145761 , avg=269.46, stdev=894.09
bw (KB/s) : min= 509, max= 2166, per=6.26%, avg=1265.42, stdev=213.82
cpu : usr=11.09%, sys=44.83%, ctx=26015341, majf=0, minf=8396
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=1363980/151635/0, short=0/0/0
lat (usec): 100=0.11%, 250=5.85%, 500=3.79%, 750=0.32%, 1000=5.51%
lat (msec): 2=80.06%, 4=1.26%, 10=3.07%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%

Run status group 0 (all jobs):
READ: io=21312MB, aggrb=181862KB/s, minb=186227KB/s, maxb=186227KB/s,
mint=120001msec, maxt=120001msec
WRITE: io=2369.4MB, aggrb=20217KB/s, minb=20703KB/s, maxb=20703KB/s,
mint=120001msec, maxt=120001msec

Disk stats (read/write):
sdc: ios=1361926/151423, merge=0/0, ticks=1793432/27812, in_queue=1820240,
util=99.99%




====== XFS + 2.6.39 ======

# mount | grep /mnt/xfs
/dev/sdc1 on /mnt/xfs type xfs (rw,noatime,nodiratime,nobarrier,logbufs=8)
# fio --filename=/mnt/xfs/test --direct=1 --rw=randrw --bs=16k --size=50G
--numjobs=16 --runtime=120 --group_reporting --name=test --rwmixread=90
--thread --ioengine=psync
test: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
...
test: (g=0): rw=randrw, bs=16K-16K/16K-16K, ioengine=psync, iodepth=1
fio 1.58
Starting 16 threads
test: Laying out IO file(s) (1 file(s) / 51200MB)
Jobs: 16 (f=16): [mmmmmmmmmmmmmmmm] [100.0% done] [58416K/6680K /s] [3565
/407 iops] [eta 00m:00s]
test: (groupid=0, jobs=16): err= 0: pid=26902
read : io=6507.1MB, bw=55533KB/s, iops=3470 , runt=120004msec
clat (usec): min=155 , max=356038 , avg=4562.52, stdev=4748.18
lat (usec): min=156 , max=356038 , avg=4562.69, stdev=4748.19
bw (KB/s) : min= 1309, max= 4864, per=6.26%, avg=3479.03, stdev=441.47
write: io=741760KB, bw=6181.2KB/s, iops=386 , runt=120004msec
clat (usec): min=71 , max=348236 , avg=390.11, stdev=3106.30
lat (usec): min=71 , max=348236 , avg=390.31, stdev=3106.30
bw (KB/s) : min= 28, max= 921, per=6.29%, avg=389.02, stdev=114.68
cpu : usr=3.43%, sys=11.12%, ctx=21598477, majf=0, minf=7762
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=416508/46360/0, short=0/0/0
lat (usec): 100=2.65%, 250=0.98%, 500=6.58%, 750=31.88%, 1000=0.27%
lat (msec): 2=0.08%, 4=0.23%, 10=55.04%, 20=1.76%, 50=0.49%
lat (msec): 100=0.02%, 250=0.01%, 500=0.01%

Run status group 0 (all jobs):
READ: io=6507.1MB, aggrb=55532KB/s, minb=56865KB/s, maxb=56865KB/s,
mint=120004msec, maxt=120004msec
WRITE: io=741760KB, aggrb=6181KB/s, minb=6329KB/s, maxb=6329KB/s,
mint=120004msec, maxt=120004msec

Disk stats (read/write):
sdc: ios=416285/46351, merge=0/1, ticks=108136/8768, in_queue=116368,
util=93.60%


As the test results show, total IOPS with XFS + 2.6.29 is about 12,600,
but it drops to about 3,900 with XFS + 2.6.39.

I tried different XFS format options and different mount options, but
none of them helped.
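
(For illustration, the kind of variations meant here -- the values are
hypothetical, not the exact ones tried:)

# mkfs.xfs -f -l size=128m -i size=512 /dev/sdc1
# mount -o noatime,nobarrier,logbufs=8,logbsize=256k /dev/sdc1 /mnt/xfs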

Any thoughts?

--
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-11 12:45 Bad performance with XFS + 2.6.38 / 2.6.39 Xupeng Yun
@ 2011-12-11 23:39 ` Dave Chinner
  2011-12-12  0:40   ` Xupeng Yun
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2011-12-11 23:39 UTC (permalink / raw)
  To: Xupeng Yun; +Cc: XFS group

On Sun, Dec 11, 2011 at 08:45:14PM +0800, Xupeng Yun wrote:
> Hi,
> 
> I am using XFS + 2.6.29 on my MySQL servers, and they perform great.
> 
> I am testing XFS on SSDs these days. Because FITRIM support for XFS
> first shipped with Linux kernel 2.6.38, I tested XFS + 2.6.38 and
> XFS + 2.6.39, but to my surprise the performance of XFS with these two
> kernel versions drops dramatically.
> 
> Here are the results of my tests with fio; both tests were run on the
> same hardware in the same testing environment (only the kernel version
> differs).
> 
> ====== XFS + 2.6.29 ======

Read 21GB @ 11k iops, 210MB/s, av latency of 1.3ms/IO
Wrote 2.3GB @ 1250 iops, 20MB/s, av latency of 0.27ms/IO
Total 1.5m IOs, 95% @ <= 2ms

> ====== XFS + 2.6.39 ======

Read 6.5GB @ 3.5k iops, 55MB/s, av latency of 4.5ms/IO
Wrote 700MB @ 386 iops, 6MB/s, av latency of 0.39ms/IO
Total 460k IOs, 95% @ <= 10ms, 4ms > 50% < 10ms

Looking at the IO stats there, this doesn't look to me like an XFS
problem. The IO times are much, much longer on 2.6.39, so that's the
first thing to understand. If the two tests are doing identical IO
patterns, then I'd be looking at validating raw device performance
first.
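
For example, something like this against the raw device (a sketch of
the same fio job pointed at the block device; note the 10% write
component will destroy any filesystem on it):

# fio --filename=/dev/sdc1 --direct=1 --rw=randrw --bs=16k --size=50G \
      --numjobs=16 --runtime=120 --group_reporting --name=rawtest \
      --rwmixread=90 --thread --ioengine=psync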

> I tried different XFS format options and different mount options, but
> none of them helped.

It won't if the problem is in the layers below XFS.

e.g. IO scheduler behavioural changes could be the cause (esp. if
you are using CFQ), the SSD could be in different states or running
garbage collection intermittently and slowing things down, the
filesystem could be in different states (did you use a fresh
filesystem for each of these tests?), and so on. Also, recent mkfs.xfs
will trim the entire device if the kernel supports it.
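
For example, to check and switch the scheduler on the test device (the
bracketed entry is the active one; the exact output depends on the
kernel config):

# cat /sys/block/sdc/queue/scheduler
noop deadline [cfq]
# echo deadline > /sys/block/sdc/queue/scheduler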

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-11 23:39 ` Dave Chinner
@ 2011-12-12  0:40   ` Xupeng Yun
  2011-12-12  1:00     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Xupeng Yun @ 2011-12-12  0:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS group


On Mon, Dec 12, 2011 at 07:39, Dave Chinner <david@fromorbit.com> wrote:
>
> > ====== XFS + 2.6.29 ======
>
> Read 21GB @ 11k iops, 210MB/s, av latency of 1.3ms/IO
> Wrote 2.3GB @ 1250 iops, 20MB/s, av latency of 0.27ms/IO
> Total 1.5m IOs, 95% @ <= 2ms
>
> > ====== XFS + 2.6.39 ======
>
> Read 6.5GB @ 3.5k iops, 55MB/s, av latency of 4.5ms/IO
> Wrote 700MB @ 386 iops, 6MB/s, av latency of 0.39ms/IO
> Total 460k IOs, 95% @ <= 10ms, 4ms > 50% < 10ms
>
> Looking at the IO stats there, this doesn't look to me like an XFS
> problem. The IO times are much, much longer on 2.6.39, so that's the
> first thing to understand. If the two tests are doing identical IO
> patterns, then I'd be looking at validating raw device performance
> first.
>

Thank you Dave.

I also ran raw-device and ext4 performance tests with 2.6.39; all of
these tests use identical IO patterns (non-buffered IO, 16 IO threads,
16KB block size, mixed random read and write, r:w=9:1):
====== raw device + 2.6.39 ======
Read 21.7GB @ 11.6k IOPS , 185MB/s, av latency of 1.37 ms/IO
Wrote 2.4GB @ 1.3k IOPS, 20MB/s, av latency of 0.095 ms/IO
Total 1.5M IOs, @ 96% <= 2ms

====== ext4 + 2.6.39 ======
Read 21.7GB @ 11.6k IOPS , 185MB/s, av latency of 1.37 ms/IO
Wrote 2.4GB @ 1.3k IOPS, 20MB/s, av latency of 0.1 ms/IO
Total 1.5M IOs, @ 96% <= 2ms

====== XFS + 2.6.39 ======
Read 6.5GB @ 3.5k iops, 55MB/s, av latency of 4.5ms/IO
Wrote 700MB @ 386 iops, 6MB/s, av latency of 0.39ms/IO
Total 460k IOs, @ 95% <= 10ms, 4ms > 50% < 10ms

Here are the detailed test results:
== 2.6.39 ==
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.39-xfs.txt
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.39-ext4.txt
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.39-raw.txt

== 2.6.29 ==
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.29-xfs.txt
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.29-ext4.txt
http://blog.xupeng.me/wp-content/uploads/ext4-xfs-perf/2.6.29-raw.txt

>
> > I tried different XFS format options and different mount options, but
> > none of them helped.
>
> It won't if the problem is in the layers below XFS.
>
> e.g. IO scheduler behavioural changes could be the cause (esp. if
> you are using CFQ), the SSD could be in different states or running
> garbage collection intermittently and slowing things down, the
> filesystem could be in different states (did you use a fresh
> filesystem for each of these tests?), and so on. Also, recent mkfs.xfs
> will trim the entire device if the kernel supports it.


I did all the tests on the same server with the deadline scheduler, and
the xfsprogs version is 3.1.4. I also ran the tests with the noop
scheduler, but saw no big difference.

--
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-12  0:40   ` Xupeng Yun
@ 2011-12-12  1:00     ` Dave Chinner
  2011-12-12  2:00       ` Xupeng Yun
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2011-12-12  1:00 UTC (permalink / raw)
  To: Xupeng Yun; +Cc: XFS group

On Mon, Dec 12, 2011 at 08:40:15AM +0800, Xupeng Yun wrote:
> On Mon, Dec 12, 2011 at 07:39, Dave Chinner <david@fromorbit.com> wrote:
> >
> > > ====== XFS + 2.6.29 ======
> >
> > Read 21GB @ 11k iops, 210MB/s, av latency of 1.3ms/IO
> > Wrote 2.3GB @ 1250 iops, 20MB/s, av latency of 0.27ms/IO
> > Total 1.5m IOs, 95% @ <= 2ms
> >
> > > ====== XFS + 2.6.39 ======
> >
> > Read 6.5GB @ 3.5k iops, 55MB/s, av latency of 4.5ms/IO
> > Wrote 700MB @ 386 iops, 6MB/s, av latency of 0.39ms/IO
> > Total 460k IOs, 95% @ <= 10ms, 4ms > 50% < 10ms
> >
> > Looking at the IO stats there, this doesn't look to me like an XFS
> > problem. The IO times are much, much longer on 2.6.39, so that's the
> > first thing to understand. If the two tests are doing identical IO
> > patterns, then I'd be looking at validating raw device performance
> > first.
> >
> 
> Thank you Dave.
> 
> I also ran raw-device and ext4 performance tests with 2.6.39; all of
> these tests use identical IO patterns (non-buffered IO, 16 IO threads,
> 16KB block size, mixed random read and write, r:w=9:1):
> ====== raw device + 2.6.39 ======
> Read 21.7GB @ 11.6k IOPS , 185MB/s, av latency of 1.37 ms/IO
> Wrote 2.4GB @ 1.3k IOPS, 20MB/s, av latency of 0.095 ms/IO
> Total 1.5M IOs, @ 96% <= 2ms
> 
> ====== ext4 + 2.6.39 ======
> Read 21.7GB @ 11.6k IOPS , 185MB/s, av latency of 1.37 ms/IO
> Wrote 2.4GB @ 1.3k IOPS, 20MB/s, av latency of 0.1 ms/IO
> Total 1.5M IOs, @ 96% <= 2ms
> 
> ====== XFS + 2.6.39 ======
> Read 6.5GB @ 3.5k iops, 55MB/s, av latency of 4.5ms/IO
> Wrote 700MB @ 386 iops, 6MB/s, av latency of 0.39ms/IO
> Total 460k IOs, @ 95% <= 10ms, 4ms > 50% < 10ms

Oh, of course, now I remember what the problem is - it's a locking
issue that was fixed in 3.0.11, 3.1.5 and 3.2-rc1.

commit 0c38a2512df272b14ef4238b476a2e4f70da1479
Author: Dave Chinner <dchinner@redhat.com>
Date:   Thu Aug 25 07:17:01 2011 +0000

    xfs: don't serialise direct IO reads on page cache checks
    
    There is no need to grab the i_mutex of the IO lock in exclusive
    mode if we don't need to invalidate the page cache. Taking these
    locks on every direct IO effectively serialises them, as taking the IO
    lock in exclusive mode has to wait for all shared holders to drop
    the lock. That only happens when IO is complete, so effectively it
    prevents dispatch of concurrent direct IO reads to the same inode.
    
    Fix this by taking the IO lock shared to check the page cache state,
    and only then drop it and take the IO lock exclusively if there is
    work to be done. Hence for the normal direct IO case, no exclusive
    locking will occur.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Tested-by: Joern Engel <joern@logfs.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-12  1:00     ` Dave Chinner
@ 2011-12-12  2:00       ` Xupeng Yun
  2011-12-12 13:57         ` Christoph Hellwig
  2011-12-21  9:08         ` Yann Dupont
  0 siblings, 2 replies; 20+ messages in thread
From: Xupeng Yun @ 2011-12-12  2:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: XFS group


On Mon, Dec 12, 2011 at 09:00, Dave Chinner <david@fromorbit.com> wrote:

> Oh, of course, now I remember what the problem is - it's a locking
> issue that was fixed in 3.0.11, 3.1.5 and 3.2-rc1.
>

Got it, thanks.

-- 
Xupeng Yun
http://about.me/xupeng


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-12  2:00       ` Xupeng Yun
@ 2011-12-12 13:57         ` Christoph Hellwig
  2011-12-21  9:08         ` Yann Dupont
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2011-12-12 13:57 UTC (permalink / raw)
  To: Xupeng Yun; +Cc: XFS group

On Mon, Dec 12, 2011 at 10:00:18AM +0800, Xupeng Yun wrote:
> On Mon, Dec 12, 2011 at 09:00, Dave Chinner <david@fromorbit.com> wrote:
> 
> > Oh, of course, now I remember what the problem is - it's a locking
> > issue that was fixed in 3.0.11, 3.1.5 and 3.2-rc1.
> >
> 
> Got it, thanks.

Btw, I'd recommend staying on Linux 3.0-stable instead of 2.6.38 or
2.6.39, as it will actively get bug fixes backported for a while.


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-12  2:00       ` Xupeng Yun
  2011-12-12 13:57         ` Christoph Hellwig
@ 2011-12-21  9:08         ` Yann Dupont
  2011-12-21 15:10           ` Stan Hoeppner
  1 sibling, 1 reply; 20+ messages in thread
From: Yann Dupont @ 2011-12-21  9:08 UTC (permalink / raw)
  To: xfs

On 12/12/2011 03:00, Xupeng Yun wrote:
>
>
> On Mon, Dec 12, 2011 at 09:00, Dave Chinner <david@fromorbit.com
> <mailto:david@fromorbit.com>> wrote:
>
>     Oh, of course, now I remember what the problem is - it's a locking
>     issue that was fixed in 3.0.11, 3.1.5 and 3.2-rc1.
>
>
> Got it, thanks.
>
> --
> Xupeng Yun
> http://about.me/xupeng

I'm seeing more or less the same here.

Generally speaking, XFS code in recent kernels seems to decrease CPU
usage and be faster, which is a very good thing (good work, guys). But...

On two particular servers, with recent kernels, I experience a much
higher load than expected, but it's very hard to tell what's wrong. The
system seems to spend more time in I/O wait. Older kernels (2.6.32.xx
and 2.6.26.xx) give better results.

Following this thread, I thought I had the same problem, but it's
probably not the case, as I have tested 2.6.38.xx, 3.0.13 and 3.1.5 with
the same results.

Those servers are mail (dovecot) servers, with lots of simultaneous
imap clients (5000+) and lots of simultaneous message delivery.

These are linux-vservers, on top of LVM volumes. The storage is a SAN
with 15k RPM SAS drives (and battery backup).

I know barriers were disabled in older kernels, so with recent kernels,
the XFS volumes were mounted with nobarrier.

As those servers are critical for us, I can't really test, can hardly
give you more precise numbers, and I don't know how to accurately
reproduce this platform to test what's wrong. I know this is NOT a
precise bug report and it won't help much.

All I can say is:

- read operations seem no slower with recent kernels; backups take
approximately the same time;
- I'd say (but I have no proof) that delivery of new mail takes more
time and is more synchronous than before, as if nobarrier had no effect.

Does this ring a bell for any of you?

Thanks,
-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-21  9:08         ` Yann Dupont
@ 2011-12-21 15:10           ` Stan Hoeppner
  2011-12-21 17:56             ` Yann Dupont
  0 siblings, 1 reply; 20+ messages in thread
From: Stan Hoeppner @ 2011-12-21 15:10 UTC (permalink / raw)
  To: Yann Dupont; +Cc: xfs

On 12/21/2011 3:08 AM, Yann Dupont wrote:
> On 12/12/2011 03:00, Xupeng Yun wrote:
>>
>>
>> On Mon, Dec 12, 2011 at 09:00, Dave Chinner <david@fromorbit.com
>> <mailto:david@fromorbit.com>> wrote:
>>
>>     Oh, of course, now I remember what the problem is - it's a locking
>>     issue that was fixed in 3.0.11, 3.1.5 and 3.2-rc1.
>>
>>
>> Got it, thanks.
>>
>> -- 
>> Xupeng Yun
>> http://about.me/xupeng
> 
> I'm seeing more or less the same here.
> 
> Generally speaking, XFS code in recent kernels seems to decrease CPU
> usage and be faster, which is a very good thing (good work, guys). But...
> 
> On two particular servers, with recent kernels, I experience a much
> higher load than expected, but it's very hard to tell what's wrong. The
> system seems to spend more time in I/O wait. Older kernels (2.6.32.xx
> and 2.6.26.xx) give better results.
> 
> Following this thread, I thought I had the same problem, but it's
> probably not the case, as I have tested 2.6.38.xx, 3.0.13 and 3.1.5 with
> the same results.
> 
> Those servers are mail (dovecot) servers, with lots of simultaneous
> imap clients (5000+) and lots of simultaneous message delivery.
> 
> These are linux-vservers, on top of LVM volumes. The storage is a SAN
> with 15k RPM SAS drives (and battery backup).
> 
> I know barriers were disabled in older kernels, so with recent kernels,
> the XFS volumes were mounted with nobarrier.
> 
> As those servers are critical for us, I can't really test, can hardly
> give you more precise numbers, and I don't know how to accurately
> reproduce this platform to test what's wrong. I know this is NOT a
> precise bug report and it won't help much.
> 
> All I can say is:
> 
> - read operations seem no slower with recent kernels; backups take
> approximately the same time;
> - I'd say (but I have no proof) that delivery of new mail takes more
> time and is more synchronous than before, as if nobarrier had no effect.
> 
> Does this ring a bell for any of you?

1.  What mailbox format are you using?  Is this a constant or variable?
2.  Is the Dovecot rev and config the same everywhere, before/after?
3.  Are Dovecot instances using NFS to access the XFS volumes?
4.  Is this a  Dovecot 2.x cluster with director and NFS storage?

-- 
Stan


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-21 15:10           ` Stan Hoeppner
@ 2011-12-21 17:56             ` Yann Dupont
  2011-12-21 22:26               ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Dupont @ 2011-12-21 17:56 UTC (permalink / raw)
  To: stan; +Cc: xfs

On 21/12/2011 16:10, Stan Hoeppner wrote:


> 1.  What mailbox format are you using?  Is this a constant or variable?

Maildir++

> 2.  Is the Dovecot rev and config the same everywhere, before/after?
Yes
> 3.  Are Dovecot instances using NFS to access the XFS volumes?
No, direct LVM volumes from the SAN.
> 4.  Is this a  Dovecot 2.x cluster with director and NFS storage?
>
No, this is plain and simple Dovecot.

When I go back to older kernels, the load goes down. With newer
kernels, everything works well too, but the load (as reported by
uptime) is higher.

Thanks,

-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-21 17:56             ` Yann Dupont
@ 2011-12-21 22:26               ` Dave Chinner
  2011-12-22  9:23                 ` Yann Dupont
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2011-12-21 22:26 UTC (permalink / raw)
  To: Yann Dupont; +Cc: stan, xfs

On Wed, Dec 21, 2011 at 06:56:34PM +0100, Yann Dupont wrote:
> On 21/12/2011 16:10, Stan Hoeppner wrote:
> 
> 
> >1.  What mailbox format are you using?  Is this a constant or variable?
> 
> Maildir++

Which is what?

> >2.  Is the Dovecot rev and config the same everywhere, before/after?
> Yes
> >3.  Are Dovecot instances using NFS to access the XFS volumes?
> No, direct LVM volumes from the SAN.
> >4.  Is this a  Dovecot 2.x cluster with director and NFS storage?
> >
> No, this is plain and simple Dovecot.
> 
> When I go back to older kernels, the load goes down. With newer
> kernels, everything works well too, but the load (as reported by
> uptime) is higher.

Can you run a block trace on both kernels (for say five minutes)
when the load differential is showing up and provide that to us so
we can see how the IO patterns are differing?
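
Something like this would do (a sketch -- substitute your real LVM
device path for the hypothetical VG-LV):

# blktrace -d /dev/mapper/VG-LV -o trace -w 300
# blkparse trace > trace.txt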

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-21 22:26               ` Dave Chinner
@ 2011-12-22  9:23                 ` Yann Dupont
  2011-12-22 11:02                   ` Yann Dupont
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Dupont @ 2011-12-22  9:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: stan, xfs

On 21/12/2011 23:26, Dave Chinner wrote:
> On Wed, Dec 21, 2011 at 06:56:34PM +0100, Yann Dupont wrote:
>> On 21/12/2011 16:10, Stan Hoeppner wrote:
>>
>>
>>> 1.  What mailbox format are you using?  Is this a constant or variable?
>>
>> Maildir++
>
> Which is what?

Each message is saved as an individual file. With heavily loaded
mailboxes, you can end up with thousands of files under one directory.

>
> Can you run a block trace on both kernels (for say five minutes)
> when the load differential is showing up and provide that to us so
> we can see how the IO patterns are differing?
>

Yes. Keep in mind that the load can be very different five minutes apart.
I have two options here:

Case 1: server 1 running kernel 2.6.26, blktrace it, reboot into 3.1.5,
blktrace it again. OR

Case 2: server 1 running 2.6.26 and server 2 running 3.1.5.

In case 1 you have the same set of users and the same config guaranteed.

In case 2 you don't have the same set of users, but the load can be
quite symmetric (think of mailing lists).

I will go for case 2, I think.

Last question:

The volumes are on top of lvm2 (sitting on top of multipath dm).

I'll try to blktrace the volume itself, but I may end up blktracing the
primary SCSI path (I don't know whether it makes a difference).

Are there any options you want passed to blktrace?

Thanks,


-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-22  9:23                 ` Yann Dupont
@ 2011-12-22 11:02                   ` Yann Dupont
  2012-01-02 10:06                     ` Yann Dupont
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Dupont @ 2011-12-22 11:02 UTC (permalink / raw)
  To: Yann Dupont; +Cc: stan, xfs

On 22/12/2011 10:23, Yann Dupont wrote:
>
>> Can you run a block trace on both kernels (for say five minutes)
>> when the load differential is showing up and provide that to us so
>> we can see how the IO patterns are differing?


Here we go.

1st server: Birnie, running 2.6.26. This is normally the more loaded
server (more active users).

2nd server: Penderyn, running a freshly compiled 3.1.6.

blktrace of the relevant volumes over 10 minutes. The two machines are
identical (PowerEdge M1610): same memory and processors, disks, fibre
channel cards, SAN disks...

birnie:~/TRACE# uptime
  11:48:34 up 17:18,  3 users,  load average: 0.04, 0.18, 0.23

penderyn:~/TRACE# uptime
  11:48:30 up 23 min,  3 users,  load average: 4.03, 3.82, 3.21

As you can see, a very noticeable load difference. Keep in mind my
university is on holiday right now, so the load is really _very much
lower_ than usual. In normal times, with 2.6.26 kernels, Birnie has a
load in the 2-6 range.

Here are the results:


birnie:~/TRACE# blktrace /dev/gromelac/gromelac 
/dev/POMEROL-R0-P0/gromeldi -w 600
=== dm-18 ===
   CPU  0:                26787 events,     1256 KiB data
   CPU  1:                  530 events,       25 KiB data
   CPU  2:                 1811 events,       85 KiB data
   CPU  3:                  104 events,        5 KiB data
   CPU  4:                 5824 events,      274 KiB data
   CPU  5:                  146 events,        7 KiB data
   CPU  6:                 1958 events,       92 KiB data
   CPU  7:                  176 events,        9 KiB data
   CPU  8:                 5456 events,      256 KiB data
   CPU  9:                  175 events,        9 KiB data
   CPU 10:                 1161 events,       55 KiB data
   CPU 11:                  216 events,       11 KiB data
   CPU 12:                  118 events,        6 KiB data
   CPU 13:                   25 events,        2 KiB data
   CPU 14:                  287 events,       14 KiB data
   CPU 15:                  425 events,       20 KiB data
   Total:                 45199 events (dropped 0),     2119 KiB data
=== dm-16 ===
   CPU  0:                27966 events,     1311 KiB data
   CPU  1:                  311 events,       15 KiB data
   CPU  2:                 1403 events,       66 KiB data
   CPU  3:                 1699 events,       80 KiB data
   CPU  4:                 1706 events,       80 KiB data
   CPU  5:                 1515 events,       72 KiB data
   CPU  6:                   30 events,        2 KiB data
   CPU  7:                  428 events,       21 KiB data
   CPU  8:                 6774 events,      318 KiB data
   CPU  9:                  252 events,       12 KiB data
   CPU 10:                 1299 events,       61 KiB data
   CPU 11:                 1391 events,       66 KiB data
   CPU 12:                  111 events,        6 KiB data
   CPU 13:                 2317 events,      109 KiB data
   CPU 14:                  130 events,        7 KiB data
   CPU 15:                  504 events,       24 KiB data
   Total:                 47836 events (dropped 0),     2243 KiB data


and

penderyn:~/TRACE# blktrace /dev/gromeljo/gromeljo /dev/gromelpz/gromelpz 
/dev/POMEROL-R1-P0/gromelpz -w 600
=== dm-14 ===
   CPU  0:                12672 events,      595 KiB data
   CPU  1:                13248 events,      621 KiB data
   CPU  2:                  545 events,       26 KiB data
   CPU  3:                  285 events,       14 KiB data
   CPU  4:                  574 events,       27 KiB data
   CPU  5:                   94 events,        5 KiB data
   CPU  6:                  569 events,       27 KiB data
   CPU  7:                  172 events,        9 KiB data
   CPU  8:                  666 events,       32 KiB data
   CPU  9:                 3231 events,      152 KiB data
   CPU 10:                  610 events,       29 KiB data
   CPU 11:                  221 events,       11 KiB data
   CPU 12:                   11 events,        1 KiB data
   CPU 13:                   20 events,        1 KiB data
   CPU 14:                    6 events,        1 KiB data
   CPU 15:                   30 events,        2 KiB data
   Total:                 32954 events (dropped 0),     1545 KiB data
=== dm-13 ===
   CPU  0:                    0 events,        0 KiB data
   CPU  1:                    0 events,        0 KiB data
   CPU  2:                    1 events,        1 KiB data
   CPU  3:                    0 events,        0 KiB data
   CPU  4:                    0 events,        0 KiB data
   CPU  5:                    0 events,        0 KiB data
   CPU  6:                    0 events,        0 KiB data
   CPU  7:                    0 events,        0 KiB data
   CPU  8:                    0 events,        0 KiB data
   CPU  9:                    0 events,        0 KiB data
   CPU 10:                    0 events,        0 KiB data
   CPU 11:                    0 events,        0 KiB data
   CPU 12:                    0 events,        0 KiB data
   CPU 13:                    0 events,        0 KiB data
   CPU 14:                    0 events,        0 KiB data
   CPU 15:                    0 events,        0 KiB data
   Total:                     1 events (dropped 0),        1 KiB data
=== dm-16 ===
   CPU  0:                17499 events,      821 KiB data
   CPU  1:                15320 events,      719 KiB data
   CPU  2:                 1037 events,       49 KiB data
   CPU  3:                  667 events,       32 KiB data
   CPU  4:                  278 events,       14 KiB data
   CPU  5:                   91 events,        5 KiB data
   CPU  6:                  888 events,       42 KiB data
   CPU  7:                   67 events,        4 KiB data
   CPU  8:                 2317 events,      109 KiB data
   CPU  9:                 3662 events,      172 KiB data
   CPU 10:                 1756 events,       83 KiB data
   CPU 11:                  801 events,       38 KiB data
   CPU 12:                   20 events,        1 KiB data
   CPU 13:                  618 events,       29 KiB data
   CPU 14:                    3 events,        1 KiB data
   CPU 15:                   18 events,        1 KiB data
   Total:                 45042 events (dropped 0),     2112 KiB data



And the blktrace files are here (available for five days):

http://filex.univ-nantes.fr/get?k=RDxGitXYOf4HKHd7Tan

Hope it can be helpful,
Thanks,
-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2011-12-22 11:02                   ` Yann Dupont
@ 2012-01-02 10:06                     ` Yann Dupont
  2012-01-02 16:08                       ` Peter Grandi
  2012-01-02 20:35                       ` Dave Chinner
  0 siblings, 2 replies; 20+ messages in thread
From: Yann Dupont @ 2012-01-02 10:06 UTC (permalink / raw)
  To: Yann Dupont; +Cc: stan, xfs

On 22/12/2011 12:02, Yann Dupont wrote:
> On 22/12/2011 10:23, Yann Dupont wrote:
>>
>>> Can you run a block trace on both kernels (for say five minutes)
>>> when the load differential is showing up and provide that to us so
>>> we can see how the IO patterns are differing?
>
>
> Here we go.
>

Hello, happy new year everybody,

Did anyone have time to examine the two blktraces? (And, by chance,
spot the root cause of the increased load?)

One of my servers is still running 3.1.6. In the coming days I'll see a
very significant load increase (today is still calm). Is there anything
I can do to investigate further?

Thanks,




-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-02 10:06                     ` Yann Dupont
@ 2012-01-02 16:08                       ` Peter Grandi
  2012-01-02 18:02                         ` Peter Grandi
  2012-01-04 10:54                         ` Yann Dupont
  2012-01-02 20:35                       ` Dave Chinner
  1 sibling, 2 replies; 20+ messages in thread
From: Peter Grandi @ 2012-01-02 16:08 UTC (permalink / raw)
  To: Linux fs XFS

[ ... ]

>> On two particular servers, with recent kernels, I experience a
>> much higher load than expected, but it's very hard to tell
>> what's wrong. The system seems to spend more time in I/O wait. Older
>> kernels (2.6.32.xx and 2.6.26.xx) give better results.
[ ... ]
> When I go back to older kernels, the load goes down. With newer
> kernels, everything works well too, but the load (as reported by
> uptime) is higher.
[ ... ]
>> birnie:~/TRACE# uptime
>>   11:48:34 up 17:18,  3 users,  load average: 0.04, 0.18, 0.23

>> penderyn:~/TRACE# uptime
>>   11:48:30 up 23 min,  3 users,  load average: 4.03, 3.82, 3.21
[ ... ]

But 'uptime' reports the load average, which is (roughly)
processes actually running on the CPU. If the load average is
higher, that usually means that the file system is running
better, not worse. It looks as if you are not clear whether you
have a regression or an improvement.

For a mail server the relevant metric is messages processed per
second, or alternatively median and maximum times to process a
message, rather than "average" processes running.

[ ... ]

>> As those servers are critical for us, I can't really test, can
>> hardly give you more precise numbers, and I don't know how to
>> accurately reproduce this platform to test what's wrong. I
>> know this is NOT a precise bug report and it won't help much.
>> All I can say is: - read operations seem no slower with
>> recent kernels; backups take approximately the same time;
>> - I'd say (but I have no proof) that delivery of new mail
>> takes more time and is more synchronous than before, as if
>> nobarrier had no effect.

> Did anyone have time to examine the two blktraces? (And, by
> chance, spot the root cause of the increased load?)

So you are expecting quick response times, over the Christmas and New
Year period, for a large system-critical problem that you yourself do
not have the resources to test. What's your XFS Platinum Psychic
Support Account number? :-)

> One of my servers is still running 3.1.6. In the coming days
> I'll see a very significant load increase (today is still
> calm). Is there anything I can do to investigate further?

As it is not clear whether you are complaining about better XFS
performance, it is hard to help.

However, while things are still calm, you can probably test your
systems a bit by running Postmark on both machines, as it reports
relevant metrics.
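
A minimal sketch of such a run (the parameters are illustrative only,
and assume the postmark binary is in $PATH):

# postmark <<'EOF'
set location /var/vmail/pmtest
set number 20000
set size 512 16384
set transactions 50000
run
quit
EOF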

[ ... ]

BTW, rereading the description of the setup:

>>>>> Those servers are mail (dovecot) servers, with lots of
>>>>> simultaneous imap clients (5000+) and lots of simultaneous
>>>>> message delivery. These are linux-vservers, on top of LVM
>>>>> volumes. The storage is a SAN with 15k RPM SAS drives (and
>>>>> battery backup). I know barriers were disabled in older
>>>>> kernels, so with recent kernels, the XFS volumes were mounted
>>>>> with nobarrier.

>>>> 1. What mailbox format are you using?  Is this a constant
>>>> or variable?
>>> Maildir++

I am stunned by the sheer (euphemism alert) audacity of it all.
This setup is (euphemism alert) amazing.

However, at least it is Linux-VServers, while there are clueless
sysadmins who set up mail servers on virtual machines (and
amazingly VMware encourages that for Zimbra, which is a terrible
combination, as Zimbra also uses something like Maildir for the
IMAP mailstore). The use of 15k drives is also commendable.

Unfortunately the problem of large busy mailstores is vastly
underestimated by many, and XFS has little to do with it.


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-02 16:08                       ` Peter Grandi
@ 2012-01-02 18:02                         ` Peter Grandi
  2012-01-04 10:54                         ` Yann Dupont
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Grandi @ 2012-01-02 18:02 UTC (permalink / raw)
  To: Linux fs XFS


>>> The system seems to spend more time in I/O wait. Older
>>> kernels (2.6.32.xx and 2.6.26.xx) give better results.
> [ ... ]
>> When I go back to older kernels, the load goes down. With newer
>> kernels, everything works well too, but the load (as reported by
>> uptime) is higher.
> [ ... ]
>>> birnie:~/TRACE# uptime
>>> 11:48:34 up 17:18,  3 users,  load average: 0.04, 0.18, 0.23
>>> penderyn:~/TRACE# uptime
>>> 11:48:30 up 23 min,  3 users,  load average: 4.03, 3.82, 3.21
> [ ... ]

> But 'uptime' reports the load average, which is (roughly)
> processes actually running on the CPU. If the load average is
> higher, that usually means that the file system is running
> better, not worse. It looks as if you are not clear whether you
> have a regression or an improvement.

Perhaps it would be useful to see the output of something like
'iostat -d -x 10' and 'vmstat 10' to see whether the load average is
higher because processes are waiting less and running more, or because
processes are sitting in 'iowait'. It can also help to use 'htop' with
an '.htoprc' like this:

--------------------------------------------------------------
fields=0 48 2 17 38 39 13 14 46 62 63 1 
sort_key=63
sort_direction=1
hide_threads=0
hide_kernel_threads=0
hide_userland_threads=0
shadow_other_users=1
highlight_base_name=1
highlight_megabytes=1
highlight_threads=0
tree_view=0
header_margin=0
detailed_cpu_time=1
color_scheme=2
delay=15
left_meters=Memory AllCPUs 
left_meter_modes=1 1 
right_meters=Swap AllCPUs 
right_meter_modes=1 2 
--------------------------------------------------------------


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-02 10:06                     ` Yann Dupont
  2012-01-02 16:08                       ` Peter Grandi
@ 2012-01-02 20:35                       ` Dave Chinner
  2012-01-03  8:20                         ` Yann Dupont
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2012-01-02 20:35 UTC (permalink / raw)
  To: Yann Dupont; +Cc: stan, xfs

On Mon, Jan 02, 2012 at 11:06:26AM +0100, Yann Dupont wrote:
> On 22/12/2011 12:02, Yann Dupont wrote:
> > On 22/12/2011 10:23, Yann Dupont wrote:
> >>
> >>>Can you run a block trace on both kernels (for say five minutes)
> >>>when the load differential is showing up and provide that to us so
> >>>we can see how the IO patterns are differing?
> >
> >
> > Here we go.
> >
> 
> Hello, happy new year everybody,
> 
> Did anyone have time to examine the two blktraces? (And, by chance,
> spot the root cause of the increased load?)

I've had a bit of a look, but most people have been on holiday.

As it is, I can't see any material difference between the traces.
Both reads and writes are taking the same amount of time to service,
so I don't think there's any problem here.

I do recall that some years ago we changed one of the ways we
slept in XFS, which meant those blocked IOs contributed to the load
average (as they are supposed to). That meant that more IO
contributed to the load average (it might have been read related),
so load averages were then higher for exactly the same workloads.

Indeed:

load average: 0.64, 0.15, 0.09

(start 40 concurrent directory traversals w/ unlinks)

(wait a bit)

load average: 39.96, 23.75, 10.06

Yup, that is spot on - 40 processes doing blocking IO.....
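
(That test was essentially of this shape -- a sketch, with
/mnt/scratch/dir1..dir40 standing in for pre-populated directory
trees:)

# for i in $(seq 1 40); do rm -rf /mnt/scratch/dir$i & done
# sleep 60; uptime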

So absent any measurable performance problem, I don't think the
change in load average is something to be concerned about.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-02 20:35                       ` Dave Chinner
@ 2012-01-03  8:20                         ` Yann Dupont
  2012-01-04 12:33                           ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Yann Dupont @ 2012-01-03  8:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: stan, xfs

On 02/01/2012 21:35, Dave Chinner wrote:
> On Mon, Jan 02, 2012 at 11:06:26AM +0100, Yann Dupont wrote:

>> Hello, happy new year everybody,
>>
>> Did anyone have time to examine the two blktraces? (And, by chance,
>> spot the root cause of the increased load?)
>
> I've had a bit of a look, but most people have been on holiday.

yep, of course, I was too :)

>
> As it is, I can't see any material difference between the traces.
> Both reads and writes are taking the same amount of time to service,
> so I don't think there's any problem here.

ok,
>
> I do recall that some years ago we changed one of the ways we

Do you recall exactly what "some years ago" means? Is this post-2.6.26 era?

> slept in XFS, which meant those blocked IOs contributed to the load
> average (as they are supposed to). That meant that more IO
> contributed to the load average (it might have been read related),
> so load averages were then higher for exactly the same workloads.
>

> Indeed:
>
> load average: 0.64, 0.15, 0.09
>
> (start 40 concurrent directory traversals w/ unlinks)
>
> (wait a bit)
>
> load average: 39.96, 23.75, 10.06
>
> Yup, that is spot on - 40 processes doing blocking IO.....
>
> So absent any measurable performance problem, I don't think the
> change in load average is something to be concerned about.

You're probably right: I have a graph in cacti showing load average
and detailed load usage (System/User/Nice/Wait, etc.). The load
average is much higher now with 3.1.6, but the detailed load seems no
different than before.

And for the moment, in real-world usage (that is, storing mail in
folders and serving IMAP) the server seems no slower than before.

I'll keep an eye on it during high load.

Thanks for your answer,
Cheers,
-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-02 16:08                       ` Peter Grandi
  2012-01-02 18:02                         ` Peter Grandi
@ 2012-01-04 10:54                         ` Yann Dupont
  1 sibling, 0 replies; 20+ messages in thread
From: Yann Dupont @ 2012-01-04 10:54 UTC (permalink / raw)
  To: Peter Grandi; +Cc: Linux fs XFS

On 02/01/2012 17:08, Peter Grandi wrote:
> [ ... ]
>
>>> On two particular servers, with recent kernels, I experience a
>>> much higher load than expected, but it's very hard to tell
>>> what's wrong. The system seems to spend more time in I/O wait. Older
>>> kernels (2.6.32.xx and 2.6.26.xx) give better results.
> [ ... ]
>> When I go back to older kernels, the load goes down. With newer
>> kernels, everything works well too, but the load (as reported by
>> uptime) is higher.
> [ ... ]
>>> birnie:~/TRACE# uptime
>>>    11:48:34 up 17:18,  3 users,  load average: 0.04, 0.18, 0.23
>
>>> penderyn:~/TRACE# uptime
>>>    11:48:30 up 23 min,  3 users,  load average: 4.03, 3.82, 3.21
> [ ... ]
>
> But 'uptime' reports the load average, which is (roughly)
> processes actually running on the CPU. If the load average is

More or less. I generally have 5000+ processes on those servers. The
load generally reflects a mix of CPU usage (which is unchanged, as the
dovecot setup is unchanged) and I/O wait. So naively, I'd say that if
the load average is higher than usual, it's because I/O wait is higher.

As the kernel has had big changes, it could be XFS, but DM or the I/O
scheduler as well.

But that doesn't seem to be the case.

> higher, that usually means that the file system is running
> better, not worse.

If delivery is I/O bound, yes; but that's not the case in this
particular setup.

> It looks as if you are not clear whether you
> have a regression or an improvement.

I was just signaling an unusual load average, nothing else. As far as I
can see, response times are still correct. I'm not experiencing a
performance problem. I'm not the original author of the thread; I
probably should have changed the subject of the thread, sorry for that.
>
> For a mail server the relevant metric is messages processed per
> second, or alternatively median and maximum times to process a
> message, rather than "average" processes running.
>
...
> So you are expecting quick response times, over the Christmas and New
> Year period, for a large system-critical problem that you yourself do
> not have the resources to test.
> What's your XFS Platinum Psychic Support Account number? :-)

I'm not expecting anything. I know open source. All is working fine,
thank you. I was just "upping" because I saw that my traces had been
downloaded last week. It's not always easy for non-native speakers to
send mail without sounding aggressive or offensive. If that was the
case, I can assure you that was not the intent.


>
> BTW rereading the description of the setup:
>
>>>>>> Those servers are mail (dovecot) servers, with lots of
>>>>>> simultaneous imap clients (5000+) and lots of simultaneous
>>>>>> message delivery. These are linux-vservers, on top of LVM
>>>>>> volumes. The storage is a SAN with 15k RPM SAS drives (and
>>>>>> battery backup). I know barriers were disabled in older
>>>>>> kernels, so with recent kernels, the XFS volumes were mounted
>>>>>> with nobarrier.
>
>>>>> 1. What mailbox format are you using?  Is this a constant
>>>>> or variable?
>>>> Maildir++
>
> I am stunned by the sheer (euphemism alert) audacity of it all.
> This setup is (euphemism alert) amazing.

Can you elaborate, please? This particular setup has been running fine
for 7 years now, has scaled up very well (up to 70k mailboxes with a
similar setup for students) with few modifications (replacing Courier
with Dovecot and upgrading servers, for example) and has proved very
stable since, despite numerous power outages...

I can give you the detailed setup off list if you want; I think it has
nothing to do with XFS.

>
> Unfortunately the problem of large busy mailstores is vastly
> underestimated by many, and XFS has little to do with it.
>

I'm really not sure I underestimate it, but I'll be glad to hear your
recommendations. Off list, I think.

Cheers,


-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-03  8:20                         ` Yann Dupont
@ 2012-01-04 12:33                           ` Christoph Hellwig
  2012-01-04 13:06                             ` Yann Dupont
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2012-01-04 12:33 UTC (permalink / raw)
  To: Yann Dupont; +Cc: stan, xfs

On Tue, Jan 03, 2012 at 09:20:05AM +0100, Yann Dupont wrote:
> > As it is, I can't see any material difference between the traces.
> > Both reads and writes are taking the same amount of time to service,
> > so I don't think there's any problem here.
> 
> ok,
> >
> > I do recall that some years ago we changed one of the ways we
> 
> Do you recall exactly what "some years ago" means? Is this post-2.6.26 era?

The only thing that I remember is Jens switching xfs_buf_wait_unpin from
schedule to io_schedule in "block: remove per-queue plugging", which
went into Linux 2.6.39.  With this, processes that wait for buffers to
be unpinned now count towards the load average.
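
You can see the effect directly: tasks in uninterruptible sleep (state
"D") count towards the Linux load average. A quick sketch, assuming
/dev/sdc is a disposable test device:

# dd if=/dev/sdc of=/dev/null iflag=direct bs=16k count=1000000 &
# ps -o pid,stat,comm -C dd    # STAT "D" = uninterruptible sleep
# uptime                       # D-state tasks inflate the load average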


* Re: Bad performance with XFS + 2.6.38 / 2.6.39
  2012-01-04 12:33                           ` Christoph Hellwig
@ 2012-01-04 13:06                             ` Yann Dupont
  0 siblings, 0 replies; 20+ messages in thread
From: Yann Dupont @ 2012-01-04 13:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: stan, xfs

On 04/01/2012 13:33, Christoph Hellwig wrote:
> On Tue, Jan 03, 2012 at 09:20:05AM +0100, Yann Dupont wrote:
>>> As it is, I can't see any material difference between the traces.
>>> Both reads and writes are taking the same amount of time to service,
>>> so I don't think there's any problem here.
>>
>> ok,
>>>
>>> I do recall that some years ago we changed one of the ways we
>>
>> Do you recall exactly what "some years ago" means? Is this post-2.6.26 era?
>
> The only thing that I remember is Jens switching xfs_buf_wait_unpin from
> schedule to io_schedule in "block: remove per-queue plugging", which
> went into Linux 2.6.39.  With this, processes that wait for buffers to
> be unpinned now count towards the load average.
>

OK, that's probably the root cause. As I already said, I'm not
experiencing a performance regression right now.

Thanks a lot for the explanation.

Cheers,



-- 
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr

