* [regression, 3.0-rc1] dentry cache growth during unlinks, XFS performance way down
From: Dave Chinner @ 2011-05-30  2:06 UTC
  To: linux-kernel; +Cc: linux-fsdevel, xfs

Folks,

I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
with 50M files in it. Running:

$ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done

runs an 8-way parallel unlink on the files. Normally this runs at
around 80k unlinks/s, and it runs with about 500k-1m dentries and
inodes cached in the steady state.

The steady state behaviour with 3.0-rc1 is that there are around 10m
cached dentries - all negative dentries - consuming about 1.6GB of
RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
dentries. My initial suspicions are that the dentry unhashing
changes may be the cause of this...
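
For reference, the cache growth is easy to watch from userspace while
the workload runs. A minimal sketch - the field layout assumed below
(nr_dentry and nr_unused first in /proc/sys/fs/dentry-state, active
objects and object size in columns 2 and 4 of /proc/slabinfo) should
be checked against the running kernel:

# rough monitor for dentry cache growth during the unlink run
while :; do
	# first two fields: total dentries, unused (LRU) dentries
	awk '{ printf "dentries: %d  unused: %d\n", $1, $2 }' \
		/proc/sys/fs/dentry-state
	# active objects * object size ~= memory held by the dentry slab
	awk '/^dentry / { printf "dentry slab: ~%d MB\n", $2 * $4 / 1048576 }' \
		/proc/slabinfo
	sleep 5
done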

Performance is now a very regular peak/trough pattern with a period
of about 20s, where the peak is about 80k unlinks/s, and the trough
is around 20k unlinks/s. The runtime of the 50m inode delete has
gone from around 10m on 2.6.39, to:

11.71user 470.08system 15:07.91elapsed 53%CPU (0avgtext+0avgdata 133184maxresident)k
0inputs+0outputs (30major+497228minor)pagefaults 0swaps
11.50user 468.30system 15:14.35elapsed 52%CPU (0avgtext+0avgdata 133168maxresident)k
0inputs+0outputs (42major+497268minor)pagefaults 0swaps
11.34user 466.66system 15:26.04elapsed 51%CPU (0avgtext+0avgdata 133216maxresident)k
0inputs+0outputs (18major+497121minor)pagefaults 0swaps
12.14user 470.46system 15:26.60elapsed 52%CPU (0avgtext+0avgdata 133216maxresident)k
0inputs+0outputs (44major+497309minor)pagefaults 0swaps
12.06user 463.74system 15:28.84elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
0inputs+0outputs (25major+497046minor)pagefaults 0swaps
11.37user 468.18system 15:29.07elapsed 51%CPU (0avgtext+0avgdata 133184maxresident)k
0inputs+0outputs (55major+497056minor)pagefaults 0swaps
11.69user 474.46system 15:47.45elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
0inputs+0outputs (61major+497284minor)pagefaults 0swaps
11.32user 476.93system 16:05.14elapsed 50%CPU (0avgtext+0avgdata 133184maxresident)k
0inputs+0outputs (30major+497225minor)pagefaults 0swaps

About 16 minutes. I'm not sure yet whether this change of cache
behaviour is the cause of the entire performance regression, but
there's a good chance that it is a contributing factor.

Christoph, it appears that there is a significant increase in log
forces during this unlink workload compared to 2.6.39, and that's
possibly where the performance degradation is coming from. I'm going
to have to bisect, I think.
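
As a rough way to quantify that before bisecting, the relevant XFS
counters can be snapshotted around a run; a sketch, assuming the
"log" and "push_ail" lines in /proc/fs/xfs/stat carry the log force
and AIL push counters (worth double-checking against the running
kernel):

# compare log force / AIL push activity across a run
grep -E '^(log|push_ail) ' /proc/fs/xfs/stat > /tmp/xfsstat.before
# ... run the create or unlink workload ...
grep -E '^(log|push_ail) ' /proc/fs/xfs/stat > /tmp/xfsstat.after
diff -u /tmp/xfsstat.before /tmp/xfsstat.after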

The 8-way create rate for the 50m inodes is down by 10% as well, but I
don't think that has anything to do with dentry cache behaviour -
log write throughput is up by a factor of 3x over 2.6.39. Christoph,
I think that this is once again due to an increase in log forces,
but I need to do more analysis to be sure...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [regression, 3.0-rc1] dentry cache growth during unlinks, XFS performance way down
From: Dave Chinner @ 2011-05-30  3:47 UTC
  To: linux-kernel; +Cc: linux-fsdevel, xfs

On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
> Folks,
> 
> I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
> with 50M files in it. Running:
> 
> $ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done
> 
> runs an 8-way parallel unlink on the files. Normally this runs at
> around 80k unlinks/s, and it runs with about 500k-1m dentries and
> inodes cached in the steady state.
> 
> The steady state behaviour with 3.0-rc1 is that there are around 10m
> cached dentries - all negative dentries - consuming about 1.6GB of
> RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
> dentries. My initial suspicions are that the dentry unhashing
> changes may be the cause of this...

So a bisect lands on:

$ git bisect good
79bf7c732b5ff75b96022ed9d29181afd3d2509c is the first bad commit
commit 79bf7c732b5ff75b96022ed9d29181afd3d2509c
Author: Sage Weil <sage@newdream.net>
Date:   Tue May 24 13:06:06 2011 -0700

    vfs: push dentry_unhash on rmdir into file systems

    Only a few file systems need this.  Start by pushing it down into each
    fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
    basis.

    This does not change behavior for any in-tree file systems.

    Acked-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Sage Weil <sage@newdream.net>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

:040000 040000 c45d58718d33f7ca1da87f99fa538f65eaa3fe2c ec71cbecc59e8b142a7bfcabd469fa67486bef30 M        fs

Ok, so the question has to be asked - why wasn't dentry_unhash()
pushed down into XFS?

Further, now that dentry_unhash() has been removed from most
filesystems, what is replacing the shrink_dcache_parent() call that
was cleaning up the "we can never reference again" child dentries of
the unlinked directories? It appears that they are now being left in
memory on the dentry LRU. It also appears that they have the
DCACHE_REFERENCED bit set, so they do not get immediately reclaimed by
the shrinker.

Hence they are much more difficult to remove from memory than in
2.6.39. With the rate at which they are being created, the shrinker
is simply not aggressive enough to free them at the same rate as
before, and so the memory balance of the caches is significantly
changed.

It would seem to me that we still need the call to
shrink_dcache_parent() for unlinked directories - that part of
dentry_unhash() still needs to be run to ensure that we don't
pollute memory with stale dentries. The original patch series
suggests that this is a per-filesystem decision; I think this
problem shows that it is really necessary for most filesystems.
So, do I just fix this in XFS, or should I re-add calls to
shrink_dcache_parent() in the VFS for rmdir and rename?
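
For context, a sketch of roughly what the helper looked like before
the push-down (from memory of the 2.6.39-era fs/namei.c - the
refcount check and locking details are approximate), showing that the
dcache pruning was bundled with the unhashing:

/*
 * Approximation of the old dentry_unhash() that the VFS called from
 * vfs_rmdir() and, for an existing target, vfs_rename_dir().  The
 * shrink_dcache_parent() call is what kept the child dentries of a
 * removed directory from piling up on the LRU.
 */
static void dentry_unhash(struct dentry *dentry)
{
	dget(dentry);
	shrink_dcache_parent(dentry);
	spin_lock(&dentry->d_lock);
	if (dentry->d_count == 2)
		__d_drop(dentry);
	spin_unlock(&dentry->d_lock);
}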

> of about 20s, where the peak is about 80k unlinks/s, and the trough
> is around 20k unlinks/s. The runtime of the 50m inode delete has
> gone from around 10m on 2.6.39, to:
> 
> 11.71user 470.08system 15:07.91elapsed 53%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497228minor)pagefaults 0swaps
> 11.50user 468.30system 15:14.35elapsed 52%CPU (0avgtext+0avgdata 133168maxresident)k
> 0inputs+0outputs (42major+497268minor)pagefaults 0swaps
> 11.34user 466.66system 15:26.04elapsed 51%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (18major+497121minor)pagefaults 0swaps
> 12.14user 470.46system 15:26.60elapsed 52%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (44major+497309minor)pagefaults 0swaps
> 12.06user 463.74system 15:28.84elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (25major+497046minor)pagefaults 0swaps
> 11.37user 468.18system 15:29.07elapsed 51%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (55major+497056minor)pagefaults 0swaps
> 11.69user 474.46system 15:47.45elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (61major+497284minor)pagefaults 0swaps
> 11.32user 476.93system 16:05.14elapsed 50%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497225minor)pagefaults 0swaps
> 
> About 16 minutes. I'm not sure yet whether this change of cache
> behaviour is the cause of the entire performance regression, but
> there's a good chance that it is a contributing factor.

The cache size growth bug does not appear to be responsible for any
of the performance regression.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [regression, 3.0-rc1] dentry cache growth during unlinks, XFS performance way down
From: Sage Weil @ 2011-05-30  4:20 UTC
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, xfs

Hey Dave,

On Mon, 30 May 2011, Dave Chinner wrote:
> On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
> > Folks,
> > 
> > I just booted up a 3.0-rc1 kernel, and mounted an XFS filesystem
> > with 50M files in it. Running:
> > 
> > $ for i in /mnt/scratch/*; do sudo /usr/bin/time rm -rf $i 2>&1 & done
> > 
> > runs an 8-way parallel unlink on the files. Normally this runs at
> > around 80k unlinks/s, and it runs with about 500k-1m dentries and
> > inodes cached in the steady state.
> > 
> > The steady state behaviour with 3.0-rc1 is that there are around 10m
> > cached dentries - all negative dentries - consuming about 1.6GB of
> > RAM (of 4GB total). Previous steady state was, IIRC, around 200MB of
> > dentries. My initial suspicions are that the dentry unhashing
> > changes may be the cause of this...
> 
> So a bisect lands on:
> 
> $ git bisect good
> 79bf7c732b5ff75b96022ed9d29181afd3d2509c is the first bad commit
> commit 79bf7c732b5ff75b96022ed9d29181afd3d2509c
> Author: Sage Weil <sage@newdream.net>
> Date:   Tue May 24 13:06:06 2011 -0700
> 
>     vfs: push dentry_unhash on rmdir into file systems
> 
>     Only a few file systems need this.  Start by pushing it down into each
>     fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
>     basis.
> 
>     This does not change behavior for any in-tree file systems.
> 
>     Acked-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Sage Weil <sage@newdream.net>
>     Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> :040000 040000 c45d58718d33f7ca1da87f99fa538f65eaa3fe2c ec71cbecc59e8b142a7bfcabd469fa67486bef30 M        fs
> 
> Ok, so the question has to be asked - why wasn't dentry_unhash()
> pushed down into XFS?

Christoph asked me to leave it out to avoid the push-down + remove 
noise.  I missed it in v1, added it in v2, then took it out again.  
Ultimately that isn't the real problem, though:

> Further, now that dentry_unhash() has been removed from most
> filesystems, what is replacing the shrink_dcache_parent() call that
> was cleaning up the "we can never reference again" child dentries of
> the unlinked directories? It appears that they are now being left in
> memory on the dentry LRU. It also appears that they have the
> DCACHE_REFERENCED bit set, so they do not get immediately reclaimed by
> the shrinker.

Ah, yeah, that makes sense.  I missed the shrink_dcache_parent side 
effect.  I suspect we just need something like the below?  (Very lightly 
tested!)

Thanks-
sage


From c1fac19b662b02ab4aea98ee2a8d0098bc985bc8 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@newdream.net>
Date: Sun, 29 May 2011 20:35:44 -0700
Subject: [PATCH 1/3] vfs: shrink_dcache_parent before rmdir, dir rename

The dentry_unhash push-down series missed that shrink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/namei.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1ab641f..e2e4e8d 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2579,6 +2579,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	if (error)
 		goto out;
 
+	shrink_dcache_parent(dentry);
 	error = dir->i_op->rmdir(dir, dentry);
 	if (error)
 		goto out;
@@ -2993,6 +2994,8 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
 		goto out;
 
+	if (target)
+		shrink_dcache_parent(new_dentry);
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
-- 
1.7.1

* [git pull] dentry_unhash() breakage
From: Al Viro @ 2011-05-30  5:56 UTC
  To: Linus Torvalds; +Cc: Dave Chinner, linux-kernel, linux-fsdevel, xfs, Sage Weil

A couple of dentry_unhash fallout fixes

Please, pull from the usual place -
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus

Al Viro (1):
      autofs4: bogus dentry_unhash() added in ->unlink()

Sage Weil (1):
      vfs: shrink_dcache_parent before rmdir, dir rename

 fs/autofs4/root.c |    2 --
 fs/namei.c        |    3 +++
 2 files changed, 3 insertions(+), 2 deletions(-)

* Re: [git pull] dentry_unhash() breakage
From: Christoph Hellwig @ 2011-05-30  8:59 UTC
  To: Al Viro; +Cc: Linus Torvalds, linux-fsdevel, Sage Weil, linux-kernel, xfs

On Mon, May 30, 2011 at 06:56:01AM +0100, Al Viro wrote:
> A couple of dentry_unhash fallout fixes

Shouldn't we do the shrink_dcache_parent only after a successful
rmdir or rename?


* Re: [regression, 3.0-rc1] dentry cache growth during unlinks, XFS performance way down
From: Dave Chinner @ 2011-05-30 10:07 UTC
  To: linux-kernel; +Cc: linux-fsdevel, xfs

On Mon, May 30, 2011 at 12:06:04PM +1000, Dave Chinner wrote:
.....
> Performance is now a very regular peak/trough pattern with a period
> of about 20s, where the peak is about 80k unlinks/s, and the trough
> is around 20k unlinks/s. The runtime of the 50m inode delete has
> gone from around 10m on 2.6.39, to:
> 
> 11.71user 470.08system 15:07.91elapsed 53%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497228minor)pagefaults 0swaps
> 11.50user 468.30system 15:14.35elapsed 52%CPU (0avgtext+0avgdata 133168maxresident)k
> 0inputs+0outputs (42major+497268minor)pagefaults 0swaps
> 11.34user 466.66system 15:26.04elapsed 51%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (18major+497121minor)pagefaults 0swaps
> 12.14user 470.46system 15:26.60elapsed 52%CPU (0avgtext+0avgdata 133216maxresident)k
> 0inputs+0outputs (44major+497309minor)pagefaults 0swaps
> 12.06user 463.74system 15:28.84elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (25major+497046minor)pagefaults 0swaps
> 11.37user 468.18system 15:29.07elapsed 51%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (55major+497056minor)pagefaults 0swaps
> 11.69user 474.46system 15:47.45elapsed 51%CPU (0avgtext+0avgdata 133232maxresident)k
> 0inputs+0outputs (61major+497284minor)pagefaults 0swaps
> 11.32user 476.93system 16:05.14elapsed 50%CPU (0avgtext+0avgdata 133184maxresident)k
> 0inputs+0outputs (30major+497225minor)pagefaults 0swaps
> 
> About 16 minutes. I'm not sure yet whether this change of cache
> behaviour is the cause of the entire performance regression, but
> there's a good chance that it is a contributing factor.

I'm not sure about this one, now. I suspect I've got a repeat of a
recent "scrambled RAID controller" problem where it decided to
silently change the BBWC mode on all the LUNs. I've re-run the tests
on 2.6.39-rc4 where I know everything was running fine, and I'm
getting the same results as above and not what I was getting a month
ago.

[ one cold reset of the server and disk array later ]

Yeah, that appears to be the problem.

> Christoph, it appears that there is a significant increase in log
> forces during this unlink workload compared to 2.6.39, and that's
> possibly where the performance degradation is coming from. I'm going
> to have to bisect, I think.
> 
> The 8-way create rate for the 50m inodes is down by 10% as well, but I
> don't think that has anything to do with dentry cache behaviour -
> log write throughput is up by a factor of 3x over 2.6.39. Christoph,
> I think that this is once again due to an increase in log forces,
> but I need to do more analysis to be sure...

The increase in log forces was only a side effect of the slower IO
subsystem: AIL tail pushing was hitting more pinned buffers and hence
issuing more log forces. With the RAID array reset, they are pretty
much back to the same level as before. False alarm.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [git pull] dentry_unhash() breakage
From: Sage Weil @ 2011-05-31 16:26 UTC
  To: Christoph Hellwig
  Cc: Al Viro, Linus Torvalds, linux-fsdevel, linux-kernel, xfs

On Mon, 30 May 2011, Christoph Hellwig wrote:
> On Mon, May 30, 2011 at 06:56:01AM +0100, Al Viro wrote:
> > A couple of dentry_unhash fallout fixes
> 
> > Shouldn't we do the shrink_dcache_parent only after a successful
> rmdir or rename?

Yeah, that makes more sense to me...

sage


From 4e9be5f3fc5f9995b0b1966cda95bb5386e20444 Mon Sep 17 00:00:00 2001
From: Sage Weil <sage@newdream.net>
Date: Tue, 31 May 2011 09:26:13 -0700
Subject: [PATCH] vfs: shrink_dcache_parent on rmdir, dir rename only on success

Only prune the dentries if the rmdir or dir rename succeeds.  Doing so on
failure makes no sense (though it's mostly harmless).

Signed-off-by: Sage Weil <sage@newdream.net>
---
 fs/namei.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index e2e4e8d..72b0370 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2579,11 +2579,11 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
 	if (error)
 		goto out;
 
-	shrink_dcache_parent(dentry);
 	error = dir->i_op->rmdir(dir, dentry);
 	if (error)
 		goto out;
 
+	shrink_dcache_parent(dentry);
 	dentry->d_inode->i_flags |= S_DEAD;
 	dont_mount(dentry);
 
@@ -2994,13 +2994,12 @@ static int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
 	if (d_mountpoint(old_dentry) || d_mountpoint(new_dentry))
 		goto out;
 
-	if (target)
-		shrink_dcache_parent(new_dentry);
 	error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (error)
 		goto out;
 
 	if (target) {
+		shrink_dcache_parent(new_dentry);
 		target->i_flags |= S_DEAD;
 		dont_mount(new_dentry);
 	}
-- 
1.7.0


* Re: [git pull] dentry_unhash() breakage
From: Al Viro @ 2011-05-31 17:33 UTC
  To: Sage Weil
  Cc: Christoph Hellwig, Linus Torvalds, linux-fsdevel, linux-kernel, xfs

On Tue, May 31, 2011 at 09:26:52AM -0700, Sage Weil wrote:
> On Mon, 30 May 2011, Christoph Hellwig wrote:
> > On Mon, May 30, 2011 at 06:56:01AM +0100, Al Viro wrote:
> > > A couple of dentry_unhash fallout fixes
> > 
> > Shouldn't we do the shrink_dcache_parent only after a successful
> > rmdir or rename?
> 
> Yeah, that makes more sense to me...

No, it doesn't.  Let's keep changes to a minimum.  I *really* don't want
to audit autofs and hell knows how many other places for more or less
subtle breakage caused by that.  It's bad enough as it is...
