* [PATCH] xfs: implement ->dirty_inode callout
@ 2009-06-23 15:38 Felix Blyakher
2009-06-23 21:30 ` Christoph Hellwig
2009-06-29 8:14 ` Michael Weissenbacher
0 siblings, 2 replies; 3+ messages in thread
From: Felix Blyakher @ 2009-06-23 15:38 UTC (permalink / raw)
To: xfs mailing list
I'd like to (re)propose Dave's patch from the last October to
address the problem of atime never making to the disk. Many
people complained about it.
I may have slightly adjusted the patch to fit the latest
code, and verified it addresses the issue.
The reference to the discussion on this matter on xfs mailing
list is here:
http://oss.sgi.com/archives/xfs/2008-10/msg02102.html
---
From Dave Chinner <david@fromorbit.com>
XFS: Implement ->dirty_inode callout
Hook up ->dirty_inode so that when the VFS dirties an inode we
can mark the XFS inode as "dirty with unlogged changes". This
allows events such as touch_atime() to propagate the dirty state
right through to XFS so it gets written back to disk.
Signed-off-by: Dave Chinner <david@fromorbit.com>
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 7ec89fc..ee6863e 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -215,7 +215,6 @@ xfs_setfilesize(
if (ip->i_d.di_size < isize) {
ip->i_d.di_size = isize;
- ip->i_update_core = 1;
ip->i_update_size = 1;
xfs_mark_inode_dirty_sync(ip);
}
diff --git a/fs/xfs/linux-2.6/xfs_iops.c b/fs/xfs/linux-2.6/xfs_iops.c
index 58973bb..7a51612 100644
--- a/fs/xfs/linux-2.6/xfs_iops.c
+++ b/fs/xfs/linux-2.6/xfs_iops.c
@@ -120,19 +120,11 @@ xfs_ichgtime(
}
/*
- * We update the i_update_core field _after_ changing
- * the timestamps in order to coordinate properly with
- * xfs_iflush() so that we don't lose timestamp updates.
- * This keeps us from having to hold the inode lock
- * while doing this. We use the SYNCHRONIZE macro to
- * ensure that the compiler does not reorder the update
- * of i_update_core above the timestamp updates above.
+ * Update complete - now make sure everyone knows that the inode
+ * is dirty.
*/
- if (sync_it) {
- SYNCHRONIZE();
- ip->i_update_core = 1;
+ if (sync_it)
xfs_mark_inode_dirty_sync(ip);
- }
}
/*
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 36fb8a6..3cc9ef2 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -973,6 +973,28 @@ xfs_fs_inode_init_once(
}
/*
+ * Dirty the XFS inode when mark_inode_dirty_sync() is called so that
+ * we catch unlogged VFS level updates to the inode. Care must be taken
+ * here - the transaction code calls mark_inode_dirty_sync() to mark
the
+ * VFS inode dirty in a transaction and clears the i_update_core field;
+ * it must clear the field after calling mark_inode_dirty_sync() to
+ * correctly indicate that the dirty state has been propagated into the
+ * inode log item.
+ *
+ * We need the barrier() to maintain correct ordering between unlogged
+ * updates and the transaction commit code that clears the
i_update_core
+ * field. This requires all updates to be completed before marking the
+ * inode dirty.
+ */
+STATIC void
+xfs_fs_dirty_inode(
+ struct inode *inode)
+{
+ barrier();
+ XFS_I(inode)->i_update_core = 1;
+}
+
+/*
* Attempt to flush the inode, this will actually fail
* if the inode is pinned, but we dirty the inode again
* at the point when it is unpinned after a log write,
@@ -1546,6 +1568,7 @@ xfs_fs_get_sb(
static struct super_operations xfs_super_operations = {
.alloc_inode = xfs_fs_alloc_inode,
.destroy_inode = xfs_fs_destroy_inode,
+ .dirty_inode = xfs_fs_dirty_inode,
.write_inode = xfs_fs_write_inode,
.clear_inode = xfs_fs_clear_inode,
.put_super = xfs_fs_put_super,
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 977c4ae..43c31c0 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -232,6 +232,15 @@ xfs_inode_item_format(
nvecs = 1;
/*
+ * Make sure the linux inode is dirty. We do this before
+ * clearing i_update_core as the VFS will call back into
+ * XFS here and set i_update_core, so we need to dirty the
+ * inode first so that the ordering of i_update_core and
+ * unlogged modifications still works as described below.
+ */
+ xfs_mark_inode_dirty_sync(ip);
+
+ /*
* Clear i_update_core if the timestamps (or any other
* non-transactional modification) need flushing/logging
* and we're about to log them with the rest of the core.
@@ -275,11 +284,6 @@ xfs_inode_item_format(
*/
xfs_synchronize_atime(ip);
- /*
- * make sure the linux inode is dirty
- */
- xfs_mark_inode_dirty_sync(ip);
-
vecp->i_addr = (xfs_caddr_t)&ip->i_d;
vecp->i_len = sizeof(struct xfs_icdinode);
XLOG_VEC_SET_TYPE(vecp, XLOG_REG_TYPE_ICORE);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: implement ->dirty_inode callout
2009-06-23 15:38 [PATCH] xfs: implement ->dirty_inode callout Felix Blyakher
@ 2009-06-23 21:30 ` Christoph Hellwig
2009-06-29 8:14 ` Michael Weissenbacher
1 sibling, 0 replies; 3+ messages in thread
From: Christoph Hellwig @ 2009-06-23 21:30 UTC (permalink / raw)
To: Felix Blyakher; +Cc: xfs mailing list
On Tue, Jun 23, 2009 at 10:38:43AM -0500, Felix Blyakher wrote:
> I'd like to (re)propose Dave's patch from the last October to
> address the problem of atime never making to the disk. Many
> people complained about it.
> I may have slightly adjusted the patch to fit the latest
> code, and verified it addresses the issue.
>
> The reference to the discussion on this matter on xfs mailing
> list is here:
>
> http://oss.sgi.com/archives/xfs/2008-10/msg02102.html
Can you run some benchmarks to see what impact it has with the
new relatime default?
The only places where we actual look at i_update_core are fsync and
the decision wether to flush out the inode fully in reclaim and
I we should sync out the inode there.
The VFS ends up calling into ->dirty_inode from the following places:
- set_page_dirty
- mark_buffer_dirty
- touch_atime
- file_update_time
- generic_file_direct_write
And it doesn't distinguish between setting I_DIRTY_SYNC and
I_DIRTY_DATASYNC which means we'll also get it for the first two
events which only affect the data and not the inode core.
I'd be much more comfortable if we'd pass down those flags to
->dirty_inode and optimize based on that.
On the bright side this patch will also allow us to hack around the VFS
layering violation which currently causes missed c/mtime updates on
mmaped writes.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: implement ->dirty_inode callout
2009-06-23 15:38 [PATCH] xfs: implement ->dirty_inode callout Felix Blyakher
2009-06-23 21:30 ` Christoph Hellwig
@ 2009-06-29 8:14 ` Michael Weissenbacher
1 sibling, 0 replies; 3+ messages in thread
From: Michael Weissenbacher @ 2009-06-29 8:14 UTC (permalink / raw)
To: Felix Blyakher; +Cc: xfs mailing list
Hi Felix!
> I'd like to (re)propose Dave's patch from the last October to
> address the problem of atime never making to the disk. Many
> people complained about it.
Tested here on 2.6.30 & works perfectly. I noticed that "strictatime" and "relatime" mounts options can't be used with XFS - had to change the default directly in fs/namespace.c.
> Can you run some benchmarks to see what impact it has with the
> new relatime default?
I did some benchmarks here and wasn't able to see any significant differences, even when atime is fully enabled. The tests were done with kernel 2.6.30 on a VMWare ESXi Server 3.5.0. Real Hardware is a dual Quad Xeon 2.66GHz, 6-disk 146GB 15k RAID5 on PERC6 with 128MB BBU cache.
xfs_info of test fs
--------------------
meta-data=/dev/sda3 isize=256 agcount=4, agsize=2365440 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=9461760, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
bonnie++ results
----------------
bonnie_relatime_nopatch.txt
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
gentoo-x64-xfste 2G 1261 97 191145 34 95765 27 2951 97 234814 22 1169 17
Latency 12375us 271ms 927ms 11581us 17640us 54677us
Version 1.93c ------Sequential Create------ --------Random Create--------
gentoo-x64-xfstest -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
32:100000:10/64 3951 42 1842 26 15114 87 4240 44 170 2 7004 57
Latency 622ms 19046us 156ms 543ms 106ms 1080ms
1.93c,1.93c,gentoo-x64-xfstest,1,1246251173,2G,,1261,97,191145,34,95765,27,2951,97,234814,22,1169,17,32,100000,10,,64,3951,42,1842,26,15114,87,4240,44,170,2,7004,57,12375us,271ms,927ms,11581us,17640us,54677us,622ms,19046us,156ms,543ms,106ms,1080ms
bonnie_relatime_withpach.txt
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
gentoo-x64-xfste 2G 1271 99 284571 57 120562 27 2925 98 234804 24 1227 19
Latency 12436us 429ms 145ms 13323us 11892us 48055us
Version 1.93c ------Sequential Create------ --------Random Create--------
gentoo-x64-xfstest -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
32:100000:10/64 3178 46 2308 32 17881 96 4451 54 171 2 7380 60
Latency 573ms 17378us 28025us 77271us 326ms 928ms
1.93c,1.93c,gentoo-x64-xfstest,1,1245948051,2G,,1271,99,284571,57,120562,27,2925,98,234804,24,1227,19,32,100000,10,,64,3178,46,2308,32,17881,96,4451,54,171,2,7380,60,12436us,429ms,145ms,13323us,11892us,48055us,573ms,17378us,28025us,77271us,326ms,928ms
bonnie_atime_withpatch.txt
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
gentoo-x64-xfste 2G 1309 99 166258 21 117374 29 3015 98 223034 25 1182 22
Latency 8626us 496ms 23729us 9696us 28558us 43342us
Version 1.93c ------Sequential Create------ --------Random Create--------
gentoo-x64-xfstest -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
32:100000:10/64 3696 52 1806 27 16913 89 4072 60 167 2 6838 58
Latency 600ms 21008us 26798us 55639us 1068ms 1045ms
1.93c,1.93c,gentoo-x64-xfstest,1,1245923465,2G,,1309,99,166258,21,117374,29,3015,98,223034,25,1182,22,32,100000,10,,64,3696,52,1806,27,16913,89,4072,60,167,2,6838,58,8626us,496ms,23729us,9696us,28558us,43342us,600ms,21008us,26798us,55639us,1068ms,1045ms
bonnie_noatime.txt
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
gentoo-x64-xfste 2G 1302 99 312421 52 115066 29 3121 99 243930 24 1282 18
Latency 7750us 562ms 24417us 10083us 12990us 50122us
Version 1.93c ------Sequential Create------ --------Random Create--------
gentoo-x64-xfstest -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
32:100000:10/64 3917 55 1942 28 16849 91 4043 59 172 2 7705 63
Latency 312ms 105ms 3530us 130ms 159ms 690ms
1.93c,1.93c,gentoo-x64-xfstest,1,1245922347,2G,,1302,99,312421,52,115066,29,3121,99,243930,24,1282,18,32,100000,10,,64,3917,55,1942,28,16849,91,4043,59,172,2,7705,63,7750us,562ms,24417us,10083us,12990us,50122us,312ms,105ms,3530us,130ms,159ms,690ms
hth,
Michael
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-06-29 8:13 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-23 15:38 [PATCH] xfs: implement ->dirty_inode callout Felix Blyakher
2009-06-23 21:30 ` Christoph Hellwig
2009-06-29 8:14 ` Michael Weissenbacher
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.