All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] fs: new inode i_state corruption fix
@ 2009-03-05  6:45 Nick Piggin
  2009-03-05 10:00 ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-05  6:45 UTC (permalink / raw)
  To: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton
  Cc: Jorge Boncompte [DTI2], Adrian Hunter, Jan Kara, stable


There was a report of a data corruption http://lkml.org/lkml/2008/11/14/121.
There is a script included to reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem. I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode. This points to memory scribble, or
synchronisation problme.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes
from the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW. Subsequently modifying i_state causes corruption. In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

Fix for this is rather than wait for I_NEW inodes, just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory either.

After this change, I'm unable to reproduce any of the added warnings or hangs
after ~1hour of running. Previously, the new warnings would start immediately
and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either. I don't
know whether this fixes the problem reported above, but it fixes a real
problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2009-03-05 14:08:11.000000000 +1100
+++ linux-2.6/fs/inode.c	2009-03-05 17:20:35.000000000 +1100
@@ -359,6 +359,7 @@ static int invalidate_list(struct list_h
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_move(&inode->i_list, dispose);
+			WARN_ON(inode->i_state & I_NEW);
 			inode->i_state |= I_FREEING;
 			count++;
 			continue;
@@ -460,6 +461,7 @@ static void prune_icache(int nr_to_scan)
 				continue;
 		}
 		list_move(&inode->i_list, &freeable);
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state |= I_FREEING;
 		nr_pruned++;
 	}
@@ -656,6 +658,7 @@ void unlock_new_inode(struct inode *inod
 	 * just created it (so there can be no old holders
 	 * that haven't tested I_LOCK).
 	 */
+	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
 	inode->i_state &= ~(I_LOCK|I_NEW);
 	wake_up_inode(inode);
 }
@@ -1145,6 +1148,7 @@ void generic_delete_inode(struct inode *
 
 	list_del_init(&inode->i_list);
 	list_del_init(&inode->i_sb_list);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state |= I_FREEING;
 	inodes_stat.nr_inodes--;
 	spin_unlock(&inode_lock);
@@ -1186,16 +1190,19 @@ static void generic_forget_inode(struct 
 			spin_unlock(&inode_lock);
 			return;
 		}
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state |= I_WILL_FREE;
 		spin_unlock(&inode_lock);
 		write_inode_now(inode, 1);
 		spin_lock(&inode_lock);
+		WARN_ON(inode->i_state & I_NEW);
 		inode->i_state &= ~I_WILL_FREE;
 		inodes_stat.nr_unused--;
 		hlist_del_init(&inode->i_hash);
 	}
 	list_del_init(&inode->i_list);
 	list_del_init(&inode->i_sb_list);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state |= I_FREEING;
 	inodes_stat.nr_inodes--;
 	spin_unlock(&inode_lock);
Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c	2009-03-05 16:33:22.000000000 +1100
+++ linux-2.6/fs/fs-writeback.c	2009-03-05 17:17:59.000000000 +1100
@@ -274,6 +274,7 @@ __sync_single_inode(struct inode *inode,
 	int ret;
 
 	BUG_ON(inode->i_state & I_SYNC);
+	WARN_ON(inode->i_state & I_NEW);
 
 	/* Set I_SYNC, reset I_DIRTY */
 	dirty = inode->i_state & I_DIRTY;
@@ -298,6 +299,7 @@ __sync_single_inode(struct inode *inode,
 	}
 
 	spin_lock(&inode_lock);
+	WARN_ON(inode->i_state & I_NEW);
 	inode->i_state &= ~I_SYNC;
 	if (!(inode->i_state & I_FREEING)) {
 		if (!(inode->i_state & I_DIRTY) &&
@@ -470,6 +472,11 @@ void generic_sync_sb_inodes(struct super
 			break;
 		}
 
+		if (inode->i_state & I_NEW) {
+			requeue_io(inode);
+			continue;
+		}
+
 		if (wbc->nonblocking && bdi_write_congested(bdi)) {
 			wbc->encountered_congestion = 1;
 			if (!sb_is_blkdev_sb(sb))
@@ -531,7 +538,7 @@ void generic_sync_sb_inodes(struct super
 		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
 			struct address_space *mapping;
 
-			if (inode->i_state & (I_FREEING|I_WILL_FREE))
+			if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
 				continue;
 			mapping = inode->i_mapping;
 			if (mapping->nrpages == 0)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: new inode i_state corruption fix
  2009-03-05  6:45 [patch] fs: new inode i_state corruption fix Nick Piggin
@ 2009-03-05 10:00 ` Jan Kara
  2009-03-05 10:16   ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2009-03-05 10:00 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, Jan Kara, stable

On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> 
> There was a report of a data corruption http://lkml.org/lkml/2008/11/14/121.
> There is a script included to reproduce the problem.
> 
> During testing, I encountered a number of strange things with ext3, so I
> tried ext2 to attempt to reduce complexity of the problem. I found that
> fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
> cleared, even though instrumentation showed that unlock_new_inode had
> already been called for that inode. This points to memory scribble, or
> synchronisation problme.
> 
> i_state of I_NEW inodes is not protected by inode_lock because other
> processes are not supposed to touch them until I_LOCK (and I_NEW) is
> cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
> i_state revealed that generic_sync_sb_inodes is picking up new inodes
> from the inode lists and passing them to __writeback_single_inode without
> waiting for I_NEW. Subsequently modifying i_state causes corruption. In
> my case it would look like this:
  Good catch.

> CPU0                            CPU1
> unlock_new_inode()              __sync_single_inode()
>  reg <- inode->i_state
>  reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
>  reg -> inode->i_state          reg -> reg | I_SYNC
>                                 reg -> inode->i_state
> 
> Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.
> 
> Fix for this is rather than wait for I_NEW inodes, just skip over them:
> inodes concurrently being created are not subject to data integrity
> operations, and should not significantly contribute to dirty memory either.
> 
> After this change, I'm unable to reproduce any of the added warnings or hangs
> after ~1hour of running. Previously, the new warnings would start immediately
> and hang would happen in under 5 minutes.
  A quick grep seems to indicate that you've still missed a few cases,
haven't you? I still see the same problem in
drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
scanning, and dquot.c:add_dquot_ref() scanning.
  Otherwise the patch looks fine.

> I'm also testing on ext3 now, and so far no problems there either. I don't
> know whether this fixes the problem reported above, but it fixes a real
> problem for me.
> 
> Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
> Cc: Adrian Hunter <ext-adrian.hunter@nokia.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: stable@kernel.org
> Signed-off-by: Nick Piggin <npiggin@suse.de>

									Honza

> 
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c	2009-03-05 14:08:11.000000000 +1100
> +++ linux-2.6/fs/inode.c	2009-03-05 17:20:35.000000000 +1100
> @@ -359,6 +359,7 @@ static int invalidate_list(struct list_h
>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> +			WARN_ON(inode->i_state & I_NEW);
>  			inode->i_state |= I_FREEING;
>  			count++;
>  			continue;
> @@ -460,6 +461,7 @@ static void prune_icache(int nr_to_scan)
>  				continue;
>  		}
>  		list_move(&inode->i_list, &freeable);
> +		WARN_ON(inode->i_state & I_NEW);
>  		inode->i_state |= I_FREEING;
>  		nr_pruned++;
>  	}
> @@ -656,6 +658,7 @@ void unlock_new_inode(struct inode *inod
>  	 * just created it (so there can be no old holders
>  	 * that haven't tested I_LOCK).
>  	 */
> +	WARN_ON((inode->i_state & (I_LOCK|I_NEW)) != (I_LOCK|I_NEW));
>  	inode->i_state &= ~(I_LOCK|I_NEW);
>  	wake_up_inode(inode);
>  }
> @@ -1145,6 +1148,7 @@ void generic_delete_inode(struct inode *
>  
>  	list_del_init(&inode->i_list);
>  	list_del_init(&inode->i_sb_list);
> +	WARN_ON(inode->i_state & I_NEW);
>  	inode->i_state |= I_FREEING;
>  	inodes_stat.nr_inodes--;
>  	spin_unlock(&inode_lock);
> @@ -1186,16 +1190,19 @@ static void generic_forget_inode(struct 
>  			spin_unlock(&inode_lock);
>  			return;
>  		}
> +		WARN_ON(inode->i_state & I_NEW);
>  		inode->i_state |= I_WILL_FREE;
>  		spin_unlock(&inode_lock);
>  		write_inode_now(inode, 1);
>  		spin_lock(&inode_lock);
> +		WARN_ON(inode->i_state & I_NEW);
>  		inode->i_state &= ~I_WILL_FREE;
>  		inodes_stat.nr_unused--;
>  		hlist_del_init(&inode->i_hash);
>  	}
>  	list_del_init(&inode->i_list);
>  	list_del_init(&inode->i_sb_list);
> +	WARN_ON(inode->i_state & I_NEW);
>  	inode->i_state |= I_FREEING;
>  	inodes_stat.nr_inodes--;
>  	spin_unlock(&inode_lock);
> Index: linux-2.6/fs/fs-writeback.c
> ===================================================================
> --- linux-2.6.orig/fs/fs-writeback.c	2009-03-05 16:33:22.000000000 +1100
> +++ linux-2.6/fs/fs-writeback.c	2009-03-05 17:17:59.000000000 +1100
> @@ -274,6 +274,7 @@ __sync_single_inode(struct inode *inode,
>  	int ret;
>  
>  	BUG_ON(inode->i_state & I_SYNC);
> +	WARN_ON(inode->i_state & I_NEW);
>  
>  	/* Set I_SYNC, reset I_DIRTY */
>  	dirty = inode->i_state & I_DIRTY;
> @@ -298,6 +299,7 @@ __sync_single_inode(struct inode *inode,
>  	}
>  
>  	spin_lock(&inode_lock);
> +	WARN_ON(inode->i_state & I_NEW);
>  	inode->i_state &= ~I_SYNC;
>  	if (!(inode->i_state & I_FREEING)) {
>  		if (!(inode->i_state & I_DIRTY) &&
> @@ -470,6 +472,11 @@ void generic_sync_sb_inodes(struct super
>  			break;
>  		}
>  
> +		if (inode->i_state & I_NEW) {
> +			requeue_io(inode);
> +			continue;
> +		}
> +
>  		if (wbc->nonblocking && bdi_write_congested(bdi)) {
>  			wbc->encountered_congestion = 1;
>  			if (!sb_is_blkdev_sb(sb))
> @@ -531,7 +538,7 @@ void generic_sync_sb_inodes(struct super
>  		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
>  			struct address_space *mapping;
>  
> -			if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +			if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  				continue;
>  			mapping = inode->i_mapping;
>  			if (mapping->nrpages == 0)
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: new inode i_state corruption fix
  2009-03-05 10:00 ` Jan Kara
@ 2009-03-05 10:16   ` Nick Piggin
  2009-03-05 11:12     ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-05 10:16 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > after ~1hour of running. Previously, the new warnings would start immediately
> > and hang would happen in under 5 minutes.
>   A quick grep seems to indicate that you've still missed a few cases,
> haven't you? I still see the same problem in
> drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> scanning, and dquot.c:add_dquot_ref() scanning.
>   Otherwise the patch looks fine.

I thought they should be OK; drop_pagecache_sb doesn't play with flags,
invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
won't if writecount is not elevated...

But maybe that's  abit fragile and it would be better policy to always
skip I_NEW in these traverals?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: new inode i_state corruption fix
  2009-03-05 10:16   ` Nick Piggin
@ 2009-03-05 11:12     ` Jan Kara
  2009-03-10 13:41       ` [patch] fs: avoid I_NEW inodes Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2009-03-05 11:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > after ~1hour of running. Previously, the new warnings would start immediately
> > > and hang would happen in under 5 minutes.
> >   A quick grep seems to indicate that you've still missed a few cases,
> > haven't you? I still see the same problem in
> > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > scanning, and dquot.c:add_dquot_ref() scanning.
> >   Otherwise the patch looks fine.
> 
> I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> won't if writecount is not elevated...
  Ah, ok, you are probably right.

> But maybe that's  abit fragile and it would be better policy to always
> skip I_NEW in these traverals?
  Yes, it seems too fragile to me. I'm not saying we have to forbid
everything for I_NEW inodes but I think we should set clear simple rules
what is protected by I_NEW and then verify that all sites which can come
across such inodes obey them.

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch] fs: avoid I_NEW inodes
  2009-03-05 11:12     ` Jan Kara
@ 2009-03-10 13:41       ` Nick Piggin
  2009-03-10 16:03         ` Jan Kara
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-10 13:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Thu, Mar 05, 2009 at 12:12:26PM +0100, Jan Kara wrote:
> On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> > On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > > after ~1hour of running. Previously, the new warnings would start immediately
> > > > and hang would happen in under 5 minutes.
> > >   A quick grep seems to indicate that you've still missed a few cases,
> > > haven't you? I still see the same problem in
> > > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > > scanning, and dquot.c:add_dquot_ref() scanning.
> > >   Otherwise the patch looks fine.
> > 
> > I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> > invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> > won't if writecount is not elevated...
>   Ah, ok, you are probably right.
> 
> > But maybe that's  abit fragile and it would be better policy to always
> > skip I_NEW in these traverals?
>   Yes, it seems too fragile to me. I'm not saying we have to forbid
> everything for I_NEW inodes but I think we should set clear simple rules
> what is protected by I_NEW and then verify that all sites which can come
> across such inodes obey them.

OK, sorry for the delay, what do you think of the following patch on top
of the last?

---

To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).

Normally they will get excluded (eg. by zero refcount or writecount etc),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW. So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)

Signed-off-by: Nick Piggin <npiggin@suse.de>

---
 fs/dquot.c                  |    6 ++++--
 fs/drop_caches.c            |    2 +-
 fs/inode.c                  |    2 ++
 fs/notify/inotify/inotify.c |   16 ++++++++--------
 4 files changed, 15 insertions(+), 11 deletions(-)

Index: linux-2.6/fs/dquot.c
===================================================================
--- linux-2.6.orig/fs/dquot.c
+++ linux-2.6/fs/dquot.c
@@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+			continue;
 		if (!atomic_read(&inode->i_writecount))
 			continue;
 		if (!dqinit_needed(inode, type))
 			continue;
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
-			continue;
 
 		__iget(inode);
 		spin_unlock(&inode_lock);
@@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		if (inode->i_state & I_NEW)
+			continue;
 		if (!IS_NOQUOTA(inode))
 			remove_inode_dquot_ref(inode, type, tofree_head);
 	}
Index: linux-2.6/fs/drop_caches.c
===================================================================
--- linux-2.6.orig/fs/drop_caches.c
+++ linux-2.6/fs/drop_caches.c
@@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
 			continue;
 		if (inode->i_mapping->nrpages == 0)
 			continue;
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
 		if (tmp == head)
 			break;
 		inode = list_entry(tmp, struct inode, i_sb_list);
+		if (inode->i_state & I_NEW)
+			continue;
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_move(&inode->i_list, dispose);
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
 		struct list_head *watches;
 
 		/*
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
+		 * I_WILL_FREE which is fine because by that point the inode
+		 * cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
+			continue;
+
+		/*
 		 * If i_count is zero, the inode cannot have any watches and
 		 * doing an __iget/iput with MS_ACTIVE clear would actually
 		 * evict all inodes with zero i_count from icache which is
@@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
 		if (!atomic_read(&inode->i_count))
 			continue;
 
-		/*
-		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
-		 * I_WILL_FREE which is fine because by that point the inode
-		 * cannot have any associated watches.
-		 */
-		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
-			continue;
-
 		need_iput_tmp = need_iput;
 		need_iput = NULL;
 		/* In case inotify_remove_watch_locked() drops a reference. */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-10 13:41       ` [patch] fs: avoid I_NEW inodes Nick Piggin
@ 2009-03-10 16:03         ` Jan Kara
  2009-03-11  2:34           ` Nick Piggin
  2009-03-11  3:29           ` Nick Piggin
  0 siblings, 2 replies; 13+ messages in thread
From: Jan Kara @ 2009-03-10 16:03 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

  Hi,

On Tue 10-03-09 14:41:06, Nick Piggin wrote:
> On Thu, Mar 05, 2009 at 12:12:26PM +0100, Jan Kara wrote:
> > On Thu 05-03-09 11:16:37, Nick Piggin wrote:
> > > On Thu, Mar 05, 2009 at 11:00:01AM +0100, Jan Kara wrote:
> > > > On Thu 05-03-09 07:45:54, Nick Piggin wrote:
> > > > > after ~1hour of running. Previously, the new warnings would start immediately
> > > > > and hang would happen in under 5 minutes.
> > > >   A quick grep seems to indicate that you've still missed a few cases,
> > > > haven't you? I still see the same problem in
> > > > drop_caches.c:drop_pagecache_sb() scanning, inode.c:invalidate_inodes()
> > > > scanning, and dquot.c:add_dquot_ref() scanning.
> > > >   Otherwise the patch looks fine.
> > > 
> > > I thought they should be OK; drop_pagecache_sb doesn't play with flags,
> > > invalidate_inodes won't if refcount is elevated, and I think add_dquot_ref
> > > won't if writecount is not elevated...
> >   Ah, ok, you are probably right.
> > 
> > > But maybe that's  abit fragile and it would be better policy to always
> > > skip I_NEW in these traverals?
> >   Yes, it seems too fragile to me. I'm not saying we have to forbid
> > everything for I_NEW inodes but I think we should set clear simple rules
> > what is protected by I_NEW and then verify that all sites which can come
> > across such inodes obey them.
> 
> OK, sorry for the delay, what do you think of the following patch on top
> of the last?
  Thanks for the patch. I have a few comments. See below.

> ---
> 
> To be on the safe side, it should be less fragile to exclude I_NEW inodes
> from inode list scans by default (unless there is an important reason to
> have them).
> 
> Normally they will get excluded (eg. by zero refcount or writecount etc),
> however it is a bit fragile for list walkers to know exactly what parts of
> the inode state is set up and valid to test when in I_NEW. So along these
> lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> checks with them too -- this shouldn't be a problem should it?)
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> 
> ---
>  fs/dquot.c                  |    6 ++++--
>  fs/drop_caches.c            |    2 +-
>  fs/inode.c                  |    2 ++
>  fs/notify/inotify/inotify.c |   16 ++++++++--------
>  4 files changed, 15 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c
> +++ linux-2.6/fs/dquot.c
> @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
>  		if (!atomic_read(&inode->i_writecount))
>  			continue;
>  		if (!dqinit_needed(inode, type))
>  			continue;
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> -			continue;
>  
>  		__iget(inode);
>  		spin_unlock(&inode_lock);
> @@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & I_NEW)
> +			continue;
>  		if (!IS_NOQUOTA(inode))
>  			remove_inode_dquot_ref(inode, type, tofree_head);
>  	}
  Hmm, in this scan, we have to scan also I_NEW inodes because they can
already have quota pointers initialized and so we could leave some dangling
quota references if we skipped I_NEW inodes. Nasty. So just add a comment
here like this one here:
/*
 *  We have to scan also I_NEW inodes because they can already have quota
 *  pointer initialized. Luckily, we need to touch only quota pointers and
 *  these have separate locking (dqptr_sem).
 */

> Index: linux-2.6/fs/drop_caches.c
> ===================================================================
> --- linux-2.6.orig/fs/drop_caches.c
> +++ linux-2.6/fs/drop_caches.c
> @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  			continue;
>  		if (inode->i_mapping->nrpages == 0)
>  			continue;
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
>  		if (tmp == head)
>  			break;
>  		inode = list_entry(tmp, struct inode, i_sb_list);
> +		if (inode->i_state & I_NEW)
> +			continue;
  If somebody is setting up inodes at this point, we are in serious
trouble I think. So WARN_ON would be more appropriate I think.

>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> Index: linux-2.6/fs/notify/inotify/inotify.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify.c
> +++ linux-2.6/fs/notify/inotify/inotify.c
> @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
>  		struct list_head *watches;
>  
>  		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> +		 * I_WILL_FREE which is fine because by that point the inode
> +		 * cannot have any associated watches.
> +		 */
  Update the comment?

> +		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
> +
> +		/*
>  		 * If i_count is zero, the inode cannot have any watches and
>  		 * doing an __iget/iput with MS_ACTIVE clear would actually
>  		 * evict all inodes with zero i_count from icache which is
> @@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
>  		if (!atomic_read(&inode->i_count))
>  			continue;
>  
> -		/*
> -		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> -		 * I_WILL_FREE which is fine because by that point the inode
> -		 * cannot have any associated watches.
> -		 */
> -		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> -			continue;
> -
>  		need_iput_tmp = need_iput;
>  		need_iput = NULL;
>  		/* In case inotify_remove_watch_locked() drops a reference. */

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-10 16:03         ` Jan Kara
@ 2009-03-11  2:34           ` Nick Piggin
  2009-03-11 12:22             ` Jan Kara
  2009-03-11  3:29           ` Nick Piggin
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-11  2:34 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Tue, Mar 10, 2009 at 05:03:21PM +0100, Jan Kara wrote:
> > OK, sorry for the delay, what do you think of the following patch on top
> > of the last?
>   Thanks for the patch. I have a few comments. See below.
> 
> > ---
> > 
> > To be on the safe side, it should be less fragile to exclude I_NEW inodes
> > from inode list scans by default (unless there is an important reason to
> > have them).
> > 
> > Normally they will get excluded (eg. by zero refcount or writecount etc),
> > however it is a bit fragile for list walkers to know exactly what parts of
> > the inode state is set up and valid to test when in I_NEW. So along these
> > lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> > checks with them too -- this shouldn't be a problem should it?)
> > 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > 
> > ---
> >  fs/dquot.c                  |    6 ++++--
> >  fs/drop_caches.c            |    2 +-
> >  fs/inode.c                  |    2 ++
> >  fs/notify/inotify/inotify.c |   16 ++++++++--------
> >  4 files changed, 15 insertions(+), 11 deletions(-)
> > 
> > Index: linux-2.6/fs/dquot.c
> > ===================================================================
> > --- linux-2.6.orig/fs/dquot.c
> > +++ linux-2.6/fs/dquot.c
> > @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> > +			continue;
> >  		if (!atomic_read(&inode->i_writecount))
> >  			continue;
> >  		if (!dqinit_needed(inode, type))
> >  			continue;
> > -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > -			continue;
> >  
> >  		__iget(inode);
> >  		spin_unlock(&inode_lock);
> > @@ -870,6 +870,8 @@ static void remove_dquot_ref(struct supe
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > +		if (inode->i_state & I_NEW)
> > +			continue;
> >  		if (!IS_NOQUOTA(inode))
> >  			remove_inode_dquot_ref(inode, type, tofree_head);
> >  	}
>   Hmm, in this scan, we have to scan also I_NEW inodes because they can
> already have quota pointers initialized and so we could leave some dangling
> quota references if we skipped I_NEW inodes. Nasty. So just add a comment
> here like this one here:
> /*
>  *  We have to scan also I_NEW inodes because they can already have quota
>  *  pointer initialized. Luckily, we need to touch only quota pointers and
>  *  these have separate locking (dqptr_sem).
>  */

OK, thanks. This is what I was unsure of.

 
> > Index: linux-2.6/fs/drop_caches.c
> > ===================================================================
> > --- linux-2.6.orig/fs/drop_caches.c
> > +++ linux-2.6/fs/drop_caches.c
> > @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
> >  
> >  	spin_lock(&inode_lock);
> >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> >  			continue;
> >  		if (inode->i_mapping->nrpages == 0)
> >  			continue;
> > Index: linux-2.6/fs/inode.c
> > ===================================================================
> > --- linux-2.6.orig/fs/inode.c
> > +++ linux-2.6/fs/inode.c
> > @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
> >  		if (tmp == head)
> >  			break;
> >  		inode = list_entry(tmp, struct inode, i_sb_list);
> > +		if (inode->i_state & I_NEW)
> > +			continue;
>   If somebody is setting up inodes at this point, we are in serious
> trouble I think. So WARN_ON would be more appropriate I think.

Really? Hmm, this is also called via flush_disk which seems like it
can operate under a mounted filesystem?

 
> >  		invalidate_inode_buffers(inode);
> >  		if (!atomic_read(&inode->i_count)) {
> >  			list_move(&inode->i_list, dispose);
> > Index: linux-2.6/fs/notify/inotify/inotify.c
> > ===================================================================
> > --- linux-2.6.orig/fs/notify/inotify/inotify.c
> > +++ linux-2.6/fs/notify/inotify/inotify.c
> > @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
> >  		struct list_head *watches;
> >  
> >  		/*
> > +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> > +		 * I_WILL_FREE which is fine because by that point the inode
> > +		 * cannot have any associated watches.
> > +		 */
>   Update the comment?

Will do.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-10 16:03         ` Jan Kara
  2009-03-11  2:34           ` Nick Piggin
@ 2009-03-11  3:29           ` Nick Piggin
  2009-03-11 12:24             ` Jan Kara
  1 sibling, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-11  3:29 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

How about this?

--
To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).

Normally they will get excluded (eg. by zero refcount or writecount etc),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW. So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)

Signed-off-by: Nick Piggin <npiggin@suse.de>

---
 fs/dquot.c                  |   10 ++++++++--
 fs/drop_caches.c            |    2 +-
 fs/inode.c                  |    2 ++
 fs/notify/inotify/inotify.c |   16 ++++++++--------
 4 files changed, 19 insertions(+), 11 deletions(-)

Index: linux-2.6/fs/dquot.c
===================================================================
--- linux-2.6.orig/fs/dquot.c
+++ linux-2.6/fs/dquot.c
@@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
+			continue;
 		if (!atomic_read(&inode->i_writecount))
 			continue;
 		if (!dqinit_needed(inode, type))
 			continue;
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
-			continue;
 
 		__iget(inode);
 		spin_unlock(&inode_lock);
@@ -870,6 +870,12 @@ static void remove_dquot_ref(struct supe
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+		/*
+		 *  We have to scan also I_NEW inodes because they can already
+		 *  have quota pointer initialized. Luckily, we need to touch
+		 *  only quota pointers and these have separate locking
+		 *  (dqptr_sem).
+		 */
 		if (!IS_NOQUOTA(inode))
 			remove_inode_dquot_ref(inode, type, tofree_head);
 	}
Index: linux-2.6/fs/drop_caches.c
===================================================================
--- linux-2.6.orig/fs/drop_caches.c
+++ linux-2.6/fs/drop_caches.c
@@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
 
 	spin_lock(&inode_lock);
 	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
-		if (inode->i_state & (I_FREEING|I_WILL_FREE))
+		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
 			continue;
 		if (inode->i_mapping->nrpages == 0)
 			continue;
Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
 		if (tmp == head)
 			break;
 		inode = list_entry(tmp, struct inode, i_sb_list);
+		if (inode->i_state & I_NEW)
+			continue;
 		invalidate_inode_buffers(inode);
 		if (!atomic_read(&inode->i_count)) {
 			list_move(&inode->i_list, dispose);
Index: linux-2.6/fs/notify/inotify/inotify.c
===================================================================
--- linux-2.6.orig/fs/notify/inotify/inotify.c
+++ linux-2.6/fs/notify/inotify/inotify.c
@@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
 		struct list_head *watches;
 
 		/*
+		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
+		 * I_WILL_FREE, or I_NEW which is fine because by that point
+		 * the inode cannot have any associated watches.
+		 */
+		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
+			continue;
+
+		/*
 		 * If i_count is zero, the inode cannot have any watches and
 		 * doing an __iget/iput with MS_ACTIVE clear would actually
 		 * evict all inodes with zero i_count from icache which is
@@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
 		if (!atomic_read(&inode->i_count))
 			continue;
 
-		/*
-		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
-		 * I_WILL_FREE which is fine because by that point the inode
-		 * cannot have any associated watches.
-		 */
-		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
-			continue;
-
 		need_iput_tmp = need_iput;
 		need_iput = NULL;
 		/* In case inotify_remove_watch_locked() drops a reference. */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-11  2:34           ` Nick Piggin
@ 2009-03-11 12:22             ` Jan Kara
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Kara @ 2009-03-11 12:22 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Wed 11-03-09 03:34:30, Nick Piggin wrote:
> On Tue, Mar 10, 2009 at 05:03:21PM +0100, Jan Kara wrote:
> > > Index: linux-2.6/fs/drop_caches.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/drop_caches.c
> > > +++ linux-2.6/fs/drop_caches.c
> > > @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
> > >  
> > >  	spin_lock(&inode_lock);
> > >  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > > -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > > +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> > >  			continue;
> > >  		if (inode->i_mapping->nrpages == 0)
> > >  			continue;
> > > Index: linux-2.6/fs/inode.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/inode.c
> > > +++ linux-2.6/fs/inode.c
> > > @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
> > >  		if (tmp == head)
> > >  			break;
> > >  		inode = list_entry(tmp, struct inode, i_sb_list);
> > > +		if (inode->i_state & I_NEW)
> > > +			continue;
> >   If somebody is setting up inodes at this point, we are in serious
> > trouble I think. So WARN_ON would be more appropriate I think.
> 
> Really? Hmm, this is also called via flush_disk which seems like it
> can operate under a mounted filesystem?
  Ah, I was not following calls far enough. You're right.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-11  3:29           ` Nick Piggin
@ 2009-03-11 12:24             ` Jan Kara
  2009-03-11 12:57               ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Kara @ 2009-03-11 12:24 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Wed 11-03-09 04:29:18, Nick Piggin wrote:
> How about this?
  Looks fine to me.

> --
> To be on the safe side, it should be less fragile to exclude I_NEW inodes
> from inode list scans by default (unless there is an important reason to
> have them).
> 
> Normally they will get excluded (eg. by zero refcount or writecount etc),
> however it is a bit fragile for list walkers to know exactly what parts of
> the inode state is set up and valid to test when in I_NEW. So along these
> lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> checks with them too -- this shouldn't be a problem should it?)
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
  Acked-by: Jan Kara <jack@suse.cz>

									Honza

> ---
>  fs/dquot.c                  |   10 ++++++++--
>  fs/drop_caches.c            |    2 +-
>  fs/inode.c                  |    2 ++
>  fs/notify/inotify/inotify.c |   16 ++++++++--------
>  4 files changed, 19 insertions(+), 11 deletions(-)
> 
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c
> +++ linux-2.6/fs/dquot.c
> @@ -789,12 +789,12 @@ static void add_dquot_ref(struct super_b
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
>  		if (!atomic_read(&inode->i_writecount))
>  			continue;
>  		if (!dqinit_needed(inode, type))
>  			continue;
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> -			continue;
>  
>  		__iget(inode);
>  		spin_unlock(&inode_lock);
> @@ -870,6 +870,12 @@ static void remove_dquot_ref(struct supe
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> +		/*
> +		 *  We have to scan also I_NEW inodes because they can already
> +		 *  have quota pointer initialized. Luckily, we need to touch
> +		 *  only quota pointers and these have separate locking
> +		 *  (dqptr_sem).
> +		 */
>  		if (!IS_NOQUOTA(inode))
>  			remove_inode_dquot_ref(inode, type, tofree_head);
>  	}
> Index: linux-2.6/fs/drop_caches.c
> ===================================================================
> --- linux-2.6.orig/fs/drop_caches.c
> +++ linux-2.6/fs/drop_caches.c
> @@ -18,7 +18,7 @@ static void drop_pagecache_sb(struct sup
>  
>  	spin_lock(&inode_lock);
>  	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> -		if (inode->i_state & (I_FREEING|I_WILL_FREE))
> +		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW))
>  			continue;
>  		if (inode->i_mapping->nrpages == 0)
>  			continue;
> Index: linux-2.6/fs/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/inode.c
> +++ linux-2.6/fs/inode.c
> @@ -356,6 +356,8 @@ static int invalidate_list(struct list_h
>  		if (tmp == head)
>  			break;
>  		inode = list_entry(tmp, struct inode, i_sb_list);
> +		if (inode->i_state & I_NEW)
> +			continue;
>  		invalidate_inode_buffers(inode);
>  		if (!atomic_read(&inode->i_count)) {
>  			list_move(&inode->i_list, dispose);
> Index: linux-2.6/fs/notify/inotify/inotify.c
> ===================================================================
> --- linux-2.6.orig/fs/notify/inotify/inotify.c
> +++ linux-2.6/fs/notify/inotify/inotify.c
> @@ -380,6 +380,14 @@ void inotify_unmount_inodes(struct list_
>  		struct list_head *watches;
>  
>  		/*
> +		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING,
> +		 * I_WILL_FREE, or I_NEW which is fine because by that point
> +		 * the inode cannot have any associated watches.
> +		 */
> +		if (inode->i_state & (I_CLEAR|I_FREEING|I_WILL_FREE|I_NEW))
> +			continue;
> +
> +		/*
>  		 * If i_count is zero, the inode cannot have any watches and
>  		 * doing an __iget/iput with MS_ACTIVE clear would actually
>  		 * evict all inodes with zero i_count from icache which is
> @@ -388,14 +396,6 @@ void inotify_unmount_inodes(struct list_
>  		if (!atomic_read(&inode->i_count))
>  			continue;
>  
> -		/*
> -		 * We cannot __iget() an inode in state I_CLEAR, I_FREEING, or
> -		 * I_WILL_FREE which is fine because by that point the inode
> -		 * cannot have any associated watches.
> -		 */
> -		if (inode->i_state & (I_CLEAR | I_FREEING | I_WILL_FREE))
> -			continue;
> -
>  		need_iput_tmp = need_iput;
>  		need_iput = NULL;
>  		/* In case inotify_remove_watch_locked() drops a reference. */
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-11 12:24             ` Jan Kara
@ 2009-03-11 12:57               ` Nick Piggin
  2009-03-11 20:19                 ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2009-03-11 12:57 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Linux Kernel Mailing List, Andrew Morton,
	Jorge Boncompte [DTI2],
	Adrian Hunter, stable

On Wed, Mar 11, 2009 at 01:24:20PM +0100, Jan Kara wrote:
> On Wed 11-03-09 04:29:18, Nick Piggin wrote:
> > How about this?
>   Looks fine to me.

Thanks for the good review. Andrew, do you think you can apply this
on top of the previous patch? I'm undecided as to whether they should
go together or not. Probably the first one is a minimal fix that
doesn't alter behaviour as much, but things seem more robust after this
2nd patch. I think both would probably be suitable for 2.6.29, being a
nasty bug, but it isn't a recent regression AFAIKS.

> 
> > --
> > To be on the safe side, it should be less fragile to exclude I_NEW inodes
> > from inode list scans by default (unless there is an important reason to
> > have them).
> > 
> > Normally they will get excluded (eg. by zero refcount or writecount etc),
> > however it is a bit fragile for list walkers to know exactly what parts of
> > the inode state is set up and valid to test when in I_NEW. So along these
> > lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
> > checks with them too -- this shouldn't be a problem should it?)
> > 
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
>   Acked-by: Jan Kara <jack@suse.cz>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-11 12:57               ` Nick Piggin
@ 2009-03-11 20:19                 ` Andrew Morton
  2009-03-12  3:09                   ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2009-03-11 20:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: jack, linux-fsdevel, linux-kernel, jorge, ext-adrian.hunter, stable

On Wed, 11 Mar 2009 13:57:48 +0100
Nick Piggin <npiggin@suse.de> wrote:

> On Wed, Mar 11, 2009 at 01:24:20PM +0100, Jan Kara wrote:
> > On Wed 11-03-09 04:29:18, Nick Piggin wrote:
> > > How about this?
> >   Looks fine to me.
> 
> Thanks for the good review. Andrew, do you think you can apply this
> on top of the previous patch? I'm undecided as to whether they should
> go together or not. Probably the first one is a minimal fix that
> doesn't alter behaviour as much, but things seem more robust after this
> 2nd patch. I think both would probably be suitable for 2.6.29, being a
> nasty bug, but it isn't a recent regression AFAIKS.
> 

How's about we do fs-new-inode-i_state-corruption-fix.patch in 2.6.29
and fs-avoid-i_new-inodes.patch in 2.6.30?  We could backport
fs-avoid-i_new-inodes.patch into 2.6.29.x if needed.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch] fs: avoid I_NEW inodes
  2009-03-11 20:19                 ` Andrew Morton
@ 2009-03-12  3:09                   ` Nick Piggin
  0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2009-03-12  3:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: jack, linux-fsdevel, linux-kernel, jorge, ext-adrian.hunter, stable

On Wed, Mar 11, 2009 at 01:19:15PM -0700, Andrew Morton wrote:
> On Wed, 11 Mar 2009 13:57:48 +0100
> Nick Piggin <npiggin@suse.de> wrote:
> 
> > On Wed, Mar 11, 2009 at 01:24:20PM +0100, Jan Kara wrote:
> > > On Wed 11-03-09 04:29:18, Nick Piggin wrote:
> > > > How about this?
> > >   Looks fine to me.
> > 
> > Thanks for the good review. Andrew, do you think you can apply this
> > on top of the previous patch? I'm undecided as to whether they should
> > go together or not. Probably the first one is a minimal fix that
> > doesn't alter behaviour as much, but things seem more robust after this
> > 2nd patch. I think both would probably be suitable for 2.6.29, being a
> > nasty bug, but it isn't a recent regression AFAIKS.
> > 
> 
> How's about we do fs-new-inode-i_state-corruption-fix.patch in 2.6.29
> and fs-avoid-i_new-inodes.patch in 2.6.30?  We could backport
> fs-avoid-i_new-inodes.patch into 2.6.29.x if needed.

Yes that's probably best.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-03-12  3:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-05  6:45 [patch] fs: new inode i_state corruption fix Nick Piggin
2009-03-05 10:00 ` Jan Kara
2009-03-05 10:16   ` Nick Piggin
2009-03-05 11:12     ` Jan Kara
2009-03-10 13:41       ` [patch] fs: avoid I_NEW inodes Nick Piggin
2009-03-10 16:03         ` Jan Kara
2009-03-11  2:34           ` Nick Piggin
2009-03-11 12:22             ` Jan Kara
2009-03-11  3:29           ` Nick Piggin
2009-03-11 12:24             ` Jan Kara
2009-03-11 12:57               ` Nick Piggin
2009-03-11 20:19                 ` Andrew Morton
2009-03-12  3:09                   ` Nick Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.