linux-kernel.vger.kernel.org archive mirror
* The bug of iput() removal from flusher thread?
@ 2012-11-17  8:42 OGAWA Hirofumi
  2012-11-19  8:56 ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2012-11-17  8:42 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, linux-fsdevel, linux-kernel

Hi,

Since commit 169ebd90131b2ffca74bb2dbe7eeacd39fb83714, writeback no
longer does __iget()/iput().

This means nobody moves the inode to the LRU list, i.e.:

	new_inode()
	dirty_inode()
	iput_final()
		/* inode is kept, but not added to the LRU */
	flush inodes
		/* inode is now clean, but still not on the LRU */

I noticed this in my own FS, but I think the same bug affects all FSes
in Linus's tree since this commit.

Am I missing something?
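
For illustration, the sequence above can be modelled in a few lines of
userspace C. This is only a sketch, not kernel code: the toy_* names, the
struct and the single flag are made up for the example. It assumes, as
described above, that iput_final() skips the LRU for dirty inodes and that
writeback no longer takes a reference of its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for inode state; hypothetical, not the kernel's I_DIRTY. */
enum { TOY_I_DIRTY = 1 << 0 };

struct toy_inode {
	unsigned state;	/* TOY_I_DIRTY or 0 */
	int count;	/* reference count */
	bool on_lru;	/* can the shrinker find it? */
};

/* Last reference dropped: only clean inodes are put on the LRU. */
static void toy_iput_final(struct toy_inode *inode)
{
	assert(inode->count == 0);
	if (!(inode->state & TOY_I_DIRTY))
		inode->on_lru = true;
}

/* Writeback without an iget/iput pair: cleans the inode, never touches the LRU. */
static void toy_writeback(struct toy_inode *inode)
{
	inode->state &= ~TOY_I_DIRTY;
}

/* Replays the sequence from the message; returns whether the inode is on the LRU. */
static bool toy_dirty_then_flush(void)
{
	struct toy_inode inode = { .state = 0, .count = 1, .on_lru = false };

	inode.state |= TOY_I_DIRTY;	/* dirty_inode() */
	inode.count--;			/* iput() drops the last reference */
	toy_iput_final(&inode);		/* dirty, so not added to the LRU */
	toy_writeback(&inode);		/* flusher cleans it */
	return inode.on_lru;
}
```

In this model toy_dirty_then_flush() ends with a clean, unreferenced inode
that is not on the LRU, which is the state described above.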

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: The bug of iput() removal from flusher thread?
  2012-11-17  8:42 The bug of iput() removal from flusher thread? OGAWA Hirofumi
@ 2012-11-19  8:56 ` OGAWA Hirofumi
  2012-11-19 14:51   ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2012-11-19  8:56 UTC (permalink / raw)
  To: Al Viro; +Cc: Jan Kara, linux-fsdevel, linux-kernel

OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:

> Hi,
>
> In 169ebd90131b2ffca74bb2dbe7eeacd39fb83714 commit, writeback doesn't
> __iget()/iput() anymore.
>
> This means nobody moves the inode to lru list. I.e.
>
> 	new_inode()
> 	dirty_inode()
> 	iput_final()
> 		/* keep inode without adding lru */
> 	flush indoes
>         /* clean inode is not on lru */
>
> I noticed this situation in my FS though, I think the same bug is on all
> FSes of linus tree too, after this commit.
>
> Am I missing the something?

This seems to be reproducible with the following script:

#!/bin/sh

for i in $(seq -w 1000); do
	for j in $(seq -w 1000); do
		for k in $(seq -w 1000); do
			mkdir -p $i/$j
			echo $i/$j/$k > $i/$j/$k
			echo 2 > /proc/sys/vm/drop_caches
		done
	done
done

Some inodes are never reclaimed, and ls -l lets those inodes be freed
(stat(2) does iget/iput).
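
As a rough sketch of the stat(2) side of this, a tree walk that stats every
entry could look like the following. It uses POSIX nftw(); the names
stat_tree(), visit() and the counter are invented for the example, and
whether such a walk is enough to get the stuck inodes back on the LRU is an
assumption based on the ls -l observation above, not something verified
here.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 500	/* for nftw() */
#endif
#include <assert.h>
#include <ftw.h>
#include <stddef.h>
#include <sys/stat.h>

/* Number of entries seen by the current walk (nftw() has no user pointer). */
static long visited;

/* Called by nftw() once per entry, after it has stat()ed that entry. */
static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftwbuf)
{
	(void)path;
	(void)sb;
	(void)type;
	(void)ftwbuf;
	visited++;
	return 0;	/* keep walking */
}

/*
 * Walk everything under root without following symlinks.
 * Returns the number of entries visited, or -1 if the walk failed.
 */
static long stat_tree(const char *root)
{
	assert(root != NULL);
	visited = 0;
	if (nftw(root, visit, 16, FTW_PHYS) != 0)
		return -1;
	return visited;
}
```

Calling stat_tree(".") from the top of the test directory would touch every
file the script created.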
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: The bug of iput() removal from flusher thread?
  2012-11-19  8:56 ` OGAWA Hirofumi
@ 2012-11-19 14:51   ` Jan Kara
  2012-11-19 19:41     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2012-11-19 14:51 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Al Viro, Jan Kara, linux-fsdevel, linux-kernel

On Mon 19-11-12 17:56:22, OGAWA Hirofumi wrote:
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
> 
> > Hi,
> >
> > In 169ebd90131b2ffca74bb2dbe7eeacd39fb83714 commit, writeback doesn't
> > __iget()/iput() anymore.
> >
> > This means nobody moves the inode to lru list. I.e.
> >
> > 	new_inode()
> > 	dirty_inode()
> > 	iput_final()
> > 		/* keep inode without adding lru */
> > 	flush indoes
> >         /* clean inode is not on lru */
> >
> > I noticed this situation in my FS though, I think the same bug is on all
> > FSes of linus tree too, after this commit.
> >
> > Am I missing the something?
> 
> This seems to be reproducible by the following,
> 
> #!/bin/sh
> 
> for i in $(seq -w 1000); do
> 	for j in $(seq -w 1000); do
>         	for k in $(seq -w 1000); do
>                 	mkdir -p $i/$j
>                         echo $i/$j/$k > $i/$j/$k
>                         echo 2 > /proc/sys/vm/drop_caches
>                 done
>         done
> done
> 
> Some inodes never be reclaimed, and ls -l frees those inodes (stat(2)
> does iget/iput).
  Looking into the code, I agree that we won't put an inode into the LRU
while it is dirty or under writeback, and that it won't happen after
writeback is done either. That's certainly a bug. But I have a hard time
reproducing your results because on my kernels even the dcache doesn't get
shrunk, so the inodes are pinned in memory by it. Not sure what's going on
yet, but I'll investigate. Thanks for the report!

								Honza
  
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: The bug of iput() removal from flusher thread?
  2012-11-19 14:51   ` Jan Kara
@ 2012-11-19 19:41     ` Jan Kara
  2012-11-19 20:51       ` OGAWA Hirofumi
  2012-11-20 22:37       ` Dave Chinner
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kara @ 2012-11-19 19:41 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Al Viro, Jan Kara, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1783 bytes --]

On Mon 19-11-12 15:51:40, Jan Kara wrote:
> On Mon 19-11-12 17:56:22, OGAWA Hirofumi wrote:
> > OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> writes:
> > 
> > > Hi,
> > >
> > > In 169ebd90131b2ffca74bb2dbe7eeacd39fb83714 commit, writeback doesn't
> > > __iget()/iput() anymore.
> > >
> > > This means nobody moves the inode to lru list. I.e.
> > >
> > > 	new_inode()
> > > 	dirty_inode()
> > > 	iput_final()
> > > 		/* keep inode without adding lru */
> > > 	flush indoes
> > >         /* clean inode is not on lru */
> > >
> > > I noticed this situation in my FS though, I think the same bug is on all
> > > FSes of linus tree too, after this commit.
> > >
> > > Am I missing the something?
> > 
> > This seems to be reproducible by the following,
> > 
> > #!/bin/sh
> > 
> > for i in $(seq -w 1000); do
> > 	for j in $(seq -w 1000); do
> >         	for k in $(seq -w 1000); do
> >                 	mkdir -p $i/$j
> >                         echo $i/$j/$k > $i/$j/$k
> >                         echo 2 > /proc/sys/vm/drop_caches
> >                 done
> >         done
> > done
> > 
> > Some inodes never be reclaimed, and ls -l frees those inodes (stat(2)
> > does iget/iput).
>   So looking into the code I agree we won't put inode into the LRU when it
> is dirty or under writeback and after writeback is done it won't happen
> either. That's certainly a bug. But I have hard time reproducing your
> results because on my kernels even dcache doesn't get shrunk thus inodes
> are pinned in memory by it. Not sure what's going on yet but I'll
> investigate. Thanks for report!
  OK, that was just the reclaim batching code standing in my way. After
figuring that out I could reproduce the issue and test my fix. It is
attached.
								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-writeback-Put-unused-inodes-to-LRU-after-writeback-c.patch --]
[-- Type: text/x-patch, Size: 2702 bytes --]

From 4fdc5d9a66dfe0286ef4f4a7f53fd3b15086470f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 19 Nov 2012 20:01:16 +0100
Subject: [PATCH] writeback: Put unused inodes to LRU after writeback completion

Commit 169ebd90 removed the iget-iput pair from inode writeback. As a side
effect, inodes that are dirty during the iput_final() call will never be added
to the inode LRU (iput_final() doesn't add dirty inodes to the LRU, and later,
when the inode is cleaned, there's no one to add the inode there). Thus inodes
are effectively unreclaimable until someone looks them up again.

The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned. But still
the bug can have nasty consequences, leading up to OOM conditions under
certain circumstances. The following can easily reproduce the problem:

for (( i = 0; i < 1000; i++ )); do
  mkdir $i
  for (( j = 0; j < 1000; j++ )); do
    touch $i/$j
    echo 2 > /proc/sys/vm/drop_caches
  done
done

then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

We fix the issue by inserting unused clean inodes into the LRU after writeback
finishes in inode_sync_complete().

CC: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    3 +++
 fs/inode.c        |    2 +-
 fs/internal.h     |    1 +
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 51ea267..ed7613b 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -227,6 +227,9 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
 
 static void inode_sync_complete(struct inode *inode)
 {
+	/* If inode is clean and unused, put it into LRU now.  */
+	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
+		inode_lru_list_add(inode);
 	inode->i_state &= ~I_SYNC;
 	/* Waiters must see I_SYNC cleared before being woken up */
 	smp_mb();
diff --git a/fs/inode.c b/fs/inode.c
index b03c719..275e447 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -397,7 +397,7 @@ void ihold(struct inode *inode)
 }
 EXPORT_SYMBOL(ihold);
 
-static void inode_lru_list_add(struct inode *inode)
+void inode_lru_list_add(struct inode *inode)
 {
 	spin_lock(&inode->i_sb->s_inode_lru_lock);
 	if (list_empty(&inode->i_lru)) {
diff --git a/fs/internal.h b/fs/internal.h
index 916b7cb..3ecf43d 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -110,6 +110,7 @@ extern int open_check_o_direct(struct file *f);
  * inode.c
  */
 extern spinlock_t inode_sb_list_lock;
+extern void inode_lru_list_add(struct inode *inode);
 
 /*
  * fs-writeback.c
-- 
1.7.1



* Re: The bug of iput() removal from flusher thread?
  2012-11-19 19:41     ` Jan Kara
@ 2012-11-19 20:51       ` OGAWA Hirofumi
  2012-11-19 21:24         ` Jan Kara
  2012-11-20 22:37       ` Dave Chinner
  1 sibling, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2012-11-19 20:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, linux-kernel

Jan Kara <jack@suse.cz> writes:

>>From 4fdc5d9a66dfe0286ef4f4a7f53fd3b15086470f Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Mon, 19 Nov 2012 20:01:16 +0100
> Subject: [PATCH] writeback: Put unused inodes to LRU after writeback completion
>
> Commit 169ebd90 removed iget-iput pair from inode writeback. As a side effect,
> inodes that are dirty during iput_final() call won't be ever added to inode LRU
> (iput_final() doesn't add dirty inodes to LRU and later when the inode is
> cleaned there's noone to add the inode there). Thus inodes are effectively
> unreclaimable until someone looks them up again.
>
> Practical effect of this bug is limited by the fact that inodes are
> pinned by a dentry for long enough that the inode gets cleaned. But still
> the bug can have nasty consequences leading up to OOM conditions under
> certain circumstances. Following can easily reproduce the problem:
>
> for (( i = 0; i < 1000; i++ )); do
>   mkdir $i
>   for (( j = 0; j < 1000; j++ )); do
>     touch $i/$j
>     echo 2 > /proc/sys/vm/drop_caches
>   done
> done
>
> then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
>
> We fix the issue by inserting unused clean inodes into the LRU after writeback
> finishes in inode_sync_complete().
>
> CC: Al Viro <viro@zeniv.linux.org.uk>
> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> Signed-off-by: Jan Kara <jack@suse.cz>

Need to Cc to stable@

> ---
>  fs/fs-writeback.c |    3 +++
>  fs/inode.c        |    2 +-
>  fs/internal.h     |    1 +
>  3 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 51ea267..ed7613b 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -227,6 +227,9 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
>  
>  static void inode_sync_complete(struct inode *inode)
>  {
> +	/* If inode is clean an unused, put it into LRU now.  */
> +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> +		inode_lru_list_add(inode);

IMHO, open coding this would be a bad idea. Another issue is
I_REFERENCED: do we really want to drop I_REFERENCED?

Thanks.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: The bug of iput() removal from flusher thread?
  2012-11-19 20:51       ` OGAWA Hirofumi
@ 2012-11-19 21:24         ` Jan Kara
  2012-11-19 21:53           ` OGAWA Hirofumi
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2012-11-19 21:24 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Al Viro, linux-fsdevel, linux-kernel

On Tue 20-11-12 05:51:15, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >>From 4fdc5d9a66dfe0286ef4f4a7f53fd3b15086470f Mon Sep 17 00:00:00 2001
> > From: Jan Kara <jack@suse.cz>
> > Date: Mon, 19 Nov 2012 20:01:16 +0100
> > Subject: [PATCH] writeback: Put unused inodes to LRU after writeback completion
> >
> > Commit 169ebd90 removed iget-iput pair from inode writeback. As a side effect,
> > inodes that are dirty during iput_final() call won't be ever added to inode LRU
> > (iput_final() doesn't add dirty inodes to LRU and later when the inode is
> > cleaned there's noone to add the inode there). Thus inodes are effectively
> > unreclaimable until someone looks them up again.
> >
> > Practical effect of this bug is limited by the fact that inodes are
> > pinned by a dentry for long enough that the inode gets cleaned. But still
> > the bug can have nasty consequences leading up to OOM conditions under
> > certain circumstances. Following can easily reproduce the problem:
> >
> > for (( i = 0; i < 1000; i++ )); do
> >   mkdir $i
> >   for (( j = 0; j < 1000; j++ )); do
> >     touch $i/$j
> >     echo 2 > /proc/sys/vm/drop_caches
> >   done
> > done
> >
> > then one needs to run 'sync; ls -lR' to make inodes reclaimable again.
> >
> > We fix the issue by inserting unused clean inodes into the LRU after writeback
> > finishes in inode_sync_complete().
> >
> > CC: Al Viro <viro@zeniv.linux.org.uk>
> > Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Need to Cc to stable@
  Yup, I'll add that. Thanks for the reminder.

> > ---
> >  fs/fs-writeback.c |    3 +++
> >  fs/inode.c        |    2 +-
> >  fs/internal.h     |    1 +
> >  3 files changed, 5 insertions(+), 1 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 51ea267..ed7613b 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -227,6 +227,9 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
> >  
> >  static void inode_sync_complete(struct inode *inode)
> >  {
> > +	/* If inode is clean an unused, put it into LRU now.  */
> > +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> > +		inode_lru_list_add(inode);
> 
> IMHO, open coding this would be bad idea.
  Do you mean creating a separate function for the above two lines?

> And another one is I_REFERENCED. We really want to remove I_REFERENCED?
  We don't want I_REFERENCED set - no one used the inode. But looking into
the code with fresh eyes, the fix isn't as simple as I thought. First I
need to check MS_ACTIVE and second I need to check I_FREEING... So the
condition will be complex enough to warrant a separate function.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: The bug of iput() removal from flusher thread?
  2012-11-19 21:24         ` Jan Kara
@ 2012-11-19 21:53           ` OGAWA Hirofumi
  2012-11-21  1:11             ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: OGAWA Hirofumi @ 2012-11-19 21:53 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, linux-fsdevel, linux-kernel

Jan Kara <jack@suse.cz> writes:

>> >  static void inode_sync_complete(struct inode *inode)
>> >  {
>> > +	/* If inode is clean an unused, put it into LRU now.  */
>> > +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
>> > +		inode_lru_list_add(inode);
>> 
>> IMHO, open coding this would be bad idea.
>   Do you mean creating a separate function for the above two lines?

Yes. The intent is to consolidate the "when to add the inode to the LRU"
logic with the one in iput_final().

>> And another one is I_REFERENCED. We really want to remove I_REFERENCED?
>   We don't want I_REFERENCED set - noone used the inode. But looking into
> the code with fresh eyes, the fix isn't as simple as I thought. First I
> need to check MS_ACTIVE and second I need to check I_FREEING... So the
> condition will be complex enough to warrant a separate function.

I can't see the issue (sync_filesystem() will wait for I_DIRTY inodes before
MS_ACTIVE is cleared, and I_DIRTY prevents I_FREEING), though it may be
possible.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>


* Re: The bug of iput() removal from flusher thread?
  2012-11-19 19:41     ` Jan Kara
  2012-11-19 20:51       ` OGAWA Hirofumi
@ 2012-11-20 22:37       ` Dave Chinner
  2012-11-21  1:30         ` Jan Kara
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2012-11-20 22:37 UTC (permalink / raw)
  To: Jan Kara; +Cc: OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Mon, Nov 19, 2012 at 08:41:02PM +0100, Jan Kara wrote:
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 51ea267..ed7613b 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -227,6 +227,9 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
>  
>  static void inode_sync_complete(struct inode *inode)
>  {
> +	/* If inode is clean an unused, put it into LRU now.  */
> +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> +		inode_lru_list_add(inode);

Unsafe. Has to avoid I_FREEING|I_WILL_FREE (see warnings at start of
writeback_single_inode()).

Also, if you do this after clearing the I_SYNC flag, you can use
exactly the same logic as iput_final(). A helper function could be
used for both cases at that point, and....

> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -397,7 +397,7 @@ void ihold(struct inode *inode)
>  }
>  EXPORT_SYMBOL(ihold);
>  
> -static void inode_lru_list_add(struct inode *inode)
> +void inode_lru_list_add(struct inode *inode)

.... the inode lru list function can stay static.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: The bug of iput() removal from flusher thread?
  2012-11-19 21:53           ` OGAWA Hirofumi
@ 2012-11-21  1:11             ` Jan Kara
  2012-11-21  1:48               ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2012-11-21  1:11 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Al Viro, linux-fsdevel, linux-kernel

On Tue 20-11-12 06:53:12, OGAWA Hirofumi wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> >> >  static void inode_sync_complete(struct inode *inode)
> >> >  {
> >> > +	/* If inode is clean an unused, put it into LRU now.  */
> >> > +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> >> > +		inode_lru_list_add(inode);
> >> 
> >> IMHO, open coding this would be bad idea.
> >   Do you mean creating a separate function for the above two lines?
> 
> Yes. And the intent is to consolidate "when adds inode to LRU" with
> iput_final()'s one.
> 
> >> And another one is I_REFERENCED. We really want to remove I_REFERENCED?
> >   We don't want I_REFERENCED set - noone used the inode. But looking into
> > the code with fresh eyes, the fix isn't as simple as I thought. First I
> > need to check MS_ACTIVE and second I need to check I_FREEING... So the
> > condition will be complex enough to warrant a separate function.
> 
> I can't see the issue (sync_filesystem() will wait I_DIRTY before
> MS_ACTIVE, and I_DIRTY prevents I_FREEING) though, it may be possible.
  E.g. when an inode is deleted, it can be I_DIRTY (and the flusher thread
can be working on it) while it is also marked as I_FREEING. In that case we
must avoid adding the inode to the LRU.

  Regarding MS_ACTIVE - you are right that sync_filesystem() should clean
all dirty inodes, but some filesystems dirty their internal inodes during
umount, so it's better to make the flusher thread safe and not add such
inodes to the LRU during umount.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: The bug of iput() removal from flusher thread?
  2012-11-20 22:37       ` Dave Chinner
@ 2012-11-21  1:30         ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2012-11-21  1:30 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Wed 21-11-12 09:37:41, Dave Chinner wrote:
> On Mon, Nov 19, 2012 at 08:41:02PM +0100, Jan Kara wrote:
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 51ea267..ed7613b 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -227,6 +227,9 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
> >  
> >  static void inode_sync_complete(struct inode *inode)
> >  {
> > +	/* If inode is clean an unused, put it into LRU now.  */
> > +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> > +		inode_lru_list_add(inode);
> 
> Unsafe. Has to avoid I_FREEING|I_WILL_FREE (see warnings at start of
> writeback_single_inode()).
> 
> Also, if you do this after clearing the I_SYNC flag, you can use
> exactly the same logic as iput_final(). A helper function could be
> used for both cases at that point, and....
  Agreed. That's what I do in the current version of the patch (not posted
yet).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: The bug of iput() removal from flusher thread?
  2012-11-21  1:11             ` Jan Kara
@ 2012-11-21  1:48               ` Jan Kara
  2012-11-21  2:44                 ` Dave Chinner
  2012-11-21  8:05                 ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Jan Kara @ 2012-11-21  1:48 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: Jan Kara, Al Viro, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1714 bytes --]

On Wed 21-11-12 02:11:11, Jan Kara wrote:
> On Tue 20-11-12 06:53:12, OGAWA Hirofumi wrote:
> > Jan Kara <jack@suse.cz> writes:
> > 
> > >> >  static void inode_sync_complete(struct inode *inode)
> > >> >  {
> > >> > +	/* If inode is clean an unused, put it into LRU now.  */
> > >> > +	if (!(inode->i_state & I_DIRTY) && !atomic_read(&inode->i_count))
> > >> > +		inode_lru_list_add(inode);
> > >> 
> > >> IMHO, open coding this would be bad idea.
> > >   Do you mean creating a separate function for the above two lines?
> > 
> > Yes. And the intent is to consolidate "when adds inode to LRU" with
> > iput_final()'s one.
> > 
> > >> And another one is I_REFERENCED. We really want to remove I_REFERENCED?
> > >   We don't want I_REFERENCED set - noone used the inode. But looking into
> > > the code with fresh eyes, the fix isn't as simple as I thought. First I
> > > need to check MS_ACTIVE and second I need to check I_FREEING... So the
> > > condition will be complex enough to warrant a separate function.
> > 
> > I can't see the issue (sync_filesystem() will wait I_DIRTY before
> > MS_ACTIVE, and I_DIRTY prevents I_FREEING) though, it may be possible.
>   E.g. when inode is deleted it can be both I_DIRTY (and flusher thread
> can be working on it) while it is also marked as I_FREEING. In such case we
> must avoid adding the inode to the LRU.
> 
>   Regarding MS_ACTIVE - you are right that sync_filesystem() should clean
> all dirty inodes but some filesystems dirty their internal inodes during
> umount so it's better to make flusher thread safe and not add such inodes
> to the LRU during umount.
  Here's the patch I currently have BTW.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-writeback-Put-unused-inodes-to-LRU-after-writeback-c.patch --]
[-- Type: text/x-patch, Size: 3342 bytes --]

From 00c9878ec690bb8e493582f0109e9aa6ee734ecb Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Mon, 19 Nov 2012 20:01:16 +0100
Subject: [PATCH v2] writeback: Put unused inodes to LRU after writeback completion

Commit 169ebd90 removed the iget-iput pair from inode writeback. As a side
effect, inodes that are dirty during the iput_final() call will never be added
to the inode LRU (iput_final() doesn't add dirty inodes to the LRU, and later,
when the inode is cleaned, there's no one to add the inode there). Thus inodes
are effectively unreclaimable until someone looks them up again.

The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned. But still
the bug can have nasty consequences, leading up to OOM conditions under
certain circumstances. The following can easily reproduce the problem:

for (( i = 0; i < 1000; i++ )); do
  mkdir $i
  for (( j = 0; j < 1000; j++ )); do
    touch $i/$j
    echo 2 > /proc/sys/vm/drop_caches
  done
done

then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

We fix the issue by inserting unused clean inodes into the LRU after writeback
finishes in inode_sync_complete().

CC: Al Viro <viro@zeniv.linux.org.uk>
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
CC: stable@vger.kernel.org # >= 3.5
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/fs-writeback.c |    2 ++
 fs/inode.c        |   16 ++++++++++++++--
 fs/internal.h     |    1 +
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 51ea267..3e3422f 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -228,6 +228,8 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
 static void inode_sync_complete(struct inode *inode)
 {
 	inode->i_state &= ~I_SYNC;
+	/* If inode is clean and unused, put it into LRU now... */
+	inode_add_lru(inode);
 	/* Waiters must see I_SYNC cleared before being woken up */
 	smp_mb();
 	wake_up_bit(&inode->i_state, __I_SYNC);
diff --git a/fs/inode.c b/fs/inode.c
index b03c719..8f6396f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -408,6 +408,19 @@ static void inode_lru_list_add(struct inode *inode)
 	spin_unlock(&inode->i_sb->s_inode_lru_lock);
 }
 
+/*
+ * Add inode to LRU if needed (inode is unused and clean).
+ *
+ * Needs inode->i_lock held.
+ */
+void inode_add_lru(struct inode *inode)
+{
+	if (!(inode->i_state & (I_DIRTY | I_FREEING | I_SYNC)) &&
+	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
+		inode_lru_list_add(inode);
+}
+
+
 static void inode_lru_list_del(struct inode *inode)
 {
 	spin_lock(&inode->i_sb->s_inode_lru_lock);
@@ -1390,8 +1403,7 @@ static void iput_final(struct inode *inode)
 
 	if (!drop && (sb->s_flags & MS_ACTIVE)) {
 		inode->i_state |= I_REFERENCED;
-		if (!(inode->i_state & (I_DIRTY|I_SYNC)))
-			inode_lru_list_add(inode);
+		inode_add_lru(inode);
 		spin_unlock(&inode->i_lock);
 		return;
 	}
diff --git a/fs/internal.h b/fs/internal.h
index 916b7cb..2f6af7f 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -110,6 +110,7 @@ extern int open_check_o_direct(struct file *f);
  * inode.c
  */
 extern spinlock_t inode_sb_list_lock;
+extern void inode_add_lru(struct inode *inode);
 
 /*
  * fs-writeback.c
-- 
1.7.1



* Re: The bug of iput() removal from flusher thread?
  2012-11-21  1:48               ` Jan Kara
@ 2012-11-21  2:44                 ` Dave Chinner
  2012-11-21 17:08                   ` Jan Kara
  2012-11-21  8:05                 ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2012-11-21  2:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Wed, Nov 21, 2012 at 02:48:51AM +0100, Jan Kara wrote:
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 51ea267..3e3422f 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -228,6 +228,8 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
>  static void inode_sync_complete(struct inode *inode)
>  {
>  	inode->i_state &= ~I_SYNC;
> +	/* If inode is clean an unused, put it into LRU now... */
> +	inode_add_lru(inode);
>  	/* Waiters must see I_SYNC cleared before being woken up */
>  	smp_mb();
>  	wake_up_bit(&inode->i_state, __I_SYNC);
> diff --git a/fs/inode.c b/fs/inode.c
> index b03c719..8f6396f 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -408,6 +408,19 @@ static void inode_lru_list_add(struct inode *inode)
>  	spin_unlock(&inode->i_sb->s_inode_lru_lock);
>  }
>  
> +/*
> + * Add inode to LRU if needed (inode is unused and clean).
> + *
> + * Needs inode->i_lock held.
> + */
> +void inode_add_lru(struct inode *inode)
> +{
> +	if (!(inode->i_state & (I_DIRTY | I_FREEING | I_SYNC)) &&
> +	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
> +		inode_lru_list_add(inode);

Needs to avoid I_WILL_FREE as well. There's no point putting it on
the LRU if we are writing from iput_final()....

Otherwise looks OK.
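
To make the suggestion concrete, the check with I_WILL_FREE added could be
modelled like this in userspace C. This is a sketch only: the TOY_* flag
values and toy_should_add_lru() are hypothetical stand-ins, the superblock's
MS_ACTIVE bit is reduced to a boolean, and nothing here is taken from a
posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the real I_* values are defined by the kernel. */
enum {
	TOY_I_DIRTY	= 1 << 0,
	TOY_I_SYNC	= 1 << 1,
	TOY_I_FREEING	= 1 << 2,
	TOY_I_WILL_FREE	= 1 << 3,
};

/*
 * Should an inode with this state go onto the LRU?
 * Only if it is clean, not under writeback, not being (or about to be)
 * freed, unreferenced, and its superblock is still active.
 */
static bool toy_should_add_lru(unsigned state, int count, bool sb_active)
{
	if (state & (TOY_I_DIRTY | TOY_I_SYNC |
		     TOY_I_FREEING | TOY_I_WILL_FREE))
		return false;
	return count == 0 && sb_active;
}
```

With this shape, an inode that iput_final() is writing out (I_WILL_FREE set)
is rejected the same way as one that is already I_FREEING.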

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: The bug of iput() removal from flusher thread?
  2012-11-21  1:48               ` Jan Kara
  2012-11-21  2:44                 ` Dave Chinner
@ 2012-11-21  8:05                 ` Andrew Morton
  2012-11-21  8:22                   ` Dave Chinner
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2012-11-21  8:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Wed, 21 Nov 2012 02:48:51 +0100 Jan Kara <jack@suse.cz> wrote:

> +/*
> + * Add inode to LRU if needed (inode is unused and clean).
> + *
> + * Needs inode->i_lock held.
> + */
> +void inode_add_lru(struct inode *inode)
> +{
> +	if (!(inode->i_state & (I_DIRTY | I_FREEING | I_SYNC)) &&
> +	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
> +		inode_lru_list_add(inode);
> +}

Is i_lock sufficient to stabilise i_count?

<looks at the code a bit>

Is evict_inodes() wrong to test i_count outside i_lock?

invalidate_inodes() looks better.

can_unuse() must be called under i_lock, and is.  Apparently this
requirement was sufficiently obvious to not need documenting.

prune_icache_sb() gets it right.

iput() gets it right.

So to answer my own question: yes, it is sufficient.  But a) the
comment for inode.i_lock is out of date and b) evict_inodes() looks
fishy.



* Re: The bug of iput() removal from flusher thread?
  2012-11-21  8:05                 ` Andrew Morton
@ 2012-11-21  8:22                   ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2012-11-21  8:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jan Kara, OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Wed, Nov 21, 2012 at 12:05:33AM -0800, Andrew Morton wrote:
> On Wed, 21 Nov 2012 02:48:51 +0100 Jan Kara <jack@suse.cz> wrote:
> 
> > +/*
> > + * Add inode to LRU if needed (inode is unused and clean).
> > + *
> > + * Needs inode->i_lock held.
> > + */
> > +void inode_add_lru(struct inode *inode)
> > +{
> > +	if (!(inode->i_state & (I_DIRTY | I_FREEING | I_SYNC)) &&
> > +	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
> > +		inode_lru_list_add(inode);
> > +}
> 
> Is i_lock sufficient to stabilise i_count?
> 
> <looks at the code a bit>
> 
> Is evict_inodes() wrong to test i_count outside i_lock?
> 
> invalidate_inodes() looks better.
> 
> can_unuse() must be called under i_lock, and is.  Apparently this
> requirement was sufficiently obvious to not meed documenting.

It is documented. can_unuse looks at i_state and i_count, and both
are documented as requiring the i_lock at the top of the file in
the locking rules section. Also, see __iget(), also mentioned in
the locking rules....

> prune_icache_sb() gets it right.
> 
> iput() gets it right.
> 
> So to answer my own question: yes, it is sufficient.  But a) the
> comment for inode.i_lock is out of date

If you mean the one in fs.h, then yeah, it's way out of date....
>
> and b) evict_inodes() looks
> fishy.

As I understand it, evict_inodes() is special - it's only called
from generic_shutdown_super() after the MS_ACTIVE flag has been
removed from the filesystem, the dcache has been pruned and all the
inodes cleaned. So there should be no new references to the inodes
occurring, and hence we don't need to hold the lock to serialise
against new references being taken....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: The bug of iput() removal from flusher thread?
  2012-11-21  2:44                 ` Dave Chinner
@ 2012-11-21 17:08                   ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2012-11-21 17:08 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Jan Kara, OGAWA Hirofumi, Al Viro, linux-fsdevel, linux-kernel

On Wed 21-11-12 13:44:05, Dave Chinner wrote:
> On Wed, Nov 21, 2012 at 02:48:51AM +0100, Jan Kara wrote:
> > 
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 51ea267..3e3422f 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -228,6 +228,8 @@ static void requeue_io(struct inode *inode, struct bdi_writeback *wb)
> >  static void inode_sync_complete(struct inode *inode)
> >  {
> >  	inode->i_state &= ~I_SYNC;
> > +	/* If inode is clean an unused, put it into LRU now... */
> > +	inode_add_lru(inode);
> >  	/* Waiters must see I_SYNC cleared before being woken up */
> >  	smp_mb();
> >  	wake_up_bit(&inode->i_state, __I_SYNC);
> > diff --git a/fs/inode.c b/fs/inode.c
> > index b03c719..8f6396f 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -408,6 +408,19 @@ static void inode_lru_list_add(struct inode *inode)
> >  	spin_unlock(&inode->i_sb->s_inode_lru_lock);
> >  }
> >  
> > +/*
> > + * Add inode to LRU if needed (inode is unused and clean).
> > + *
> > + * Needs inode->i_lock held.
> > + */
> > +void inode_add_lru(struct inode *inode)
> > +{
> > +	if (!(inode->i_state & (I_DIRTY | I_FREEING | I_SYNC)) &&
> > +	    !atomic_read(&inode->i_count) && inode->i_sb->s_flags & MS_ACTIVE)
> > +		inode_lru_list_add(inode);
> 
> Needs to avoid I_WILL_FREE as well. There's no point putting it on
> the LRU if we are writing from iput_final()....
  Yeah, it won't cause any real problems but you are right it's not useful
to put such inodes into the LRU. I'll send v3 in a moment, now as a proper
patch submission since we seem to agree...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

