linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Space leak in f2fs
@ 2015-05-13  7:17 hujianyang
  2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
  0 siblings, 1 reply; 11+ messages in thread
From: hujianyang @ 2015-05-13  7:17 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: Chao Yu, cm224.lee, linux-f2fs-devel, linux-fsdevel

Hi Jaegeuk,

I found a space leak problem in f2fs. This problem could lead to
ENOSPC error during stress tests, e.g. ltp.

<<<test_output>>>
growfiles(gf15): 11656 growfiles.c/2249: 16920 tlibio.c/739 write(6, buf, 1352) ret:-1, errno=28 No space left on device
gf15        1  TFAIL  :  growfiles.c:132: Test failed

...

And can be reproduced by these steps whether background_gc is on
or not:

1) format a 4GB f2fs partition
2) dd a 3G file,
3) unlink it.

Do these steps again and again. Soon, after one unlink operation,
you can see the space of the 3G file is not free.

Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/sdd3        4193280 301064   3854328   8% /mnt/f2fs
Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072
3072+0 records in
3072+0 records out
3221225472 bytes (3.2 GB) copied, 3.1892 s, 1.0 GB/s
Fs-Server:/mnt/f2fs # unlink ./test
Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072
3072+0 records in
3072+0 records out
3221225472 bytes (3.2 GB) copied, 3.44288 s, 936 MB/s
Fs-Server:/mnt/f2fs # unlink ./test
Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdd3        4193280 3449888    705504  84% /mnt/f2fs
Fs-Server:/mnt/f2fs # ls
Fs-Server:/mnt/f2fs # ls
Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdd3        4193280 3449888    705504  84% /mnt/f2fs
Fs-Server:/mnt/f2fs # dd if=/dev/zero of=./test bs=1M count=3072
dd: writing `./test': No space left on device
689+0 records in
688+0 records out
721719296 bytes (722 MB) copied, 0.618972 s, 1.2 GB/s
Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdd3        4193280 4155392         0 100% /mnt/f2fs
Fs-Server:/mnt/f2fs # ls -l test
-rw-r--r-- 1 root root 721719296 May 13 14:52 test


We can reuse the leaking space after a sync call:


Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdd3        4193280 4155392         0 100% /mnt/f2fs
Fs-Server:/mnt/f2fs # sync
Fs-Server:/mnt/f2fs # df .
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sdd3        4193280 1006568   3148824  25% /mnt/f2fs


I found this may caused by .drop_inode in f2fs. see f2fs_drop_inode()

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 19438f2..7646d2a 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -424,15 +424,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)

 static int f2fs_drop_inode(struct inode *inode)
 {
-       /*
-        * This is to avoid a deadlock condition like below.
-        * writeback_single_inode(inode)
-        *  - f2fs_write_data_page
-        *    - f2fs_gc -> iput -> evict
-        *       - inode_wait_for_writeback(inode)
-        */
-       if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
-               return 0;
        return generic_drop_inode(inode);
 }

After removing these code, this problem is fixed. But this function is
introduced by commit 531ad7d58c6476c5856653448b4c7d26427502b4 to fix
a deadlock problem.

I wish you and other developers in this list could help me to fix this
problem in a correct way.

Thanks,
Hu


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [f2fs-dev] Space leak in f2fs
  2015-05-13  7:17 Space leak in f2fs hujianyang
@ 2015-05-13 17:46 ` Jaegeuk Kim
  2015-05-14  0:24   ` Jaegeuk Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-13 17:46 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-fsdevel, linux-f2fs-devel

Hi Hu,

Thank you for the report.

When I tried to reproduce this, it seems that this can occur under somewhat
stressful condition. I could't reach out to this problem.

Nevertheless, I think this is a possible scenario, so I wrote a patch for this.
Could you test this patch?

Thanks,

---
 fs/f2fs/super.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 19438f2..4593cd1 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -431,8 +431,26 @@ static int f2fs_drop_inode(struct inode *inode)
 	 *    - f2fs_gc -> iput -> evict
 	 *       - inode_wait_for_writeback(inode)
 	 */
-	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
+	if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
+		if (!inode->i_nlink && !is_bad_inode(inode)) {
+			spin_unlock(&inode->i_lock);
+
+			i_size_write(inode, 0);
+
+			if (F2FS_HAS_BLOCKS(inode))
+				f2fs_truncate(inode);
+
+			f2fs_lock_op(F2FS_I_SB(inode));
+			remove_inode_page(inode);
+			f2fs_unlock_op(F2FS_I_SB(inode));
+
+			/* avoid any write_inode call */
+			clear_inode_flag(F2FS_I(inode), FI_DIRTY_INODE);
+
+			spin_lock(&inode->i_lock);
+		}
 		return 0;
+	}
 	return generic_drop_inode(inode);
 }
 
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Space leak in f2fs
  2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
@ 2015-05-14  0:24   ` Jaegeuk Kim
  2015-05-14  1:40     ` hujianyang
  0 siblings, 1 reply; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-14  0:24 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-fsdevel, linux-f2fs-devel

Hi Hu,

Found a bug in the previous patch.
Could you check this out?

Thanks,

---
 fs/f2fs/super.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 19438f2..647591b 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode)
 	 *    - f2fs_gc -> iput -> evict
 	 *       - inode_wait_for_writeback(inode)
 	 */
-	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
+	if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
+		if (!inode->i_nlink && !is_bad_inode(inode)) {
+			spin_unlock(&inode->i_lock);
+			i_size_write(inode, 0);
+
+			if (F2FS_HAS_BLOCKS(inode))
+				f2fs_truncate(inode);
+			spin_lock(&inode->i_lock);
+		}
 		return 0;
+	}
 	return generic_drop_inode(inode);
 }
 
-- 
2.1.1


------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Space leak in f2fs
  2015-05-14  0:24   ` Jaegeuk Kim
@ 2015-05-14  1:40     ` hujianyang
  2015-05-14  1:45       ` [f2fs-dev] " Jaegeuk Kim
  2015-05-14 21:14       ` Jaegeuk Kim
  0 siblings, 2 replies; 11+ messages in thread
From: hujianyang @ 2015-05-14  1:40 UTC (permalink / raw)
  To: Jaegeuk Kim; +Cc: linux-fsdevel, linux-f2fs-devel

Hi Jaegeuk,

I've tested this patch. It's OK now. Seems this problem is fixed.

I'd like to push this patch to my local tree and run a formal
stress test next week. Will you push this patch to f2fs-dev branch?

If you have other modification to this fix, please let me know.

Thanks very much!
Hu

On 2015/5/14 8:24, Jaegeuk Kim wrote:
> Hi Hu,
> 
> Found a bug in the previous patch.
> Could you check this out?
> 
> Thanks,
> 
> ---
>  fs/f2fs/super.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 19438f2..647591b 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode)
>  	 *    - f2fs_gc -> iput -> evict
>  	 *       - inode_wait_for_writeback(inode)
>  	 */
> -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> +	if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
> +		if (!inode->i_nlink && !is_bad_inode(inode)) {
> +			spin_unlock(&inode->i_lock);
> +			i_size_write(inode, 0);
> +
> +			if (F2FS_HAS_BLOCKS(inode))
> +				f2fs_truncate(inode);
> +			spin_lock(&inode->i_lock);
> +		}
>  		return 0;
> +	}
>  	return generic_drop_inode(inode);
>  }
>  
> 



------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [f2fs-dev] Space leak in f2fs
  2015-05-14  1:40     ` hujianyang
@ 2015-05-14  1:45       ` Jaegeuk Kim
  2015-05-14 21:14       ` Jaegeuk Kim
  1 sibling, 0 replies; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-14  1:45 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-fsdevel, linux-f2fs-devel

I'll push the patch into dev branch right now.
Let me know, if there is any problem.

Thanks,

On Thu, May 14, 2015 at 09:40:25AM +0800, hujianyang wrote:
> Hi Jaegeuk,
> 
> I've tested this patch. It's OK now. Seems this problem is fixed.
> 
> I'd like to push this patch to my local tree and run a formal
> stress test next week. Will you push this patch to f2fs-dev branch?
> 
> If you have other modification to this fix, please let me know.
> 
> Thanks very much!
> Hu
> 
> On 2015/5/14 8:24, Jaegeuk Kim wrote:
> > Hi Hu,
> > 
> > Found a bug in the previous patch.
> > Could you check this out?
> > 
> > Thanks,
> > 
> > ---
> >  fs/f2fs/super.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 19438f2..647591b 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -431,8 +431,17 @@ static int f2fs_drop_inode(struct inode *inode)
> >  	 *    - f2fs_gc -> iput -> evict
> >  	 *       - inode_wait_for_writeback(inode)
> >  	 */
> > -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> > +	if (!inode_unhashed(inode) && inode->i_state & I_SYNC) {
> > +		if (!inode->i_nlink && !is_bad_inode(inode)) {
> > +			spin_unlock(&inode->i_lock);
> > +			i_size_write(inode, 0);
> > +
> > +			if (F2FS_HAS_BLOCKS(inode))
> > +				f2fs_truncate(inode);
> > +			spin_lock(&inode->i_lock);
> > +		}
> >  		return 0;
> > +	}
> >  	return generic_drop_inode(inode);
> >  }
> >  
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [f2fs-dev] Space leak in f2fs
  2015-05-14  1:40     ` hujianyang
  2015-05-14  1:45       ` [f2fs-dev] " Jaegeuk Kim
@ 2015-05-14 21:14       ` Jaegeuk Kim
  2015-05-15  8:31         ` Chao Yu
  1 sibling, 1 reply; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-14 21:14 UTC (permalink / raw)
  To: hujianyang; +Cc: linux-fsdevel, linux-f2fs-devel

Hi Hu,

I've been rethinking about whole this issue differently.
And, now I'm starting to test with the below patch instead of previous one.

Thanks,

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
---
 fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
 fs/f2fs/data.c       |  4 ++++
 fs/f2fs/f2fs.h       |  1 +
 fs/f2fs/super.c      | 15 ---------------
 4 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7b7a9d8..74875fb 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 	spin_unlock(&im->ino_lock);
 }
 
+static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+{
+	struct inode_management *im = &sbi->im[type];
+	struct ino_entry *e;
+	bool exist = false;
+
+	spin_lock(&im->ino_lock);
+	e = radix_tree_lookup(&im->ino_root, ino);
+	if (e)
+		exist = true;
+	spin_unlock(&im->ino_lock);
+	return exist;
+}
+
 void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
 	/* add new dirty ino entry into list */
@@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 	__remove_ino_entry(sbi, ino, ORPHAN_INO);
 }
 
+bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+{
+	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
+}
+
 static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
 	struct inode *inode = f2fs_iget(sbi->sb, ino);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b0cc2aa..1988f5f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1749,6 +1749,10 @@ write:
 		goto out;
 	}
 
+	/* if orphan inode, we don't need to write its data */
+	if (is_orphan_inode(sbi, inode->i_ino))
+		goto out;
+
 	if (!wbc->for_reclaim)
 		need_balance_fs = true;
 	else if (has_not_enough_free_secs(sbi, 0))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8f1f21a..697346a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1726,6 +1726,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
 void release_orphan_inode(struct f2fs_sb_info *);
 void add_orphan_inode(struct f2fs_sb_info *, nid_t);
 void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
+bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
 void recover_orphan_inodes(struct f2fs_sb_info *);
 int get_valid_checkpoint(struct f2fs_sb_info *);
 void update_dirty_page(struct inode *, struct page *);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 19438f2..1d0973a 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -422,20 +422,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 	return &fi->vfs_inode;
 }
 
-static int f2fs_drop_inode(struct inode *inode)
-{
-	/*
-	 * This is to avoid a deadlock condition like below.
-	 * writeback_single_inode(inode)
-	 *  - f2fs_write_data_page
-	 *    - f2fs_gc -> iput -> evict
-	 *       - inode_wait_for_writeback(inode)
-	 */
-	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
-		return 0;
-	return generic_drop_inode(inode);
-}
-
 /*
  * f2fs_dirty_inode() is called from __mark_inode_dirty()
  *
@@ -759,7 +745,6 @@ restore_opts:
 
 static struct super_operations f2fs_sops = {
 	.alloc_inode	= f2fs_alloc_inode,
-	.drop_inode	= f2fs_drop_inode,
 	.destroy_inode	= f2fs_destroy_inode,
 	.write_inode	= f2fs_write_inode,
 	.dirty_inode	= f2fs_dirty_inode,
-- 
2.1.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [f2fs-dev] Space leak in f2fs
  2015-05-14 21:14       ` Jaegeuk Kim
@ 2015-05-15  8:31         ` Chao Yu
  2015-05-16  0:55           ` Jaegeuk Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Yu @ 2015-05-15  8:31 UTC (permalink / raw)
  To: 'Jaegeuk Kim', 'hujianyang'
  Cc: linux-fsdevel, linux-f2fs-devel

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Friday, May 15, 2015 5:14 AM
> To: hujianyang
> Cc: linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] Space leak in f2fs
> 
> Hi Hu,
> 
> I've been rethinking about whole this issue differently.
> And, now I'm starting to test with the below patch instead of previous one.
> 
> Thanks,
> 
> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
> ---
>  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
>  fs/f2fs/data.c       |  4 ++++
>  fs/f2fs/f2fs.h       |  1 +
>  fs/f2fs/super.c      | 15 ---------------
>  4 files changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7b7a9d8..74875fb 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> type)
>  	spin_unlock(&im->ino_lock);
>  }
> 
> +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> +{
> +	struct inode_management *im = &sbi->im[type];
> +	struct ino_entry *e;
> +	bool exist = false;
> +
> +	spin_lock(&im->ino_lock);
> +	e = radix_tree_lookup(&im->ino_root, ino);
> +	if (e)
> +		exist = true;
> +	spin_unlock(&im->ino_lock);
> +	return exist;
> +}
> +
>  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
>  {
>  	/* add new dirty ino entry into list */
> @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
>  }
> 
> +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> +{
> +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> +}
> +
>  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  {
>  	struct inode *inode = f2fs_iget(sbi->sb, ino);
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index b0cc2aa..1988f5f 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1749,6 +1749,10 @@ write:
>  		goto out;
>  	}
> 
> +	/* if orphan inode, we don't need to write its data */
> +	if (is_orphan_inode(sbi, inode->i_ino))
> +		goto out;

When user create a temp file by invoking open with O_TMPFILE flag,
in ->tmpfile our temp file will be added into orphan list as its
nlink is zero.

If we skip writting out data for this orphan inode, later, even though
we add nlink/directory entry for orphan inode by calling linkat,
our file will contain inconsistent data between in-memory and on-disk.

So how about considering for this case?

BTW, the previous fixing patch looks good to me.

Thanks,

> +
>  	if (!wbc->for_reclaim)
>  		need_balance_fs = true;
>  	else if (has_not_enough_free_secs(sbi, 0))
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 8f1f21a..697346a 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1726,6 +1726,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
>  void release_orphan_inode(struct f2fs_sb_info *);
>  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void recover_orphan_inodes(struct f2fs_sb_info *);
>  int get_valid_checkpoint(struct f2fs_sb_info *);
>  void update_dirty_page(struct inode *, struct page *);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 19438f2..1d0973a 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -422,20 +422,6 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
>  	return &fi->vfs_inode;
>  }
> 
> -static int f2fs_drop_inode(struct inode *inode)
> -{
> -	/*
> -	 * This is to avoid a deadlock condition like below.
> -	 * writeback_single_inode(inode)
> -	 *  - f2fs_write_data_page
> -	 *    - f2fs_gc -> iput -> evict
> -	 *       - inode_wait_for_writeback(inode)
> -	 */
> -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> -		return 0;
> -	return generic_drop_inode(inode);
> -}
> -
>  /*
>   * f2fs_dirty_inode() is called from __mark_inode_dirty()
>   *
> @@ -759,7 +745,6 @@ restore_opts:
> 
>  static struct super_operations f2fs_sops = {
>  	.alloc_inode	= f2fs_alloc_inode,
> -	.drop_inode	= f2fs_drop_inode,
>  	.destroy_inode	= f2fs_destroy_inode,
>  	.write_inode	= f2fs_write_inode,
>  	.dirty_inode	= f2fs_dirty_inode,
> --
> 2.1.1
> 
> 
> 
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [f2fs-dev] Space leak in f2fs
  2015-05-15  8:31         ` Chao Yu
@ 2015-05-16  0:55           ` Jaegeuk Kim
  2015-05-18  2:43             ` Chao Yu
  0 siblings, 1 reply; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-16  0:55 UTC (permalink / raw)
  To: Chao Yu; +Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel

Hi Chao,

On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 

[snip]

> > +	/* if orphan inode, we don't need to write its data */
> > +	if (is_orphan_inode(sbi, inode->i_ino))
> > +		goto out;
> 
> When user create a temp file by invoking open with O_TMPFILE flag,
> in ->tmpfile our temp file will be added into orphan list as its
> nlink is zero.
> 
> If we skip writting out data for this orphan inode, later, even though
> we add nlink/directory entry for orphan inode by calling linkat,
> our file will contain inconsistent data between in-memory and on-disk.
> 
> So how about considering for this case?

Right.
How about the below patch?

> 
> BTW, the previous fixing patch looks good to me.

But, my new concern here is a memory pressure. If we do not drop the inode
when iput was called, we need to wait for another time slot to reclaim its
memory.

Thanks,

---
 fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
 fs/f2fs/data.c       |  8 ++++++++
 fs/f2fs/dir.c        |  1 +
 fs/f2fs/f2fs.h       |  2 ++
 fs/f2fs/super.c      | 14 +++++++++++++-
 5 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7b7a9d8..74875fb 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
 	spin_unlock(&im->ino_lock);
 }
 
+static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+{
+	struct inode_management *im = &sbi->im[type];
+	struct ino_entry *e;
+	bool exist = false;
+
+	spin_lock(&im->ino_lock);
+	e = radix_tree_lookup(&im->ino_root, ino);
+	if (e)
+		exist = true;
+	spin_unlock(&im->ino_lock);
+	return exist;
+}
+
 void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
 {
 	/* add new dirty ino entry into list */
@@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 	__remove_ino_entry(sbi, ino, ORPHAN_INO);
 }
 
+bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+{
+	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
+}
+
 static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
 {
 	struct inode *inode = f2fs_iget(sbi->sb, ino);
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b0cc2aa..d883c14 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1749,6 +1749,14 @@ write:
 		goto out;
 	}
 
+	/*
+	 * if orphan inode, we don't need to write its data,
+	 * but, tmpfile is not the case.
+	 */
+	if (is_orphan_inode(sbi, inode->i_ino) &&
+			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))
+		goto out;
+
 	if (!wbc->for_reclaim)
 		need_balance_fs = true;
 	else if (has_not_enough_free_secs(sbi, 0))
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 3e92376..a2ea1b9 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
 	update_inode(inode, page);
 	f2fs_put_page(page, 1);
 
+	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
 	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
 fail:
 	up_write(&F2FS_I(inode)->i_sem);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cdcae06..de21d38 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
 /* used for f2fs_inode_info->flags */
 enum {
 	FI_NEW_INODE,		/* indicate newly allocated inode */
+	FI_TMP_INODE,		/* indicate tmpfile */
 	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
 	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
 	FI_INC_LINK,		/* need to increment i_nlink */
@@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
 void release_orphan_inode(struct f2fs_sb_info *);
 void add_orphan_inode(struct f2fs_sb_info *, nid_t);
 void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
+bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
 void recover_orphan_inodes(struct f2fs_sb_info *);
 int get_valid_checkpoint(struct f2fs_sb_info *);
 void update_dirty_page(struct inode *, struct page *);
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 7464d08..98af3bf 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
 	 *  - f2fs_write_data_page
 	 *    - f2fs_gc -> iput -> evict
 	 *       - inode_wait_for_writeback(inode)
+	 * In order to avoid that, f2fs_write_data_page does not write data
+	 * pages for orphan inode except tmpfile.
+	 * Nevertheless, we need to truncate the tmpfile's data to avoid
+	 * needless cleaning.
 	 */
-	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
+	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
+						inode->i_state & I_SYNC) {
+		spin_unlock(&inode->i_lock);
+		i_size_write(inode, 0);
+
+		if (F2FS_HAS_BLOCKS(inode))
+			f2fs_truncate(inode);
+		spin_lock(&inode->i_lock);
 		return 0;
+	}
 	return generic_drop_inode(inode);
 }
 
-- 
2.1.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [f2fs-dev] Space leak in f2fs
  2015-05-16  0:55           ` Jaegeuk Kim
@ 2015-05-18  2:43             ` Chao Yu
  2015-05-18  2:50               ` Nicholas Krause
  2015-05-18  5:44               ` [f2fs-dev] " Jaegeuk Kim
  0 siblings, 2 replies; 11+ messages in thread
From: Chao Yu @ 2015-05-18  2:43 UTC (permalink / raw)
  To: 'Jaegeuk Kim'
  Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Saturday, May 16, 2015 8:56 AM
> To: Chao Yu
> Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] Space leak in f2fs
> 
> Hi Chao,
> 
> On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> 
> [snip]
> 
> > > +	/* if orphan inode, we don't need to write its data */
> > > +	if (is_orphan_inode(sbi, inode->i_ino))
> > > +		goto out;
> >
> > When user create a temp file by invoking open with O_TMPFILE flag,
> > in ->tmpfile our temp file will be added into orphan list as its
> > nlink is zero.
> >
> > If we skip writting out data for this orphan inode, later, even though
> > we add nlink/directory entry for orphan inode by calling linkat,
> > our file will contain inconsistent data between in-memory and on-disk.
> >
> > So how about considering for this case?
> 
> Right.
> How about the below patch?
> 
> >
> > BTW, the previous fixing patch looks good to me.
> 
> But, my new concern here is a memory pressure. If we do not drop the inode
> when iput was called, we need to wait for another time slot to reclaim its
> memory.

Agree. Please see below.

> 
> Thanks,
> 
> ---
>  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
>  fs/f2fs/data.c       |  8 ++++++++
>  fs/f2fs/dir.c        |  1 +
>  fs/f2fs/f2fs.h       |  2 ++
>  fs/f2fs/super.c      | 14 +++++++++++++-
>  5 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7b7a9d8..74875fb 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> type)
>  	spin_unlock(&im->ino_lock);
>  }
> 
> +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> +{
> +	struct inode_management *im = &sbi->im[type];
> +	struct ino_entry *e;
> +	bool exist = false;
> +
> +	spin_lock(&im->ino_lock);
> +	e = radix_tree_lookup(&im->ino_root, ino);
> +	if (e)
> +		exist = true;
> +	spin_unlock(&im->ino_lock);
> +	return exist;
> +}
> +
>  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
>  {
>  	/* add new dirty ino entry into list */
> @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
>  }
> 
> +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> +{
> +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> +}
> +
>  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  {
>  	struct inode *inode = f2fs_iget(sbi->sb, ino);
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index b0cc2aa..d883c14 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1749,6 +1749,14 @@ write:
>  		goto out;
>  	}
> 
> +	/*
> +	 * if orphan inode, we don't need to write its data,
> +	 * but, tmpfile is not the case.
> +	 */
> +	if (is_orphan_inode(sbi, inode->i_ino) &&
> +			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))

For normal inode, all dirty pages will not be written out, and after that pages
can be reclaimed by VM any time due to they are be cleaned when flush. Then any
process who held the orphan inode may not read any original data correctly from
this inode.

And here is the unlink description in POSIX:
"If one or more processes have the file open when the last link is removed,
the link shall be removed before unlink() returns, but the removal of the
file contents shall be postponed until all references to the file are closed."

To my understanding for above description, we should keep data of helded orphan
inode in memory or on disk until it is not referenced by any processes.

How do you think of it?

using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))"
to skip writing at the beginning of ->writepage()?

Thanks,

> +		goto out;
> +
>  	if (!wbc->for_reclaim)
>  		need_balance_fs = true;
>  	else if (has_not_enough_free_secs(sbi, 0))
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 3e92376..a2ea1b9 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
>  	update_inode(inode, page);
>  	f2fs_put_page(page, 1);
> 
> +	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
>  	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
>  fail:
>  	up_write(&F2FS_I(inode)->i_sem);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index cdcae06..de21d38 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>  /* used for f2fs_inode_info->flags */
>  enum {
>  	FI_NEW_INODE,		/* indicate newly allocated inode */
> +	FI_TMP_INODE,		/* indicate tmpfile */
>  	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
>  	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
>  	FI_INC_LINK,		/* need to increment i_nlink */
> @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
>  void release_orphan_inode(struct f2fs_sb_info *);
>  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void recover_orphan_inodes(struct f2fs_sb_info *);
>  int get_valid_checkpoint(struct f2fs_sb_info *);
>  void update_dirty_page(struct inode *, struct page *);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 7464d08..98af3bf 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
>  	 *  - f2fs_write_data_page
>  	 *    - f2fs_gc -> iput -> evict
>  	 *       - inode_wait_for_writeback(inode)
> +	 * In order to avoid that, f2fs_write_data_page does not write data
> +	 * pages for orphan inode except tmpfile.
> +	 * Nevertheless, we need to truncate the tmpfile's data to avoid
> +	 * needless cleaning.
>  	 */
> -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> +	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
> +						inode->i_state & I_SYNC) {
> +		spin_unlock(&inode->i_lock);
> +		i_size_write(inode, 0);
> +
> +		if (F2FS_HAS_BLOCKS(inode))
> +			f2fs_truncate(inode);
> +		spin_lock(&inode->i_lock);
>  		return 0;
> +	}
>  	return generic_drop_inode(inode);
>  }
> 
> --
> 2.1.1



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Space leak in f2fs
  2015-05-18  2:43             ` Chao Yu
@ 2015-05-18  2:50               ` Nicholas Krause
  2015-05-18  5:44               ` [f2fs-dev] " Jaegeuk Kim
  1 sibling, 0 replies; 11+ messages in thread
From: Nicholas Krause @ 2015-05-18  2:50 UTC (permalink / raw)
  To: Chao Yu, 'Jaegeuk Kim'; +Cc: linux-fsdevel, linux-f2fs-devel



On May 17, 2015 10:43:14 PM EDT, Chao Yu <chao2.yu@samsung.com> wrote:
>Hi Jaegeuk,
>
>> -----Original Message-----
>> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
>> Sent: Saturday, May 16, 2015 8:56 AM
>> To: Chao Yu
>> Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org;
>linux-f2fs-devel@lists.sourceforge.net
>> Subject: Re: [f2fs-dev] Space leak in f2fs
>> 
>> Hi Chao,
>> 
>> On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
>> > Hi Jaegeuk,
>> >
>> 
>> [snip]
>> 
>> > > +	/* if orphan inode, we don't need to write its data */
>> > > +	if (is_orphan_inode(sbi, inode->i_ino))
>> > > +		goto out;
>> >
>> > When user create a temp file by invoking open with O_TMPFILE flag,
>> > in ->tmpfile our temp file will be added into orphan list as its
>> > nlink is zero.
>> >
>> > If we skip writting out data for this orphan inode, later, even
>though
>> > we add nlink/directory entry for orphan inode by calling linkat,
>> > our file will contain inconsistent data between in-memory and
>on-disk.
>> >
>> > So how about considering for this case?
>> 
>> Right.
>> How about the below patch?
>> 
>> >
>> > BTW, the previous fixing patch looks good to me.
>> 
>> But, my new concern here is a memory pressure. If we do not drop the
>inode
>> when iput was called, we need to wait for another time slot to
>reclaim its
>> memory.
>
>Agree. Please see below.
>
>> 
>> Thanks,
>> 
>> ---
>>  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
>>  fs/f2fs/data.c       |  8 ++++++++
>>  fs/f2fs/dir.c        |  1 +
>>  fs/f2fs/f2fs.h       |  2 ++
>>  fs/f2fs/super.c      | 14 +++++++++++++-
>>  5 files changed, 43 insertions(+), 1 deletion(-)
>> 
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index 7b7a9d8..74875fb 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct
>f2fs_sb_info *sbi, nid_t ino, int
>> type)
>>  	spin_unlock(&im->ino_lock);
>>  }
>> 
>> +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
>int type)
>> +{
>> +	struct inode_management *im = &sbi->im[type];
>> +	struct ino_entry *e;
>> +	bool exist = false;
>> +
>> +	spin_lock(&im->ino_lock);
>> +	e = radix_tree_lookup(&im->ino_root, ino);
>> +	if (e)
>> +		exist = true;
>> +	spin_unlock(&im->ino_lock);
>> +	return exist;
>> +}
>> +
>>  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
>>  {
>>  	/* add new dirty ino entry into list */
>> @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info
>*sbi, nid_t ino)
>>  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
>>  }
>> 
>> +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>> +{
>> +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
>> +}
>> +
>>  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t
>ino)
>>  {
>>  	struct inode *inode = f2fs_iget(sbi->sb, ino);
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index b0cc2aa..d883c14 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -1749,6 +1749,14 @@ write:
>>  		goto out;
>>  	}
>> 
>> +	/*
>> +	 * if orphan inode, we don't need to write its data,
>> +	 * but, tmpfile is not the case.
>> +	 */
>> +	if (is_orphan_inode(sbi, inode->i_ino) &&
>> +			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))
>
>For normal inode, all dirty pages will not be written out, and after
>that pages
>can be reclaimed by VM any time due to they are be cleaned when flush.
>Then any
>process who held the orphan inode may not read any original data
>correctly from
>this inode.
>
>And here is the unlink description in POSIX:
>"If one or more processes have the file open when the last link is
>removed,
>the link shall be removed before unlink() returns, but the removal of
>the
>file contents shall be postponed until all references to the file are
>closed."
>
>To my understanding for above description, we should keep data of
>helded orphan
>inode in memory or on disk until it is not referenced by any processes.
>
>How do you think of it?
>
>using "if (is_orphan_inode(sbi, inode->i_ino) &&
>!atomic_read(&inode->i_count))"
>to skip writing at the beginning of ->writepage()?
>
>Thanks,
>
Chao, 
Your correct here,  I was going to recommend this but my explanation 
was pretty badly worded.  Again also
I am not that well versed in the f2fs 
code base so I wasn't sure if my 
answer was correct. 
Nick
>> +		goto out;
>> +
>>  	if (!wbc->for_reclaim)
>>  		need_balance_fs = true;
>>  	else if (has_not_enough_free_secs(sbi, 0))
>> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
>> index 3e92376..a2ea1b9 100644
>> --- a/fs/f2fs/dir.c
>> +++ b/fs/f2fs/dir.c
>> @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct
>inode *dir)
>>  	update_inode(inode, page);
>>  	f2fs_put_page(page, 1);
>> 
>> +	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
>>  	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
>>  fail:
>>  	up_write(&F2FS_I(inode)->i_sem);
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index cdcae06..de21d38 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int
>nr, char *addr)
>>  /* used for f2fs_inode_info->flags */
>>  enum {
>>  	FI_NEW_INODE,		/* indicate newly allocated inode */
>> +	FI_TMP_INODE,		/* indicate tmpfile */
>>  	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
>>  	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
>>  	FI_INC_LINK,		/* need to increment i_nlink */
>> @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info
>*);
>>  void release_orphan_inode(struct f2fs_sb_info *);
>>  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
>>  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
>> +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
>>  void recover_orphan_inodes(struct f2fs_sb_info *);
>>  int get_valid_checkpoint(struct f2fs_sb_info *);
>>  void update_dirty_page(struct inode *, struct page *);
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index 7464d08..98af3bf 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
>>  	 *  - f2fs_write_data_page
>>  	 *    - f2fs_gc -> iput -> evict
>>  	 *       - inode_wait_for_writeback(inode)
>> +	 * In order to avoid that, f2fs_write_data_page does not write data
>> +	 * pages for orphan inode except tmpfile.
>> +	 * Nevertheless, we need to truncate the tmpfile's data to avoid
>> +	 * needless cleaning.
>>  	 */
>> -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
>> +	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
>> +						inode->i_state & I_SYNC) {
>> +		spin_unlock(&inode->i_lock);
>> +		i_size_write(inode, 0);
>> +
>> +		if (F2FS_HAS_BLOCKS(inode))
>> +			f2fs_truncate(inode);
>> +		spin_lock(&inode->i_lock);
>>  		return 0;
>> +	}
>>  	return generic_drop_inode(inode);
>>  }
>> 
>> --
>> 2.1.1
>
>
>
>------------------------------------------------------------------------------
>One dashboard for servers and applications across
>Physical-Virtual-Cloud 
>Widest out-of-the-box monitoring support with 50+ applications
>Performance metrics, stats and reports that give you Actionable
>Insights
>Deep dive visibility with transaction tracing using APM Insight.
>http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>_______________________________________________
>Linux-f2fs-devel mailing list
>Linux-f2fs-devel@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

------------------------------------------------------------------------------
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [f2fs-dev] Space leak in f2fs
  2015-05-18  2:43             ` Chao Yu
  2015-05-18  2:50               ` Nicholas Krause
@ 2015-05-18  5:44               ` Jaegeuk Kim
  1 sibling, 0 replies; 11+ messages in thread
From: Jaegeuk Kim @ 2015-05-18  5:44 UTC (permalink / raw)
  To: Chao Yu; +Cc: 'hujianyang', linux-fsdevel, linux-f2fs-devel

Hi Chao,

On Mon, May 18, 2015 at 10:43:14AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Saturday, May 16, 2015 8:56 AM
> > To: Chao Yu
> > Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] Space leak in f2fs
> > 
> > Hi Chao,
> > 
> > On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> > > Hi Jaegeuk,
> > >
> > 
> > [snip]
> > 
> > > > +	/* if orphan inode, we don't need to write its data */
> > > > +	if (is_orphan_inode(sbi, inode->i_ino))
> > > > +		goto out;
> > >
> > > When user create a temp file by invoking open with O_TMPFILE flag,
> > > in ->tmpfile our temp file will be added into orphan list as its
> > > nlink is zero.
> > >
> > > If we skip writting out data for this orphan inode, later, even though
> > > we add nlink/directory entry for orphan inode by calling linkat,
> > > our file will contain inconsistent data between in-memory and on-disk.
> > >
> > > So how about considering for this case?
> > 
> > Right.
> > How about the below patch?
> > 
> > >
> > > BTW, the previous fixing patch looks good to me.
> > 
> > But, my new concern here is a memory pressure. If we do not drop the inode
> > when iput was called, we need to wait for another time slot to reclaim its
> > memory.
> 
> Agree. Please see below.
> 
> > 
> > Thanks,
> > 
> > ---
> >  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
> >  fs/f2fs/data.c       |  8 ++++++++
> >  fs/f2fs/dir.c        |  1 +
> >  fs/f2fs/f2fs.h       |  2 ++
> >  fs/f2fs/super.c      | 14 +++++++++++++-
> >  5 files changed, 43 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 7b7a9d8..74875fb 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> > type)
> >  	spin_unlock(&im->ino_lock);
> >  }
> > 
> > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> > +{
> > +	struct inode_management *im = &sbi->im[type];
> > +	struct ino_entry *e;
> > +	bool exist = false;
> > +
> > +	spin_lock(&im->ino_lock);
> > +	e = radix_tree_lookup(&im->ino_root, ino);
> > +	if (e)
> > +		exist = true;
> > +	spin_unlock(&im->ino_lock);
> > +	return exist;
> > +}
> > +
> >  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
> >  {
> >  	/* add new dirty ino entry into list */
> > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> >  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
> >  }
> > 
> > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > +{
> > +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> > +}
> > +
> >  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> >  {
> >  	struct inode *inode = f2fs_iget(sbi->sb, ino);
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index b0cc2aa..d883c14 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1749,6 +1749,14 @@ write:
> >  		goto out;
> >  	}
> > 
> > +	/*
> > +	 * if orphan inode, we don't need to write its data,
> > +	 * but, tmpfile is not the case.
> > +	 */
> > +	if (is_orphan_inode(sbi, inode->i_ino) &&
> > +			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))
> 
> For normal inode, all dirty pages will not be written out, and after that pages
> can be reclaimed by VM any time due to they are be cleaned when flush. Then any
> process who held the orphan inode may not read any original data correctly from
> this inode.

Urg, right.
Indeed, I have not to do this for orphan inodes.

> 
> And here is the unlink description in POSIX:
> "If one or more processes have the file open when the last link is removed,
> the link shall be removed before unlink() returns, but the removal of the
> file contents shall be postponed until all references to the file are closed."
> 
> To my understanding for above description, we should keep data of helded orphan
> inode in memory or on disk until it is not referenced by any processes.
> 
> How do you think of it?
> 
> using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))"
> to skip writing at the beginning of ->writepage()?

Hmm, IMO, we can't use i_count without i_lock. And this doesn't clearly address
the original race condition.

For now, simply we'd better keep v2 which only truncates data blocks in
f2fs_drop_inode.

Thank you for pointing this out.

Thanks,

> 
> Thanks,
> 
> > +		goto out;
> > +
> >  	if (!wbc->for_reclaim)
> >  		need_balance_fs = true;
> >  	else if (has_not_enough_free_secs(sbi, 0))
> > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > index 3e92376..a2ea1b9 100644
> > --- a/fs/f2fs/dir.c
> > +++ b/fs/f2fs/dir.c
> > @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
> >  	update_inode(inode, page);
> >  	f2fs_put_page(page, 1);
> > 
> > +	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
> >  	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
> >  fail:
> >  	up_write(&F2FS_I(inode)->i_sem);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index cdcae06..de21d38 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> >  /* used for f2fs_inode_info->flags */
> >  enum {
> >  	FI_NEW_INODE,		/* indicate newly allocated inode */
> > +	FI_TMP_INODE,		/* indicate tmpfile */
> >  	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
> >  	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
> >  	FI_INC_LINK,		/* need to increment i_nlink */
> > @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
> >  void release_orphan_inode(struct f2fs_sb_info *);
> >  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
> >  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
> >  void recover_orphan_inodes(struct f2fs_sb_info *);
> >  int get_valid_checkpoint(struct f2fs_sb_info *);
> >  void update_dirty_page(struct inode *, struct page *);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 7464d08..98af3bf 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
> >  	 *  - f2fs_write_data_page
> >  	 *    - f2fs_gc -> iput -> evict
> >  	 *       - inode_wait_for_writeback(inode)
> > +	 * In order to avoid that, f2fs_write_data_page does not write data
> > +	 * pages for orphan inode except tmpfile.
> > +	 * Nevertheless, we need to truncate the tmpfile's data to avoid
> > +	 * needless cleaning.
> >  	 */
> > -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> > +	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
> > +						inode->i_state & I_SYNC) {
> > +		spin_unlock(&inode->i_lock);
> > +		i_size_write(inode, 0);
> > +
> > +		if (F2FS_HAS_BLOCKS(inode))
> > +			f2fs_truncate(inode);
> > +		spin_lock(&inode->i_lock);
> >  		return 0;
> > +	}
> >  	return generic_drop_inode(inode);
> >  }
> > 
> > --
> > 2.1.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-05-18  5:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-13  7:17 Space leak in f2fs hujianyang
2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
2015-05-14  0:24   ` Jaegeuk Kim
2015-05-14  1:40     ` hujianyang
2015-05-14  1:45       ` [f2fs-dev] " Jaegeuk Kim
2015-05-14 21:14       ` Jaegeuk Kim
2015-05-15  8:31         ` Chao Yu
2015-05-16  0:55           ` Jaegeuk Kim
2015-05-18  2:43             ` Chao Yu
2015-05-18  2:50               ` Nicholas Krause
2015-05-18  5:44               ` [f2fs-dev] " Jaegeuk Kim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).