linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chao Yu <chao2.yu@samsung.com>
To: 'Jaegeuk Kim' <jaegeuk@kernel.org>
Cc: 'hujianyang' <hujianyang@huawei.com>,
	linux-fsdevel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net
Subject: RE: [f2fs-dev] Space leak in f2fs
Date: Mon, 18 May 2015 10:43:14 +0800	[thread overview]
Message-ID: <000001d09114$7e0318d0$7a094a70$@samsung.com> (raw)
In-Reply-To: <20150516005436.GA10530@jaegeuk-mac02.mot.com>

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> Sent: Saturday, May 16, 2015 8:56 AM
> To: Chao Yu
> Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> Subject: Re: [f2fs-dev] Space leak in f2fs
> 
> Hi Chao,
> 
> On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> 
> [snip]
> 
> > > +	/* if orphan inode, we don't need to write its data */
> > > +	if (is_orphan_inode(sbi, inode->i_ino))
> > > +		goto out;
> >
> > When user create a temp file by invoking open with O_TMPFILE flag,
> > in ->tmpfile our temp file will be added into orphan list as its
> > nlink is zero.
> >
> > If we skip writting out data for this orphan inode, later, even though
> > we add nlink/directory entry for orphan inode by calling linkat,
> > our file will contain inconsistent data between in-memory and on-disk.
> >
> > So how about considering for this case?
> 
> Right.
> How about the below patch?
> 
> >
> > BTW, the previous fixing patch looks good to me.
> 
> But, my new concern here is a memory pressure. If we do not drop the inode
> when iput was called, we need to wait for another time slot to reclaim its
> memory.

Agree. Please see below.

> 
> Thanks,
> 
> ---
>  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
>  fs/f2fs/data.c       |  8 ++++++++
>  fs/f2fs/dir.c        |  1 +
>  fs/f2fs/f2fs.h       |  2 ++
>  fs/f2fs/super.c      | 14 +++++++++++++-
>  5 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 7b7a9d8..74875fb 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> type)
>  	spin_unlock(&im->ino_lock);
>  }
> 
> +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> +{
> +	struct inode_management *im = &sbi->im[type];
> +	struct ino_entry *e;
> +	bool exist = false;
> +
> +	spin_lock(&im->ino_lock);
> +	e = radix_tree_lookup(&im->ino_root, ino);
> +	if (e)
> +		exist = true;
> +	spin_unlock(&im->ino_lock);
> +	return exist;
> +}
> +
>  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
>  {
>  	/* add new dirty ino entry into list */
> @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
>  }
> 
> +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> +{
> +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> +}
> +
>  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
>  {
>  	struct inode *inode = f2fs_iget(sbi->sb, ino);
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index b0cc2aa..d883c14 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1749,6 +1749,14 @@ write:
>  		goto out;
>  	}
> 
> +	/*
> +	 * if orphan inode, we don't need to write its data,
> +	 * but, tmpfile is not the case.
> +	 */
> +	if (is_orphan_inode(sbi, inode->i_ino) &&
> +			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))

For normal inode, all dirty pages will not be written out, and after that pages
can be reclaimed by VM any time due to they are be cleaned when flush. Then any
process who held the orphan inode may not read any original data correctly from
this inode.

And here is the unlink description in POSIX:
"If one or more processes have the file open when the last link is removed,
the link shall be removed before unlink() returns, but the removal of the
file contents shall be postponed until all references to the file are closed."

To my understanding for above description, we should keep data of helded orphan
inode in memory or on disk until it is not referenced by any processes.

How do you think of it?

using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))"
to skip writing at the beginning of ->writepage()?

Thanks,

> +		goto out;
> +
>  	if (!wbc->for_reclaim)
>  		need_balance_fs = true;
>  	else if (has_not_enough_free_secs(sbi, 0))
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 3e92376..a2ea1b9 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
>  	update_inode(inode, page);
>  	f2fs_put_page(page, 1);
> 
> +	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
>  	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
>  fail:
>  	up_write(&F2FS_I(inode)->i_sem);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index cdcae06..de21d38 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>  /* used for f2fs_inode_info->flags */
>  enum {
>  	FI_NEW_INODE,		/* indicate newly allocated inode */
> +	FI_TMP_INODE,		/* indicate tmpfile */
>  	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
>  	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
>  	FI_INC_LINK,		/* need to increment i_nlink */
> @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
>  void release_orphan_inode(struct f2fs_sb_info *);
>  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
>  void recover_orphan_inodes(struct f2fs_sb_info *);
>  int get_valid_checkpoint(struct f2fs_sb_info *);
>  void update_dirty_page(struct inode *, struct page *);
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 7464d08..98af3bf 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
>  	 *  - f2fs_write_data_page
>  	 *    - f2fs_gc -> iput -> evict
>  	 *       - inode_wait_for_writeback(inode)
> +	 * In order to avoid that, f2fs_write_data_page does not write data
> +	 * pages for orphan inode except tmpfile.
> +	 * Nevertheless, we need to truncate the tmpfile's data to avoid
> +	 * needless cleaning.
>  	 */
> -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> +	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
> +						inode->i_state & I_SYNC) {
> +		spin_unlock(&inode->i_lock);
> +		i_size_write(inode, 0);
> +
> +		if (F2FS_HAS_BLOCKS(inode))
> +			f2fs_truncate(inode);
> +		spin_lock(&inode->i_lock);
>  		return 0;
> +	}
>  	return generic_drop_inode(inode);
>  }
> 
> --
> 2.1.1



  reply	other threads:[~2015-05-18  2:44 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-13  7:17 Space leak in f2fs hujianyang
2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
2015-05-14  0:24   ` Jaegeuk Kim
2015-05-14  1:40     ` hujianyang
2015-05-14  1:45       ` [f2fs-dev] " Jaegeuk Kim
2015-05-14 21:14       ` Jaegeuk Kim
2015-05-15  8:31         ` Chao Yu
2015-05-16  0:55           ` Jaegeuk Kim
2015-05-18  2:43             ` Chao Yu [this message]
2015-05-18  2:50               ` Nicholas Krause
2015-05-18  5:44               ` [f2fs-dev] " Jaegeuk Kim

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='000001d09114$7e0318d0$7a094a70$@samsung.com' \
    --to=chao2.yu@samsung.com \
    --cc=hujianyang@huawei.com \
    --cc=jaegeuk@kernel.org \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).