linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <chao2.yu@samsung.com>
Cc: 'hujianyang' <hujianyang@huawei.com>,
	linux-fsdevel@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] Space leak in f2fs
Date: Sun, 17 May 2015 22:44:21 -0700	[thread overview]
Message-ID: <20150518054421.GA21292@jaegeuk-mac02.hsd1.ca.comcast.net> (raw)
In-Reply-To: <000001d09114$7e0318d0$7a094a70$@samsung.com>

Hi Chao,

On Mon, May 18, 2015 at 10:43:14AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Saturday, May 16, 2015 8:56 AM
> > To: Chao Yu
> > Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] Space leak in f2fs
> > 
> > Hi Chao,
> > 
> > On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> > > Hi Jaegeuk,
> > >
> > 
> > [snip]
> > 
> > > > +	/* if orphan inode, we don't need to write its data */
> > > > +	if (is_orphan_inode(sbi, inode->i_ino))
> > > > +		goto out;
> > >
> > > When user create a temp file by invoking open with O_TMPFILE flag,
> > > in ->tmpfile our temp file will be added into orphan list as its
> > > nlink is zero.
> > >
> > > If we skip writting out data for this orphan inode, later, even though
> > > we add nlink/directory entry for orphan inode by calling linkat,
> > > our file will contain inconsistent data between in-memory and on-disk.
> > >
> > > So how about considering for this case?
> > 
> > Right.
> > How about the below patch?
> > 
> > >
> > > BTW, the previous fixing patch looks good to me.
> > 
> > But, my new concern here is a memory pressure. If we do not drop the inode
> > when iput was called, we need to wait for another time slot to reclaim its
> > memory.
> 
> Agree. Please see below.
> 
> > 
> > Thanks,
> > 
> > ---
> >  fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
> >  fs/f2fs/data.c       |  8 ++++++++
> >  fs/f2fs/dir.c        |  1 +
> >  fs/f2fs/f2fs.h       |  2 ++
> >  fs/f2fs/super.c      | 14 +++++++++++++-
> >  5 files changed, 43 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 7b7a9d8..74875fb 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> > type)
> >  	spin_unlock(&im->ino_lock);
> >  }
> > 
> > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> > +{
> > +	struct inode_management *im = &sbi->im[type];
> > +	struct ino_entry *e;
> > +	bool exist = false;
> > +
> > +	spin_lock(&im->ino_lock);
> > +	e = radix_tree_lookup(&im->ino_root, ino);
> > +	if (e)
> > +		exist = true;
> > +	spin_unlock(&im->ino_lock);
> > +	return exist;
> > +}
> > +
> >  void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
> >  {
> >  	/* add new dirty ino entry into list */
> > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> >  	__remove_ino_entry(sbi, ino, ORPHAN_INO);
> >  }
> > 
> > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > +{
> > +	return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> > +}
> > +
> >  static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> >  {
> >  	struct inode *inode = f2fs_iget(sbi->sb, ino);
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index b0cc2aa..d883c14 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1749,6 +1749,14 @@ write:
> >  		goto out;
> >  	}
> > 
> > +	/*
> > +	 * if orphan inode, we don't need to write its data,
> > +	 * but, tmpfile is not the case.
> > +	 */
> > +	if (is_orphan_inode(sbi, inode->i_ino) &&
> > +			!is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))
> 
> For normal inode, all dirty pages will not be written out, and after that pages
> can be reclaimed by VM any time due to they are be cleaned when flush. Then any
> process who held the orphan inode may not read any original data correctly from
> this inode.

Urg, right.
Indeed, I have not to do this for orphan inodes.

> 
> And here is the unlink description in POSIX:
> "If one or more processes have the file open when the last link is removed,
> the link shall be removed before unlink() returns, but the removal of the
> file contents shall be postponed until all references to the file are closed."
> 
> To my understanding for above description, we should keep data of helded orphan
> inode in memory or on disk until it is not referenced by any processes.
> 
> How do you think of it?
> 
> using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))"
> to skip writing at the beginning of ->writepage()?

Hmm, IMO, we can't use i_count without i_lock. And this doesn't clearly address
the original race condition.

For now, simply we'd better keep v2 which only truncates data blocks in
f2fs_drop_inode.

Thank you for pointing this out.

Thanks,

> 
> Thanks,
> 
> > +		goto out;
> > +
> >  	if (!wbc->for_reclaim)
> >  		need_balance_fs = true;
> >  	else if (has_not_enough_free_secs(sbi, 0))
> > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > index 3e92376..a2ea1b9 100644
> > --- a/fs/f2fs/dir.c
> > +++ b/fs/f2fs/dir.c
> > @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
> >  	update_inode(inode, page);
> >  	f2fs_put_page(page, 1);
> > 
> > +	set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
> >  	clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
> >  fail:
> >  	up_write(&F2FS_I(inode)->i_sem);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index cdcae06..de21d38 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> >  /* used for f2fs_inode_info->flags */
> >  enum {
> >  	FI_NEW_INODE,		/* indicate newly allocated inode */
> > +	FI_TMP_INODE,		/* indicate tmpfile */
> >  	FI_DIRTY_INODE,		/* indicate inode is dirty or not */
> >  	FI_DIRTY_DIR,		/* indicate directory has dirty pages */
> >  	FI_INC_LINK,		/* need to increment i_nlink */
> > @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
> >  void release_orphan_inode(struct f2fs_sb_info *);
> >  void add_orphan_inode(struct f2fs_sb_info *, nid_t);
> >  void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
> >  void recover_orphan_inodes(struct f2fs_sb_info *);
> >  int get_valid_checkpoint(struct f2fs_sb_info *);
> >  void update_dirty_page(struct inode *, struct page *);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 7464d08..98af3bf 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
> >  	 *  - f2fs_write_data_page
> >  	 *    - f2fs_gc -> iput -> evict
> >  	 *       - inode_wait_for_writeback(inode)
> > +	 * In order to avoid that, f2fs_write_data_page does not write data
> > +	 * pages for orphan inode except tmpfile.
> > +	 * Nevertheless, we need to truncate the tmpfile's data to avoid
> > +	 * needless cleaning.
> >  	 */
> > -	if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> > +	if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
> > +						inode->i_state & I_SYNC) {
> > +		spin_unlock(&inode->i_lock);
> > +		i_size_write(inode, 0);
> > +
> > +		if (F2FS_HAS_BLOCKS(inode))
> > +			f2fs_truncate(inode);
> > +		spin_lock(&inode->i_lock);
> >  		return 0;
> > +	}
> >  	return generic_drop_inode(inode);
> >  }
> > 
> > --
> > 2.1.1

      parent reply	other threads:[~2015-05-18  5:44 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-13  7:17 Space leak in f2fs hujianyang
2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
2015-05-14  0:24   ` Jaegeuk Kim
2015-05-14  1:40     ` hujianyang
2015-05-14  1:45       ` [f2fs-dev] " Jaegeuk Kim
2015-05-14 21:14       ` Jaegeuk Kim
2015-05-15  8:31         ` Chao Yu
2015-05-16  0:55           ` Jaegeuk Kim
2015-05-18  2:43             ` Chao Yu
2015-05-18  2:50               ` Nicholas Krause
2015-05-18  5:44               ` Jaegeuk Kim [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150518054421.GA21292@jaegeuk-mac02.hsd1.ca.comcast.net \
    --to=jaegeuk@kernel.org \
    --cc=chao2.yu@samsung.com \
    --cc=hujianyang@huawei.com \
    --cc=linux-f2fs-devel@lists.sourceforge.net \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).