From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Chao Yu <chao2.yu@samsung.com>
Cc: 'hujianyang' <hujianyang@huawei.com>,
linux-fsdevel@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] Space leak in f2fs
Date: Sun, 17 May 2015 22:44:21 -0700 [thread overview]
Message-ID: <20150518054421.GA21292@jaegeuk-mac02.hsd1.ca.comcast.net> (raw)
In-Reply-To: <000001d09114$7e0318d0$7a094a70$@samsung.com>
Hi Chao,
On Mon, May 18, 2015 at 10:43:14AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
>
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@kernel.org]
> > Sent: Saturday, May 16, 2015 8:56 AM
> > To: Chao Yu
> > Cc: 'hujianyang'; linux-fsdevel@vger.kernel.org; linux-f2fs-devel@lists.sourceforge.net
> > Subject: Re: [f2fs-dev] Space leak in f2fs
> >
> > Hi Chao,
> >
> > On Fri, May 15, 2015 at 04:31:43PM +0800, Chao Yu wrote:
> > > Hi Jaegeuk,
> > >
> >
> > [snip]
> >
> > > > + /* if orphan inode, we don't need to write its data */
> > > > + if (is_orphan_inode(sbi, inode->i_ino))
> > > > + goto out;
> > >
> > > When user create a temp file by invoking open with O_TMPFILE flag,
> > > in ->tmpfile our temp file will be added into orphan list as its
> > > nlink is zero.
> > >
> > > If we skip writting out data for this orphan inode, later, even though
> > > we add nlink/directory entry for orphan inode by calling linkat,
> > > our file will contain inconsistent data between in-memory and on-disk.
> > >
> > > So how about considering for this case?
> >
> > Right.
> > How about the below patch?
> >
> > >
> > > BTW, the previous fixing patch looks good to me.
> >
> > But, my new concern here is a memory pressure. If we do not drop the inode
> > when iput was called, we need to wait for another time slot to reclaim its
> > memory.
>
> Agree. Please see below.
>
> >
> > Thanks,
> >
> > ---
> > fs/f2fs/checkpoint.c | 19 +++++++++++++++++++
> > fs/f2fs/data.c | 8 ++++++++
> > fs/f2fs/dir.c | 1 +
> > fs/f2fs/f2fs.h | 2 ++
> > fs/f2fs/super.c | 14 +++++++++++++-
> > 5 files changed, 43 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index 7b7a9d8..74875fb 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -378,6 +378,20 @@ static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int
> > type)
> > spin_unlock(&im->ino_lock);
> > }
> >
> > +static bool __exist_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
> > +{
> > + struct inode_management *im = &sbi->im[type];
> > + struct ino_entry *e;
> > + bool exist = false;
> > +
> > + spin_lock(&im->ino_lock);
> > + e = radix_tree_lookup(&im->ino_root, ino);
> > + if (e)
> > + exist = true;
> > + spin_unlock(&im->ino_lock);
> > + return exist;
> > +}
> > +
> > void add_dirty_inode(struct f2fs_sb_info *sbi, nid_t ino, int type)
> > {
> > /* add new dirty ino entry into list */
> > @@ -458,6 +472,11 @@ void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > __remove_ino_entry(sbi, ino, ORPHAN_INO);
> > }
> >
> > +bool is_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > +{
> > + return __exist_ino_entry(sbi, ino, ORPHAN_INO);
> > +}
> > +
> > static void recover_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > {
> > struct inode *inode = f2fs_iget(sbi->sb, ino);
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index b0cc2aa..d883c14 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1749,6 +1749,14 @@ write:
> > goto out;
> > }
> >
> > + /*
> > + * if orphan inode, we don't need to write its data,
> > + * but, tmpfile is not the case.
> > + */
> > + if (is_orphan_inode(sbi, inode->i_ino) &&
> > + !is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE))
>
> For normal inode, all dirty pages will not be written out, and after that pages
> can be reclaimed by VM any time due to they are be cleaned when flush. Then any
> process who held the orphan inode may not read any original data correctly from
> this inode.
Urg, right.
Indeed, I have not to do this for orphan inodes.
>
> And here is the unlink description in POSIX:
> "If one or more processes have the file open when the last link is removed,
> the link shall be removed before unlink() returns, but the removal of the
> file contents shall be postponed until all references to the file are closed."
>
> To my understanding for above description, we should keep data of helded orphan
> inode in memory or on disk until it is not referenced by any processes.
>
> How do you think of it?
>
> using "if (is_orphan_inode(sbi, inode->i_ino) && !atomic_read(&inode->i_count))"
> to skip writing at the beginning of ->writepage()?
Hmm, IMO, we can't use i_count without i_lock. And this doesn't clearly address
the original race condition.
For now, simply we'd better keep v2 which only truncates data blocks in
f2fs_drop_inode.
Thank you for pointing this out.
Thanks,
>
> Thanks,
>
> > + goto out;
> > +
> > if (!wbc->for_reclaim)
> > need_balance_fs = true;
> > else if (has_not_enough_free_secs(sbi, 0))
> > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > index 3e92376..a2ea1b9 100644
> > --- a/fs/f2fs/dir.c
> > +++ b/fs/f2fs/dir.c
> > @@ -648,6 +648,7 @@ int f2fs_do_tmpfile(struct inode *inode, struct inode *dir)
> > update_inode(inode, page);
> > f2fs_put_page(page, 1);
> >
> > + set_inode_flag(F2FS_I(inode), FI_TMP_INODE);
> > clear_inode_flag(F2FS_I(inode), FI_NEW_INODE);
> > fail:
> > up_write(&F2FS_I(inode)->i_sem);
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index cdcae06..de21d38 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1337,6 +1337,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
> > /* used for f2fs_inode_info->flags */
> > enum {
> > FI_NEW_INODE, /* indicate newly allocated inode */
> > + FI_TMP_INODE, /* indicate tmpfile */
> > FI_DIRTY_INODE, /* indicate inode is dirty or not */
> > FI_DIRTY_DIR, /* indicate directory has dirty pages */
> > FI_INC_LINK, /* need to increment i_nlink */
> > @@ -1726,6 +1727,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *);
> > void release_orphan_inode(struct f2fs_sb_info *);
> > void add_orphan_inode(struct f2fs_sb_info *, nid_t);
> > void remove_orphan_inode(struct f2fs_sb_info *, nid_t);
> > +bool is_orphan_inode(struct f2fs_sb_info *, nid_t);
> > void recover_orphan_inodes(struct f2fs_sb_info *);
> > int get_valid_checkpoint(struct f2fs_sb_info *);
> > void update_dirty_page(struct inode *, struct page *);
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 7464d08..98af3bf 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -430,9 +430,21 @@ static int f2fs_drop_inode(struct inode *inode)
> > * - f2fs_write_data_page
> > * - f2fs_gc -> iput -> evict
> > * - inode_wait_for_writeback(inode)
> > + * In order to avoid that, f2fs_write_data_page does not write data
> > + * pages for orphan inode except tmpfile.
> > + * Nevertheless, we need to truncate the tmpfile's data to avoid
> > + * needless cleaning.
> > */
> > - if (!inode_unhashed(inode) && inode->i_state & I_SYNC)
> > + if (is_inode_flag_set(F2FS_I(inode), FI_TMP_INODE) &&
> > + inode->i_state & I_SYNC) {
> > + spin_unlock(&inode->i_lock);
> > + i_size_write(inode, 0);
> > +
> > + if (F2FS_HAS_BLOCKS(inode))
> > + f2fs_truncate(inode);
> > + spin_lock(&inode->i_lock);
> > return 0;
> > + }
> > return generic_drop_inode(inode);
> > }
> >
> > --
> > 2.1.1
prev parent reply other threads:[~2015-05-18 5:44 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-05-13 7:17 Space leak in f2fs hujianyang
2015-05-13 17:46 ` [f2fs-dev] " Jaegeuk Kim
2015-05-14 0:24 ` Jaegeuk Kim
2015-05-14 1:40 ` hujianyang
2015-05-14 1:45 ` [f2fs-dev] " Jaegeuk Kim
2015-05-14 21:14 ` Jaegeuk Kim
2015-05-15 8:31 ` Chao Yu
2015-05-16 0:55 ` Jaegeuk Kim
2015-05-18 2:43 ` Chao Yu
2015-05-18 2:50 ` Nicholas Krause
2015-05-18 5:44 ` Jaegeuk Kim [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150518054421.GA21292@jaegeuk-mac02.hsd1.ca.comcast.net \
--to=jaegeuk@kernel.org \
--cc=chao2.yu@samsung.com \
--cc=hujianyang@huawei.com \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).