From: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> To: Andrew Morton <akpm@linux-foundation.org> Cc: LKML <linux-kernel@vger.kernel.org>, linux-nilfs <linux-nilfs@vger.kernel.org>, Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>, Andreas Rohner <andreas.rohner@gmx.net> Subject: [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption Date: Mon, 30 Oct 2017 21:52:12 +0900 [thread overview] Message-ID: <1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp> (raw) In-Reply-To: <1509367935-3086-1-git-send-email-konishi.ryusuke@lab.ntt.co.jp> From: Andreas Rohner <andreas.rohner@gmx.net> There is a race condition between the function nilfs_dirty_inode() and nilfs_set_file_dirty(). When a file is opened, nilfs_dirty_inode() is called to update the access timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a separate transaction. __nilfs_mark_inode_dirty() caches the ifile buffer_head in the i_bh field of the inode info structure and marks it as dirty. After some data was written to the file in another transaction, the function nilfs_set_file_dirty() is called, which adds the inode to the ns_dirty_files list. Then the segment construction calls nilfs_segctor_collect_dirty_files(), which goes through the ns_dirty_files list and checks the i_bh field. If there is a cached buffer_head in i_bh it is not marked as dirty again. Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate transactions, it is possible that a segment construction that writes out the ifile occurs in-between the two. If this happens the inode is not on the ns_dirty_files list, but its ifile block is still marked as dirty and written out. In the next segment construction, the data for the file is written out and nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root is written into the i_bh block, which is not dirty, because it was written out in another segment construction. As a result the bmap update can be lost, which leads to file system corruption. Either the virtual block address points to an unallocated DAT block, or the DAT entry will be reused for something different. The error can remain undetected for a long time. A typical error message would be one of the "bad btree" errors or a warning that a DAT entry could not be found. This bug can be reproduced reliably by a simple benchmark that creates and overwrites millions of 4k files. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Andreas Rohner <andreas.rohner@gmx.net> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable@vger.kernel.org --- fs/nilfs2/segment.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c index 70ded52..50e1295 100644 --- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci, err, ii->vfs_inode.i_ino); return err; } - mark_buffer_dirty(ibh); - nilfs_mdt_mark_dirty(ifile); spin_lock(&nilfs->ns_inode_lock); if (likely(!ii->i_bh)) ii->i_bh = ibh; @@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci, goto retry; } + // Always redirty the buffer to avoid race condition + mark_buffer_dirty(ii->i_bh); + nilfs_mdt_mark_dirty(ifile); + clear_bit(NILFS_I_QUEUED, &ii->i_state); set_bit(NILFS_I_BUSY, &ii->i_state); list_move_tail(&ii->i_dirty, &sci->sc_dirty_files); -- 1.8.3.1
next prev parent reply other threads:[~2017-10-30 12:57 UTC|newest] Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi 2017-10-30 12:52 ` Ryusuke Konishi [this message] 2017-10-30 12:52 ` [PATCH 2/4] fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t Ryusuke Konishi 2017-10-30 12:52 ` [PATCH 3/4] nilfs2: align block comments of nilfs_sufile_truncate_range() at * Ryusuke Konishi 2017-10-30 12:52 ` [PATCH 4/4] nilfs2: use octal for unreadable permission macro Ryusuke Konishi
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1509367935-3086-2-git-send-email-konishi.ryusuke@lab.ntt.co.jp \ --to=konishi.ryusuke@lab.ntt.co.jp \ --cc=akpm@linux-foundation.org \ --cc=andreas.rohner@gmx.net \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nilfs@vger.kernel.org \ --subject='Re: [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.