LKML Archive on lore.kernel.org
 help / Atom feed
* [PATCH 0/4] nilfs2 updates 
@ 2017-10-30 12:52 Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption Ryusuke Konishi
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2017-10-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-nilfs, Ryusuke Konishi

Hi Andrew,

Please queue the following changes for the next merge window:

Andreas Rohner (1):
      nilfs2: Fix race condition that causes file system corruption

Elena Reshetova (1):
      fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t

Ryusuke Konishi (2):
      nilfs2: align block comments of nilfs_sufile_truncate_range() at *
      nilfs2: use octal for unreadable permission macro

In this series,

>      nilfs2: Fix race condition that causes file system corruption

fixes a file system corruption issue in some high loads.

>      fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t

introduces refcount_t type to nilfs.

>      nilfs2: align block comments of nilfs_sufile_truncate_range() at *
>      nilfs2: use octal for unreadable permission macro

are style fixes, both of which are based on suggestions of recent
checkpatch.pl.

Thanks,
Ryusuke Konishi
--
 fs/nilfs2/namei.c     |  2 +-
 fs/nilfs2/segment.c   |  6 ++++--
 fs/nilfs2/sufile.c    | 32 ++++++++++++++++----------------
 fs/nilfs2/the_nilfs.c |  8 ++++----
 fs/nilfs2/the_nilfs.h |  5 +++--
 5 files changed, 28 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption
  2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi
@ 2017-10-30 12:52 ` Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 2/4] fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t Ryusuke Konishi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2017-10-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-nilfs, Ryusuke Konishi, Andreas Rohner

From: Andreas Rohner <andreas.rohner@gmx.net>

There is a race condition between the function nilfs_dirty_inode() and
nilfs_set_file_dirty().

When a file is opened, nilfs_dirty_inode() is called to update the
access timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a
separate transaction. __nilfs_mark_inode_dirty() caches the ifile
buffer_head in the i_bh field of the inode info structure and marks it
as dirty.

After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to
the ns_dirty_files list.

Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field. If
there is a cached buffer_head in i_bh it is not marked as dirty again.

Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that
writes out the ifile occurs in-between the two. If this happens the
inode is not on the ns_dirty_files list, but its ifile block is still
marked as dirty and written out.

In the next segment construction, the data for the file is written out
and nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root
is written into the i_bh block, which is not dirty, because it was
written out in another segment construction.

As a result the bmap update can be lost, which leads to file system
corruption. Either the virtual block address points to an unallocated
DAT block, or the DAT entry will be reused for something different.

The error can remain undetected for a long time. A typical error message
would be one of the "bad btree" errors or a warning that a DAT entry
could not be found.

This bug can be reproduced reliably by a simple benchmark that creates
and overwrites millions of 4k files.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable@vger.kernel.org
---
 fs/nilfs2/segment.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 70ded52..50e1295 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1958,8 +1958,6 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
 					  err, ii->vfs_inode.i_ino);
 				return err;
 			}
-			mark_buffer_dirty(ibh);
-			nilfs_mdt_mark_dirty(ifile);
 			spin_lock(&nilfs->ns_inode_lock);
 			if (likely(!ii->i_bh))
 				ii->i_bh = ibh;
@@ -1968,6 +1966,10 @@ static int nilfs_segctor_collect_dirty_files(struct nilfs_sc_info *sci,
 			goto retry;
 		}
 
+		// Always redirty the buffer to avoid race condition
+		mark_buffer_dirty(ii->i_bh);
+		nilfs_mdt_mark_dirty(ifile);
+
 		clear_bit(NILFS_I_QUEUED, &ii->i_state);
 		set_bit(NILFS_I_BUSY, &ii->i_state);
 		list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/4] fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t
  2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption Ryusuke Konishi
@ 2017-10-30 12:52 ` Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 3/4] nilfs2: align block comments of nilfs_sufile_truncate_range() at * Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 4/4] nilfs2: use octal for unreadable permission macro Ryusuke Konishi
  3 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2017-10-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-nilfs, Ryusuke Konishi, Elena Reshetova

From: Elena Reshetova <elena.reshetova@intel.com>

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nilfs_root.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/the_nilfs.c | 8 ++++----
 fs/nilfs2/the_nilfs.h | 5 +++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2dd75bf..afebb50 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -737,7 +737,7 @@ struct nilfs_root *nilfs_lookup_root(struct the_nilfs *nilfs, __u64 cno)
 		} else if (cno > root->cno) {
 			n = n->rb_right;
 		} else {
-			atomic_inc(&root->count);
+			refcount_inc(&root->count);
 			spin_unlock(&nilfs->ns_cptree_lock);
 			return root;
 		}
@@ -776,7 +776,7 @@ struct nilfs_root *
 		} else if (cno > root->cno) {
 			p = &(*p)->rb_right;
 		} else {
-			atomic_inc(&root->count);
+			refcount_inc(&root->count);
 			spin_unlock(&nilfs->ns_cptree_lock);
 			kfree(new);
 			return root;
@@ -786,7 +786,7 @@ struct nilfs_root *
 	new->cno = cno;
 	new->ifile = NULL;
 	new->nilfs = nilfs;
-	atomic_set(&new->count, 1);
+	refcount_set(&new->count, 1);
 	atomic64_set(&new->inodes_count, 0);
 	atomic64_set(&new->blocks_count, 0);
 
@@ -806,7 +806,7 @@ struct nilfs_root *
 
 void nilfs_put_root(struct nilfs_root *root)
 {
-	if (atomic_dec_and_test(&root->count)) {
+	if (refcount_dec_and_test(&root->count)) {
 		struct the_nilfs *nilfs = root->nilfs;
 
 		nilfs_sysfs_delete_snapshot_group(root);
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index b305c6f..883d732 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -27,6 +27,7 @@
 #include <linux/blkdev.h>
 #include <linux/backing-dev.h>
 #include <linux/slab.h>
+#include <linux/refcount.h>
 
 struct nilfs_sc_info;
 struct nilfs_sysfs_dev_subgroups;
@@ -246,7 +247,7 @@ struct nilfs_root {
 	__u64 cno;
 	struct rb_node rb_node;
 
-	atomic_t count;
+	refcount_t count;
 	struct the_nilfs *nilfs;
 	struct inode *ifile;
 
@@ -299,7 +300,7 @@ struct nilfs_root *nilfs_find_or_create_root(struct the_nilfs *nilfs,
 
 static inline void nilfs_get_root(struct nilfs_root *root)
 {
-	atomic_inc(&root->count);
+	refcount_inc(&root->count);
 }
 
 static inline int nilfs_valid_fs(struct the_nilfs *nilfs)
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3/4] nilfs2: align block comments of nilfs_sufile_truncate_range() at *
  2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 2/4] fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t Ryusuke Konishi
@ 2017-10-30 12:52 ` Ryusuke Konishi
  2017-10-30 12:52 ` [PATCH 4/4] nilfs2: use octal for unreadable permission macro Ryusuke Konishi
  3 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2017-10-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-nilfs, Ryusuke Konishi

Fix the following checkpatch warning:

 WARNING: Block comments should align the * on each line
 #633: FILE: sufile.c:633:
 +/**
 +  * nilfs_sufile_truncate_range - truncate range of segment array

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/sufile.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 1541a1e..1341a41 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -630,22 +630,22 @@ void nilfs_sufile_do_set_error(struct inode *sufile, __u64 segnum,
 }
 
 /**
-  * nilfs_sufile_truncate_range - truncate range of segment array
-  * @sufile: inode of segment usage file
-  * @start: start segment number (inclusive)
-  * @end: end segment number (inclusive)
-  *
-  * Return Value: On success, 0 is returned.  On error, one of the
-  * following negative error codes is returned.
-  *
-  * %-EIO - I/O error.
-  *
-  * %-ENOMEM - Insufficient amount of memory available.
-  *
-  * %-EINVAL - Invalid number of segments specified
-  *
-  * %-EBUSY - Dirty or active segments are present in the range
-  */
+ * nilfs_sufile_truncate_range - truncate range of segment array
+ * @sufile: inode of segment usage file
+ * @start: start segment number (inclusive)
+ * @end: end segment number (inclusive)
+ *
+ * Return Value: On success, 0 is returned.  On error, one of the
+ * following negative error codes is returned.
+ *
+ * %-EIO - I/O error.
+ *
+ * %-ENOMEM - Insufficient amount of memory available.
+ *
+ * %-EINVAL - Invalid number of segments specified
+ *
+ * %-EBUSY - Dirty or active segments are present in the range
+ */
 static int nilfs_sufile_truncate_range(struct inode *sufile,
 				       __u64 start, __u64 end)
 {
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 4/4] nilfs2: use octal for unreadable permission macro
  2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi
                   ` (2 preceding siblings ...)
  2017-10-30 12:52 ` [PATCH 3/4] nilfs2: align block comments of nilfs_sufile_truncate_range() at * Ryusuke Konishi
@ 2017-10-30 12:52 ` Ryusuke Konishi
  3 siblings, 0 replies; 5+ messages in thread
From: Ryusuke Konishi @ 2017-10-30 12:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, linux-nilfs, Ryusuke Konishi

Replace S_IRWXUGO with 0777 because symbolic permissions are
considered harmful:

 https://lwn.net/Articles/696229/

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
---
 fs/nilfs2/namei.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nilfs2/namei.c b/fs/nilfs2/namei.c
index 515d13c..1a2894a 100644
--- a/fs/nilfs2/namei.c
+++ b/fs/nilfs2/namei.c
@@ -150,7 +150,7 @@ static int nilfs_symlink(struct inode *dir, struct dentry *dentry,
 	if (err)
 		return err;
 
-	inode = nilfs_new_inode(dir, S_IFLNK | S_IRWXUGO);
+	inode = nilfs_new_inode(dir, S_IFLNK | 0777);
 	err = PTR_ERR(inode);
 	if (IS_ERR(inode))
 		goto out;
-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, back to index

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 12:52 [PATCH 0/4] nilfs2 updates Ryusuke Konishi
2017-10-30 12:52 ` [PATCH 1/4] nilfs2: Fix race condition that causes file system corruption Ryusuke Konishi
2017-10-30 12:52 ` [PATCH 2/4] fs, nilfs: convert nilfs_root.count from atomic_t to refcount_t Ryusuke Konishi
2017-10-30 12:52 ` [PATCH 3/4] nilfs2: align block comments of nilfs_sufile_truncate_range() at * Ryusuke Konishi
2017-10-30 12:52 ` [PATCH 4/4] nilfs2: use octal for unreadable permission macro Ryusuke Konishi

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox