Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v3 0/2] vfs: make active swap files unwritable
@ 2019-08-15 16:05 Darrick J. Wong
  2019-08-15 16:05 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-15 16:05 UTC (permalink / raw)
  To: hch, akpm, tytso, viro, darrick.wong
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm

Hi all,

I discovered that it's possible for userspace to write to active swap
files and swap devices.  While activated, the kernel effectively holds
an irrevocable (except by swapoff) longterm lease on the storage
associated with the swap device, so we need to shut down this vector for
memory corruption of userspace programs.

Nothing new for v3 other than rebasing against 5.3-rc4.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This has been lightly tested with fstests.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=immutable-swapfiles

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=immutable-swapfiles

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices
  2019-08-15 16:05 [PATCH v3 0/2] vfs: make active swap files unwritable Darrick J. Wong
@ 2019-08-15 16:05 ` Darrick J. Wong
  2019-08-16  6:39   ` Christoph Hellwig
  2019-08-15 16:05 ` [PATCH 2/2] vfs: don't allow writes to swap files Darrick J. Wong
  2019-08-15 16:34 ` [PATCH RFC 3/2] fstests: check that we can't write " Darrick J. Wong
  2 siblings, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-15 16:05 UTC (permalink / raw)
  To: hch, akpm, tytso, viro, darrick.wong
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm

From: Darrick J. Wong <darrick.wong@oracle.com>

Set S_SWAPFILE on block device inodes so that they have the same
protections as a swap flie.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/swapfile.c |   31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)


diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0789a762ce2f..a53b7c49b40e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2368,9 +2368,8 @@ EXPORT_SYMBOL_GPL(add_swap_extent);
  * requirements, they are simply tossed out - we will never use those blocks
  * for swapping.
  *
- * For S_ISREG swapfiles we set S_SWAPFILE across the life of the swapon.  This
- * prevents root from shooting her foot off by ftruncating an in-use swapfile,
- * which will scribble on the fs.
+ * For all swap devices we set S_SWAPFILE across the life of the swapon.  This
+ * prevents users from writing to the swap device, which will corrupt memory.
  *
  * The amount of disk space which a single swap extent represents varies.
  * Typically it is in the 1-4 megabyte range.  So we can have hundreds of
@@ -2661,13 +2660,14 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	inode = mapping->host;
 	if (S_ISBLK(inode->i_mode)) {
 		struct block_device *bdev = I_BDEV(inode);
+
 		set_blocksize(bdev, old_block_size);
 		blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
-	} else {
-		inode_lock(inode);
-		inode->i_flags &= ~S_SWAPFILE;
-		inode_unlock(inode);
 	}
+
+	inode_lock(inode);
+	inode->i_flags &= ~S_SWAPFILE;
+	inode_unlock(inode);
 	filp_close(swap_file, NULL);
 
 	/*
@@ -2890,11 +2890,11 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 		p->flags |= SWP_BLKDEV;
 	} else if (S_ISREG(inode->i_mode)) {
 		p->bdev = inode->i_sb->s_bdev;
-		inode_lock(inode);
-		if (IS_SWAPFILE(inode))
-			return -EBUSY;
-	} else
-		return -EINVAL;
+	}
+
+	inode_lock(inode);
+	if (IS_SWAPFILE(inode))
+		return -EBUSY;
 
 	return 0;
 }
@@ -3295,8 +3295,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	atomic_inc(&proc_poll_event);
 	wake_up_interruptible(&proc_poll_wait);
 
-	if (S_ISREG(inode->i_mode))
-		inode->i_flags |= S_SWAPFILE;
+	inode->i_flags |= S_SWAPFILE;
 	error = 0;
 	goto out;
 bad_swap:
@@ -3318,7 +3317,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (inced_nr_rotate_swap)
 		atomic_dec(&nr_rotate_swap);
 	if (swap_file) {
-		if (inode && S_ISREG(inode->i_mode)) {
+		if (inode) {
 			inode_unlock(inode);
 			inode = NULL;
 		}
@@ -3331,7 +3330,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	}
 	if (name)
 		putname(name);
-	if (inode && S_ISREG(inode->i_mode))
+	if (inode)
 		inode_unlock(inode);
 	if (!error)
 		enable_swap_slots_cache();


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 2/2] vfs: don't allow writes to swap files
  2019-08-15 16:05 [PATCH v3 0/2] vfs: make active swap files unwritable Darrick J. Wong
  2019-08-15 16:05 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong
@ 2019-08-15 16:05 ` Darrick J. Wong
  2019-08-16  6:41   ` Christoph Hellwig
  2019-08-16 16:19   ` [PATCH v2 " Darrick J. Wong
  2019-08-15 16:34 ` [PATCH RFC 3/2] fstests: check that we can't write " Darrick J. Wong
  2 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-15 16:05 UTC (permalink / raw)
  To: hch, akpm, tytso, viro, darrick.wong
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't let userspace write to an active swap file because the kernel
effectively has a long term lease on the storage and things could get
seriously corrupted if we let this happen.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/block_dev.c |    3 +++
 mm/filemap.c   |    3 +++
 mm/memory.c    |    4 ++++
 mm/mmap.c      |    8 ++++++--
 mm/swapfile.c  |   12 +++++++++++-
 5 files changed, 27 insertions(+), 3 deletions(-)


diff --git a/fs/block_dev.c b/fs/block_dev.c
index eb657ab94060..017f46719ebe 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -2011,6 +2011,9 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (bdev_read_only(I_BDEV(bd_inode)))
 		return -EPERM;
 
+	if (IS_SWAPFILE(bd_inode))
+		return -ETXTBSY;
+
 	if (!iov_iter_count(from))
 		return 0;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index d0cf700bf201..40667c2f3383 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2988,6 +2988,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 	loff_t count;
 	int ret;
 
+	if (IS_SWAPFILE(inode))
+		return -ETXTBSY;
+
 	if (!iov_iter_count(from))
 		return 0;
 
diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..b1dff75640b7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2196,6 +2196,10 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf)
 
 	vmf->flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
 
+	if (vmf->vma->vm_file &&
+	    IS_SWAPFILE(vmf->vma->vm_file->f_mapping->host))
+		return VM_FAULT_SIGBUS;
+
 	ret = vmf->vma->vm_ops->page_mkwrite(vmf);
 	/* Restore original flags so that caller is not surprised */
 	vmf->flags = old_flags;
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..6bc21fca20bc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1483,8 +1483,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		case MAP_SHARED_VALIDATE:
 			if (flags & ~flags_mask)
 				return -EOPNOTSUPP;
-			if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
-				return -EACCES;
+			if (prot & PROT_WRITE) {
+				if (!(file->f_mode & FMODE_WRITE))
+					return -EACCES;
+				if (IS_SWAPFILE(file->f_mapping->host))
+					return -ETXTBSY;
+			}
 
 			/*
 			 * Make sure we don't allow writing to an append-only
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a53b7c49b40e..dab43523afdd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3275,6 +3275,17 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap;
 
+	/*
+	 * Flush any pending IO and dirty mappings before we start using this
+	 * swap device.
+	 */
+	inode->i_flags |= S_SWAPFILE;
+	error = inode_drain_writes(inode);
+	if (error) {
+		inode->i_flags &= ~S_SWAPFILE;
+		goto bad_swap;
+	}
+
 	mutex_lock(&swapon_mutex);
 	prio = -1;
 	if (swap_flags & SWAP_FLAG_PREFER)
@@ -3295,7 +3306,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	atomic_inc(&proc_poll_event);
 	wake_up_interruptible(&proc_poll_wait);
 
-	inode->i_flags |= S_SWAPFILE;
 	error = 0;
 	goto out;
 bad_swap:


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH RFC 3/2] fstests: check that we can't write to swap files
  2019-08-15 16:05 [PATCH v3 0/2] vfs: make active swap files unwritable Darrick J. Wong
  2019-08-15 16:05 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong
  2019-08-15 16:05 ` [PATCH 2/2] vfs: don't allow writes to swap files Darrick J. Wong
@ 2019-08-15 16:34 ` " Darrick J. Wong
  2019-08-15 21:26   ` Andrew Morton
  2019-08-17  2:05   ` [PATCH v2 " Darrick J. Wong
  2 siblings, 2 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-15 16:34 UTC (permalink / raw)
  To: hch, akpm, tytso, viro
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm, fstests


From: Darrick J. Wong <darrick.wong@oracle.com>

While active, the media backing a swap file is leased to the kernel.
Userspace has no business writing to it.  Make sure we can't do this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 tests/generic/717     |   60 +++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/717.out |    7 ++++++
 tests/generic/718     |   46 ++++++++++++++++++++++++++++++++++++++
 tests/generic/718.out |    5 ++++
 tests/generic/group   |    2 ++
 5 files changed, 120 insertions(+)
 create mode 100755 tests/generic/717
 create mode 100644 tests/generic/717.out
 create mode 100755 tests/generic/718
 create mode 100644 tests/generic/718.out

diff --git a/tests/generic/717 b/tests/generic/717
new file mode 100755
index 00000000..ab12ee4d
--- /dev/null
+++ b/tests/generic/717
@@ -0,0 +1,60 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-newer
+# Copyright (c) 2019, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test No. 717
+#
+# Check that we can't modify a file that's an active swap file.
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	swapoff $testfile
+	rm -rf $tmp.* $testfile
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_scratch_swapfile
+
+rm -f $seqres.full
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+testfile=$SCRATCH_MNT/$seq.swap
+
+_format_swapfile $testfile 20m
+swapon $testfile 2>&1 | _filter_scratch
+
+# Can we write to it?
+$XFS_IO_PROG -c 'pwrite -S 0x59 64k 64k' $testfile
+$XFS_IO_PROG -d -c 'pwrite -S 0x60 64k 64k' $testfile
+$XFS_IO_PROG -c 'mmap -rw 64k 64k' -c 'mwrite -S 0x61 64k 64k' $testfile
+
+# Can we change the file size?
+$XFS_IO_PROG -c 'truncate 18m' $testfile
+
+# Can you fallocate the file?
+$XFS_IO_PROG -c 'falloc 0 32m' $testfile
+
+# We test that you can't reflink, dedupe, or copy_file_range into a swapfile
+# in other tests.
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/717.out b/tests/generic/717.out
new file mode 100644
index 00000000..2cd9bcdb
--- /dev/null
+++ b/tests/generic/717.out
@@ -0,0 +1,7 @@
+QA output created by 717
+pwrite: Text file busy
+pwrite: Text file busy
+mmap: Text file busy
+no mapped regions, try 'help mmap'
+ftruncate: Text file busy
+fallocate: Text file busy
diff --git a/tests/generic/718 b/tests/generic/718
new file mode 100755
index 00000000..35cf718f
--- /dev/null
+++ b/tests/generic/718
@@ -0,0 +1,46 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-newer
+# Copyright (c) 2019, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test No. 718
+#
+# Check that we can't modify a block device that's an active swap device.
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	swapoff $SCRATCH_DEV
+	rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_scratch_nocheck
+
+rm -f $seqres.full
+
+$MKSWAP_PROG "$SCRATCH_DEV" >> $seqres.full
+swapon $SCRATCH_DEV 2>&1 | _filter_scratch
+
+# Can we write to it?
+$XFS_IO_PROG -c 'pwrite -S 0x59 64k 64k' $SCRATCH_DEV
+$XFS_IO_PROG -d -c 'pwrite -S 0x60 64k 64k' $SCRATCH_DEV
+$XFS_IO_PROG -c 'mmap -rw 64k 64k' -c 'mwrite -S 0x61 64k 64k' $SCRATCH_DEV
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/718.out b/tests/generic/718.out
new file mode 100644
index 00000000..5cd25b9a
--- /dev/null
+++ b/tests/generic/718.out
@@ -0,0 +1,5 @@
+QA output created by 718
+pwrite: Text file busy
+pwrite: Text file busy
+mmap: Text file busy
+no mapped regions, try 'help mmap'
diff --git a/tests/generic/group b/tests/generic/group
index 003fa963..c58d41e3 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -570,3 +570,5 @@
 565 auto quick copy_range
 715 auto quick rw
 716 auto quick rw
+717 auto quick rw swap
+718 auto quick rw swap

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 3/2] fstests: check that we can't write to swap files
  2019-08-15 16:34 ` [PATCH RFC 3/2] fstests: check that we can't write " Darrick J. Wong
@ 2019-08-15 21:26   ` Andrew Morton
  2019-08-16  2:13     ` Darrick J. Wong
  2019-08-17  2:05   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-08-15 21:26 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hch, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel,
	linux-mm, fstests

On Thu, 15 Aug 2019 09:34:34 -0700 "Darrick J. Wong" <darrick.wong@oracle.com> wrote:

> While active, the media backing a swap file is leased to the kernel.
> Userspace has no business writing to it.  Make sure we can't do this.

I don't think this tests the case where a file was already open for
writing and someone does swapon(that file)?

And then does swapoff(that file), when writes should start working again?

Ditto all the above, with s/open/mmap/.


Do we handle (and test!) the case where there's unwritten dirty
pagecache at the time of swapon()?  Ditto pte-dirty MAP_SHARED pages?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH RFC 3/2] fstests: check that we can't write to swap files
  2019-08-15 21:26   ` Andrew Morton
@ 2019-08-16  2:13     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-16  2:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hch, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel,
	linux-mm, fstests

On Thu, Aug 15, 2019 at 02:26:03PM -0700, Andrew Morton wrote:
> On Thu, 15 Aug 2019 09:34:34 -0700 "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> 
> > While active, the media backing a swap file is leased to the kernel.
> > Userspace has no business writing to it.  Make sure we can't do this.
> 
> I don't think this tests the case where a file was already open for
> writing and someone does swapon(that file)?
> 
> And then does swapoff(that file), when writes should start working again?
> 
> Ditto all the above, with s/open/mmap/.

Heh, ok.  I'll start working on a C program to do that.

> Do we handle (and test!) the case where there's unwritten dirty
> pagecache at the time of swapon()? Ditto pte-dirty MAP_SHARED pages?

One of the tests I wrote for iomap_swapfile_activate way back when
checks that.  The iomap version calls vfs_fsync, but AFAICT
generic_swapfile_activate doesn't do that.

--D

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices
  2019-08-15 16:05 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong
@ 2019-08-16  6:39   ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2019-08-16  6:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hch, akpm, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel, linux-mm

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] vfs: don't allow writes to swap files
  2019-08-15 16:05 ` [PATCH 2/2] vfs: don't allow writes to swap files Darrick J. Wong
@ 2019-08-16  6:41   ` Christoph Hellwig
  2019-08-16  6:48     ` Darrick J. Wong
  2019-08-16 16:19   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2019-08-16  6:41 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hch, akpm, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel, linux-mm

The new checks look fine to me, but where does the inode_drain_writes()
function come from, I can't find that in my tree anywhere.

Also what does inode_drain_writes do about existing shared writable
mapping?  Do we even care about that corner case?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] vfs: don't allow writes to swap files
  2019-08-16  6:41   ` Christoph Hellwig
@ 2019-08-16  6:48     ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-16  6:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: akpm, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel, linux-mm

On Thu, Aug 15, 2019 at 11:41:21PM -0700, Christoph Hellwig wrote:
> The new checks look fine to me, but where does the inode_drain_writes()
> function come from, I can't find that in my tree anywhere.

Doh.  Forgot to include that patch in the series. :(

/*
 * Flush file data before changing attributes.  Caller must hold any locks
 * required to prevent further writes to this file until we're done setting
 * flags.
 */
static inline int inode_drain_writes(struct inode *inode)
{
       inode_dio_wait(inode);
       return filemap_write_and_wait(inode->i_mapping);
}

> Also what does inode_drain_writes do about existing shared writable
> mapping?  Do we even care about that corner case?

We probably ought to flush and invalidate the pagecache for the entire
file so that page_mkwrite can bounce off the swapfile.

--D

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 2/2] vfs: don't allow writes to swap files
  2019-08-15 16:05 ` [PATCH 2/2] vfs: don't allow writes to swap files Darrick J. Wong
  2019-08-16  6:41   ` Christoph Hellwig
@ 2019-08-16 16:19   ` " Darrick J. Wong
  2019-08-18  8:39     ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-16 16:19 UTC (permalink / raw)
  To: hch, akpm, tytso, viro; +Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't let userspace write to an active swap file because the kernel
effectively has a long term lease on the storage and things could get
seriously corrupted if we let this happen.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: add missing inode_drain_writes
---
 fs/block_dev.c     |    3 +++
 include/linux/fs.h |   11 +++++++++++
 mm/filemap.c       |    3 +++
 mm/memory.c        |    4 ++++
 mm/mmap.c          |    8 ++++++--
 mm/swapfile.c      |   12 +++++++++++-
 6 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c2a85b587922..d9bab63a9b81 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1978,6 +1978,9 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (bdev_read_only(I_BDEV(bd_inode)))
 		return -EPERM;
 
+	if (IS_SWAPFILE(bd_inode))
+		return -ETXTBSY;
+
 	if (!iov_iter_count(from))
 		return 0;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 56b8e358af5c..a2e3d446ba8e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3547,4 +3547,15 @@ static inline void simple_fill_fsxattr(struct fsxattr *fa, __u32 xflags)
 	fa->fsx_xflags = xflags;
 }
 
+/*
+ * Flush file data before changing attributes.  Caller must hold any locks
+ * required to prevent further writes to this file until we're done setting
+ * flags.
+ */
+static inline int inode_drain_writes(struct inode *inode)
+{
+	inode_dio_wait(inode);
+	return filemap_write_and_wait(inode->i_mapping);
+}
+
 #endif /* _LINUX_FS_H */
diff --git a/mm/filemap.c b/mm/filemap.c
index f93c58dde9c8..89bcdde903a3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2989,6 +2989,9 @@ inline ssize_t generic_write_checks(struct kiocb *iocb, struct iov_iter *from)
 	loff_t count;
 	int ret;
 
+	if (IS_SWAPFILE(inode))
+		return -ETXTBSY;
+
 	if (!iov_iter_count(from))
 		return 0;
 
diff --git a/mm/memory.c b/mm/memory.c
index e2bb51b6242e..b1dff75640b7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2196,6 +2196,10 @@ static vm_fault_t do_page_mkwrite(struct vm_fault *vmf)
 
 	vmf->flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
 
+	if (vmf->vma->vm_file &&
+	    IS_SWAPFILE(vmf->vma->vm_file->f_mapping->host))
+		return VM_FAULT_SIGBUS;
+
 	ret = vmf->vma->vm_ops->page_mkwrite(vmf);
 	/* Restore original flags so that caller is not surprised */
 	vmf->flags = old_flags;
diff --git a/mm/mmap.c b/mm/mmap.c
index 7e8c3e8ae75f..6bc21fca20bc 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1483,8 +1483,12 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		case MAP_SHARED_VALIDATE:
 			if (flags & ~flags_mask)
 				return -EOPNOTSUPP;
-			if ((prot&PROT_WRITE) && !(file->f_mode&FMODE_WRITE))
-				return -EACCES;
+			if (prot & PROT_WRITE) {
+				if (!(file->f_mode & FMODE_WRITE))
+					return -EACCES;
+				if (IS_SWAPFILE(file->f_mapping->host))
+					return -ETXTBSY;
+			}
 
 			/*
 			 * Make sure we don't allow writing to an append-only
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a53b7c49b40e..dab43523afdd 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3275,6 +3275,17 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (error)
 		goto bad_swap;
 
+	/*
+	 * Flush any pending IO and dirty mappings before we start using this
+	 * swap device.
+	 */
+	inode->i_flags |= S_SWAPFILE;
+	error = inode_drain_writes(inode);
+	if (error) {
+		inode->i_flags &= ~S_SWAPFILE;
+		goto bad_swap;
+	}
+
 	mutex_lock(&swapon_mutex);
 	prio = -1;
 	if (swap_flags & SWAP_FLAG_PREFER)
@@ -3295,7 +3306,6 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	atomic_inc(&proc_poll_event);
 	wake_up_interruptible(&proc_poll_wait);
 
-	inode->i_flags |= S_SWAPFILE;
 	error = 0;
 	goto out;
 bad_swap:

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 RFC 3/2] fstests: check that we can't write to swap files
  2019-08-15 16:34 ` [PATCH RFC 3/2] fstests: check that we can't write " Darrick J. Wong
  2019-08-15 21:26   ` Andrew Morton
@ 2019-08-17  2:05   ` " Darrick J. Wong
  1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-17  2:05 UTC (permalink / raw)
  To: hch, akpm, tytso, viro
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm, fstests

From: Darrick J. Wong <darrick.wong@oracle.com>

While active, the media backing a swap file is leased to the kernel.
Userspace has no business writing to it.  Make sure we can't do this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: add tests for writable fds after swapon
---
 src/swapon.c          |  135 ++++++++++++++++++++++++++++++++++++++++++++++++-
 tests/generic/717     |   70 +++++++++++++++++++++++++
 tests/generic/717.out |   14 +++++
 tests/generic/718     |   55 ++++++++++++++++++++
 tests/generic/718.out |   12 ++++
 tests/generic/group   |    2 +
 6 files changed, 284 insertions(+), 4 deletions(-)
 create mode 100755 tests/generic/717
 create mode 100644 tests/generic/717.out
 create mode 100755 tests/generic/718
 create mode 100644 tests/generic/718.out

diff --git a/src/swapon.c b/src/swapon.c
index 0cb7108a..afaed405 100644
--- a/src/swapon.c
+++ b/src/swapon.c
@@ -3,22 +3,149 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <string.h>
 #include <sys/swap.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <signal.h>
+
+static void usage(const char *prog)
+{
+	fprintf(stderr, "usage: %s [-v verb] PATH\n", prog);
+	exit(EXIT_FAILURE);
+}
+
+enum verbs {
+	TEST_SWAPON = 0,
+	TEST_WRITE,
+	TEST_MWRITE_AFTER,
+	TEST_MWRITE_BEFORE_AND_MWRITE_AFTER,
+	TEST_MWRITE_BEFORE,
+	MAX_TEST_VERBS,
+};
+
+#define BUF_SIZE 262144
+static char buf[BUF_SIZE];
+
+static void handle_signal(int signal)
+{
+	fprintf(stderr, "Caught signal %d, terminating...\n", signal);
+	exit(EXIT_FAILURE);
+}
 
 int main(int argc, char **argv)
 {
-	int ret;
+	struct sigaction act = {
+		.sa_handler	= handle_signal,
+	};
+	enum verbs verb = TEST_SWAPON;
+	void *p;
+	ssize_t sz;
+	int fd = -1;
+	int ret, c;
+
+	memset(buf, 0x58, BUF_SIZE);
+
+	while ((c = getopt(argc, argv, "v:")) != -1) {
+		switch (c) {
+		case 'v':
+			verb = atoi(optarg);
+			if (verb < TEST_SWAPON || verb >= MAX_TEST_VERBS) {
+				fprintf(stderr, "Verbs must be 0-%d.\n",
+						MAX_TEST_VERBS - 1);
+				usage(argv[0]);
+			}
+			break;
+		default:
+			usage(argv[0]);
+			break;
+		}
+	}
 
-	if (argc != 2) {
-		fprintf(stderr, "usage: %s PATH\n", argv[0]);
+	ret = sigaction(SIGSEGV, &act, NULL);
+	if (ret) {
+		perror("sigsegv action");
 		return EXIT_FAILURE;
 	}
 
-	ret = swapon(argv[1], 0);
+	ret = sigaction(SIGBUS, &act, NULL);
+	if (ret) {
+		perror("sigbus action");
+		return EXIT_FAILURE;
+	}
+
+	switch (verb) {
+	case TEST_WRITE:
+	case TEST_MWRITE_AFTER:
+	case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+	case TEST_MWRITE_BEFORE:
+		fd = open(argv[optind], O_RDWR);
+		if (fd < 0) {
+			perror(argv[optind]);
+			return EXIT_FAILURE;
+		}
+		break;
+	default:
+		break;
+	}
+
+	switch (verb) {
+	case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+	case TEST_MWRITE_BEFORE:
+		p = mmap(NULL, BUF_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED,
+				fd, 65536);
+		if (p == MAP_FAILED) {
+			perror("mmap");
+			return EXIT_FAILURE;
+		}
+		memcpy(p, buf, BUF_SIZE);
+		break;
+	default:
+		break;
+	}
+
+	if (optind != argc - 1)
+		usage(argv[0]);
+
+	ret = swapon(argv[optind], 0);
 	if (ret) {
 		perror("swapon");
 		return EXIT_FAILURE;
 	}
 
+	switch (verb) {
+	case TEST_WRITE:
+		sz = pwrite(fd, buf, BUF_SIZE, 65536);
+		if (sz < 0) {
+			perror("pwrite");
+			return EXIT_FAILURE;
+		}
+		break;
+	case TEST_MWRITE_AFTER:
+		p = mmap(NULL, BUF_SIZE, PROT_WRITE | PROT_READ, MAP_SHARED,
+				fd, 65536);
+		if (p == MAP_FAILED) {
+			perror("mmap");
+			return EXIT_FAILURE;
+		}
+		/* fall through */
+	case TEST_MWRITE_BEFORE_AND_MWRITE_AFTER:
+		memcpy(p, buf, BUF_SIZE);
+		break;
+	default:
+		break;
+	}
+
+	if (fd >= 0) {
+		ret = fsync(fd);
+		if (ret)
+			perror("fsync");
+		ret = close(fd);
+		if (ret)
+			perror("close");
+	}
+
 	return EXIT_SUCCESS;
 }
diff --git a/tests/generic/717 b/tests/generic/717
new file mode 100755
index 00000000..92073dbb
--- /dev/null
+++ b/tests/generic/717
@@ -0,0 +1,70 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-newer
+# Copyright (c) 2019, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test No. 717
+#
+# Check that we can't modify a file that's an active swap file.
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	swapoff $testfile
+	rm -rf $tmp.* $testfile
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_test_program swapon
+_require_scratch_swapfile
+
+rm -f $seqres.full
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+testfile=$SCRATCH_MNT/$seq.swap
+
+_format_swapfile $testfile 20m
+
+# Can you modify the swapfile via previously open file descriptors?
+for verb in 1 2 3 4; do
+	echo "verb $verb"
+	"$here/src/swapon" -v $verb $testfile
+	swapoff $testfile
+done
+
+# Now try writing with a new file descriptor.
+swapon $testfile 2>&1 | _filter_scratch
+
+# Can we write to it?
+$XFS_IO_PROG -c 'pwrite -S 0x59 64k 64k' $testfile
+$XFS_IO_PROG -d -c 'pwrite -S 0x60 64k 64k' $testfile
+$XFS_IO_PROG -c 'mmap -rw 64k 64k' -c 'mwrite -S 0x61 64k 64k' $testfile
+
+# Can we change the file size?
+$XFS_IO_PROG -c 'truncate 18m' $testfile
+
+# Can you fallocate the file?
+$XFS_IO_PROG -c 'falloc 0 32m' $testfile
+
+# We test that you can't reflink, dedupe, or copy_file_range into a swapfile
+# in other tests.
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/717.out b/tests/generic/717.out
new file mode 100644
index 00000000..59345ca1
--- /dev/null
+++ b/tests/generic/717.out
@@ -0,0 +1,14 @@
+QA output created by 717
+verb 1
+pwrite: Text file busy
+verb 2
+mmap: Text file busy
+verb 3
+Caught signal 7, terminating...
+verb 4
+pwrite: Text file busy
+pwrite: Text file busy
+mmap: Text file busy
+no mapped regions, try 'help mmap'
+ftruncate: Text file busy
+fallocate: Text file busy
diff --git a/tests/generic/718 b/tests/generic/718
new file mode 100755
index 00000000..504022e1
--- /dev/null
+++ b/tests/generic/718
@@ -0,0 +1,55 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-newer
+# Copyright (c) 2019, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test No. 718
+#
+# Check that we can't modify a block device that's an active swap device.
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1    # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	cd /
+	swapoff $SCRATCH_DEV
+	rm -rf $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_os Linux
+_supported_fs generic
+_require_test_program swapon
+_require_scratch_nocheck
+
+rm -f $seqres.full
+
+$MKSWAP_PROG "$SCRATCH_DEV" >> $seqres.full
+
+# Can you modify the swap dev via previously open file descriptors?
+for verb in 1 2 3 4; do
+	echo "verb $verb"
+	"$here/src/swapon" -v $verb $SCRATCH_DEV
+	swapoff $SCRATCH_DEV
+done
+
+swapon $SCRATCH_DEV 2>&1 | _filter_scratch
+
+# Can we write to it?
+$XFS_IO_PROG -c 'pwrite -S 0x59 64k 64k' $SCRATCH_DEV
+$XFS_IO_PROG -d -c 'pwrite -S 0x60 64k 64k' $SCRATCH_DEV
+$XFS_IO_PROG -c 'mmap -rw 64k 64k' -c 'mwrite -S 0x61 64k 64k' $SCRATCH_DEV
+
+# success, all done
+status=0
+exit
diff --git a/tests/generic/718.out b/tests/generic/718.out
new file mode 100644
index 00000000..88d5cf3e
--- /dev/null
+++ b/tests/generic/718.out
@@ -0,0 +1,12 @@
+QA output created by 718
+verb 1
+pwrite: Text file busy
+verb 2
+mmap: Text file busy
+verb 3
+Caught signal 7, terminating...
+verb 4
+pwrite: Text file busy
+pwrite: Text file busy
+mmap: Text file busy
+no mapped regions, try 'help mmap'
diff --git a/tests/generic/group b/tests/generic/group
index 003fa963..c58d41e3 100644
--- a/tests/generic/group
+++ b/tests/generic/group
@@ -570,3 +570,5 @@
 565 auto quick copy_range
 715 auto quick rw
 716 auto quick rw
+717 auto quick rw swap
+718 auto quick rw swap

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 2/2] vfs: don't allow writes to swap files
  2019-08-16 16:19   ` [PATCH v2 " Darrick J. Wong
@ 2019-08-18  8:39     ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2019-08-18  8:39 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: hch, akpm, tytso, viro, linux-xfs, linux-fsdevel, linux-kernel, linux-mm

On Fri, Aug 16, 2019 at 09:19:49AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Don't let userspace write to an active swap file because the kernel
> effectively has a long term lease on the storage and things could get
> seriously corrupted if we let this happen.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices
  2019-06-28 18:35 [PATCH v2 0/2] vfs: make active swap files unwritable Darrick J. Wong
@ 2019-06-28 18:35 ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-06-28 18:35 UTC (permalink / raw)
  To: hch, akpm, tytso, viro, darrick.wong
  Cc: linux-xfs, linux-fsdevel, linux-kernel, linux-mm

From: Darrick J. Wong <darrick.wong@oracle.com>

Set S_SWAPFILE on block device inodes so that they have the same
protections as a swap flie.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mm/swapfile.c |   31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)


diff --git a/mm/swapfile.c b/mm/swapfile.c
index 596ac98051c5..fa4edd0cca3a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2284,9 +2284,8 @@ EXPORT_SYMBOL_GPL(add_swap_extent);
  * requirements, they are simply tossed out - we will never use those blocks
  * for swapping.
  *
- * For S_ISREG swapfiles we set S_SWAPFILE across the life of the swapon.  This
- * prevents root from shooting her foot off by ftruncating an in-use swapfile,
- * which will scribble on the fs.
+ * For all swap devices we set S_SWAPFILE across the life of the swapon.  This
+ * prevents users from writing to the swap device, which will corrupt memory.
  *
  * The amount of disk space which a single swap extent represents varies.
  * Typically it is in the 1-4 megabyte range.  So we can have hundreds of
@@ -2551,13 +2550,14 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 	inode = mapping->host;
 	if (S_ISBLK(inode->i_mode)) {
 		struct block_device *bdev = I_BDEV(inode);
+
 		set_blocksize(bdev, old_block_size);
 		blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
-	} else {
-		inode_lock(inode);
-		inode->i_flags &= ~S_SWAPFILE;
-		inode_unlock(inode);
 	}
+
+	inode_lock(inode);
+	inode->i_flags &= ~S_SWAPFILE;
+	inode_unlock(inode);
 	filp_close(swap_file, NULL);
 
 	/*
@@ -2780,11 +2780,11 @@ static int claim_swapfile(struct swap_info_struct *p, struct inode *inode)
 		p->flags |= SWP_BLKDEV;
 	} else if (S_ISREG(inode->i_mode)) {
 		p->bdev = inode->i_sb->s_bdev;
-		inode_lock(inode);
-		if (IS_SWAPFILE(inode))
-			return -EBUSY;
-	} else
-		return -EINVAL;
+	}
+
+	inode_lock(inode);
+	if (IS_SWAPFILE(inode))
+		return -EBUSY;
 
 	return 0;
 }
@@ -3185,8 +3185,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	atomic_inc(&proc_poll_event);
 	wake_up_interruptible(&proc_poll_wait);
 
-	if (S_ISREG(inode->i_mode))
-		inode->i_flags |= S_SWAPFILE;
+	inode->i_flags |= S_SWAPFILE;
 	error = 0;
 	goto out;
 bad_swap:
@@ -3208,7 +3207,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (inced_nr_rotate_swap)
 		atomic_dec(&nr_rotate_swap);
 	if (swap_file) {
-		if (inode && S_ISREG(inode->i_mode)) {
+		if (inode) {
 			inode_unlock(inode);
 			inode = NULL;
 		}
@@ -3221,7 +3220,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	}
 	if (name)
 		putname(name);
-	if (inode && S_ISREG(inode->i_mode))
+	if (inode)
 		inode_unlock(inode);
 	if (!error)
 		enable_swap_slots_cache();


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-15 16:05 [PATCH v3 0/2] vfs: make active swap files unwritable Darrick J. Wong
2019-08-15 16:05 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong
2019-08-16  6:39   ` Christoph Hellwig
2019-08-15 16:05 ` [PATCH 2/2] vfs: don't allow writes to swap files Darrick J. Wong
2019-08-16  6:41   ` Christoph Hellwig
2019-08-16  6:48     ` Darrick J. Wong
2019-08-16 16:19   ` [PATCH v2 " Darrick J. Wong
2019-08-18  8:39     ` Christoph Hellwig
2019-08-15 16:34 ` [PATCH RFC 3/2] fstests: check that we can't write " Darrick J. Wong
2019-08-15 21:26   ` Andrew Morton
2019-08-16  2:13     ` Darrick J. Wong
2019-08-17  2:05   ` [PATCH v2 " Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2019-06-28 18:35 [PATCH v2 0/2] vfs: make active swap files unwritable Darrick J. Wong
2019-06-28 18:35 ` [PATCH 1/2] mm: set S_SWAPFILE on blockdev swap devices Darrick J. Wong

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git