linux-security-module.vger.kernel.org archive mirror
* clean up kernel_{read,write} & friends v5
@ 2020-06-24 16:13 Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 01/14] cachefiles: switch to kernel_write Christoph Hellwig
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Hi Al,

this series fixes a few issues in, and cleans up, the helpers that read
from or write to kernel-space buffers.  It also ensures that we don't
change the address limit when using the ->read_iter and ->write_iter
methods, which don't need the changed address limit.

I did not add your suggested comments on the instances using
uaccess_kernel, as all of them already have comments.  If you have
anything better in mind, feel free to throw in additional comments.


Changes since v4:
 - warn on calling __kernel_write on files not open for write
 - add a FMODE_READ check and warning in __kernel_read
 - add a new patch to remove kernel_readv
 - stop preferring the iter variants if normal read/write is
   present

Changes since v3:
 - keep call_read_iter/call_write_iter for now
 - don't modify an existing long line
 - update a change log

Changes since v2:
 - picked up a few ACKs

Changes since v1:
 - __kernel_write must not take sb_writers
 - unexport __kernel_write

Diffstat:
 fs/autofs/waitq.c            |    2 
 fs/cachefiles/rdwr.c         |    2 
 fs/read_write.c              |  171 ++++++++++++++++++++++++++-----------------
 fs/splice.c                  |   53 +++----------
 include/linux/fs.h           |    4 -
 net/bpfilter/bpfilter_kern.c |    2 
 security/integrity/iint.c    |   14 ---
 7 files changed, 125 insertions(+), 123 deletions(-)


* [PATCH 01/14] cachefiles: switch to kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 02/14] autofs: " Christoph Hellwig
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

__kernel_write doesn't take an sb_writers reference, which we need here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
---
 fs/cachefiles/rdwr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e7726f5f1241c2..3080cda9e82457 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -937,7 +937,7 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)
 	}
 
 	data = kmap(page);
-	ret = __kernel_write(file, data, len, &pos);
+	ret = kernel_write(file, data, len, &pos);
 	kunmap(page);
 	fput(file);
 	if (ret != len)
-- 
2.26.2



* [PATCH 02/14] autofs: switch to kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 01/14] cachefiles: switch to kernel_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 03/14] bpfilter: " Christoph Hellwig
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

While pipes don't really need sb_writers protection, __kernel_write is an
interface better kept private, and the additional rw_verify_area call does
not hurt here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ian Kent <raven@themaw.net>
---
 fs/autofs/waitq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/autofs/waitq.c b/fs/autofs/waitq.c
index b04c528b19d342..74c886f7c51cbe 100644
--- a/fs/autofs/waitq.c
+++ b/fs/autofs/waitq.c
@@ -53,7 +53,7 @@ static int autofs_write(struct autofs_sb_info *sbi,
 
 	mutex_lock(&sbi->pipe_mutex);
 	while (bytes) {
-		wr = __kernel_write(file, data, bytes, &file->f_pos);
+		wr = kernel_write(file, data, bytes, &file->f_pos);
 		if (wr <= 0)
 			break;
 		data += wr;
-- 
2.26.2



* [PATCH 03/14] bpfilter: switch to kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 01/14] cachefiles: switch to kernel_write Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 02/14] autofs: " Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 04/14] fs: unexport __kernel_write Christoph Hellwig
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

While pipes don't really need sb_writers protection, __kernel_write is an
interface better kept private, and the additional rw_verify_area call does
not hurt here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/bpfilter/bpfilter_kern.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index c0f0990f30b604..1905e01c3aa9a7 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -50,7 +50,7 @@ static int __bpfilter_process_sockopt(struct sock *sk, int optname,
 	req.len = optlen;
 	if (!bpfilter_ops.info.pid)
 		goto out;
-	n = __kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req),
+	n = kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req),
 			   &pos);
 	if (n != sizeof(req)) {
 		pr_err("write fail %zd\n", n);
-- 
2.26.2



* [PATCH 04/14] fs: unexport __kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 03/14] bpfilter: " Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 05/14] fs: check FMODE_WRITE in __kernel_write Christoph Hellwig
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

This is a very special interface that skips sb_writers protection, and it
is not used by modules anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index bbfa9b12b15eb7..2c601d853ff3d8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -522,7 +522,6 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	inc_syscw(current);
 	return ret;
 }
-EXPORT_SYMBOL(__kernel_write);
 
 ssize_t kernel_write(struct file *file, const void *buf, size_t count,
 			    loff_t *pos)
-- 
2.26.2



* [PATCH 05/14] fs: check FMODE_WRITE in __kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 04/14] fs: unexport __kernel_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 06/14] fs: implement kernel_write using __kernel_write Christoph Hellwig
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Add a WARN_ON_ONCE if the file isn't actually open for write.  This
matches the check done in vfs_write, but additionally warns, as a
kernel user calling write on a file not opened for writing is a pretty
obvious programming error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 2c601d853ff3d8..8f9fc05990ae8b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -505,6 +505,8 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	const char __user *p;
 	ssize_t ret;
 
+	if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
+		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-- 
2.26.2



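The check ordering added in patch 05 can be modeled in userspace as a rough sketch (this is not kernel code). Everything here is hypothetical: model_file, model_warn_on_once and the MODEL_FMODE_* values stand in for struct file, WARN_ON_ONCE and the real FMODE_* bits, and only the order of the two checks and their return codes are taken from the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Illustrative stand-ins for the f_mode bits; values are arbitrary here. */
#define MODEL_FMODE_WRITE	0x2u
#define MODEL_FMODE_CAN_WRITE	0x40000u

struct model_file {
	unsigned int f_mode;
};

/* Userspace imitation of WARN_ON_ONCE: evaluates to cond, prints only once. */
static bool warned_once;

static bool model_warn_on_once(bool cond, const char *what)
{
	if (cond && !warned_once) {
		warned_once = true;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Same order and return codes as the checks at the top of __kernel_write. */
static ssize_t model_write_mode_check(const struct model_file *file)
{
	if (model_warn_on_once(!(file->f_mode & MODEL_FMODE_WRITE),
			       "kernel write to file not opened for writing"))
		return -EBADF;
	if (!(file->f_mode & MODEL_FMODE_CAN_WRITE))
		return -EINVAL;
	return 0;
}
```

In this model a caller that opened the file read-only gets -EBADF every time but only one warning, while a file opened for write whose operations provide no write method still gets -EINVAL, as before the patch.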
* [PATCH 06/14] fs: implement kernel_write using __kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 05/14] fs: check FMODE_WRITE in __kernel_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 07/14] fs: remove __vfs_write Christoph Hellwig
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Consolidate the two in-kernel write helpers to make upcoming changes
easier.  The only differences are the rw_verify_area call, which
__kernel_write skips and kernel_write now performs before calling it,
and an access_ok check that doesn't make sense for kernel buffers to
start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 8f9fc05990ae8b..5110cd1e6e2771 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -499,6 +499,7 @@ static ssize_t __vfs_write(struct file *file, const char __user *p,
 		return -EINVAL;
 }
 
+/* caller is responsible for file_start_write/file_end_write */
 ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
@@ -528,16 +529,16 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 ssize_t kernel_write(struct file *file, const void *buf, size_t count,
 			    loff_t *pos)
 {
-	mm_segment_t old_fs;
-	ssize_t res;
+	ssize_t ret;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_write(file, (__force const char __user *)buf, count, pos);
-	set_fs(old_fs);
+	ret = rw_verify_area(WRITE, file, pos, count);
+	if (ret)
+		return ret;
 
-	return res;
+	file_start_write(file);
+	ret =  __kernel_write(file, buf, count, pos);
+	file_end_write(file);
+	return ret;
 }
 EXPORT_SYMBOL(kernel_write);
 
-- 
2.26.2



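As a userspace sketch of the layering patch 06 sets up (again not kernel code): model_kernel_write does the range check, takes the freeze reference, calls the raw helper and drops the reference, while model_raw_write plays the role of __kernel_write and expects its caller to hold the reference. The model_* names, the freeze_refs counter standing in for sb_writers, and the simplified range check standing in for rw_verify_area are all assumptions made for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_FILE_SIZE 64

/* Hypothetical in-memory file; freeze_refs stands in for sb_writers. */
struct model_file {
	char data[MODEL_FILE_SIZE];
	int freeze_refs;	/* > 0 while "file_start_write" is held */
	int max_freeze_refs;	/* highest value seen, for inspection */
};

/* Simplified stand-in for rw_verify_area: the write must fit the file. */
static int model_verify_area(long long pos, size_t count)
{
	if (pos < 0 || (long long)count > MODEL_FILE_SIZE - pos)
		return -EINVAL;
	return 0;
}

static void model_file_start_write(struct model_file *f)
{
	if (++f->freeze_refs > f->max_freeze_refs)
		f->max_freeze_refs = f->freeze_refs;
}

static void model_file_end_write(struct model_file *f)
{
	f->freeze_refs--;
}

/* Plays the role of __kernel_write: no range check, caller holds the ref. */
static ssize_t model_raw_write(struct model_file *f, const void *buf,
			       size_t count, long long *pos)
{
	if (f->freeze_refs <= 0)
		return -EPERM;	/* model-only check for a missing start_write */
	memcpy(f->data + *pos, buf, count);
	*pos += (long long)count;
	return (ssize_t)count;
}

/* Plays the role of the new kernel_write: verify, bracket, delegate. */
static ssize_t model_kernel_write(struct model_file *f, const void *buf,
				  size_t count, long long *pos)
{
	ssize_t ret = model_verify_area(*pos, count);

	if (ret)
		return ret;
	model_file_start_write(f);
	ret = model_raw_write(f, buf, count, pos);
	model_file_end_write(f);
	return ret;
}
```

The point of the split is that a failed range check returns before the freeze reference is ever taken, and the reference is always dropped again once the raw write returns.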
* [PATCH 07/14] fs: remove __vfs_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 06/14] fs: implement kernel_write using __kernel_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 08/14] fs: don't change the address limit for ->write_iter in __kernel_write Christoph Hellwig
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5110cd1e6e2771..96e8e354f99b45 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -488,17 +488,6 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 	return ret;
 }
 
-static ssize_t __vfs_write(struct file *file, const char __user *p,
-			   size_t count, loff_t *pos)
-{
-	if (file->f_op->write)
-		return file->f_op->write(file, p, count, pos);
-	else if (file->f_op->write_iter)
-		return new_sync_write(file, p, count, pos);
-	else
-		return -EINVAL;
-}
-
 /* caller is responsible for file_start_write/file_end_write */
 ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
 {
@@ -516,7 +505,12 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	p = (__force const char __user *)buf;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	ret = __vfs_write(file, p, count, pos);
+	if (file->f_op->write)
+		ret = file->f_op->write(file, p, count, pos);
+	else if (file->f_op->write_iter)
+		ret = new_sync_write(file, p, count, pos);
+	else
+		ret = -EINVAL;
 	set_fs(old_fs);
 	if (ret > 0) {
 		fsnotify_modify(file);
@@ -554,19 +548,23 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 		return -EFAULT;
 
 	ret = rw_verify_area(WRITE, file, pos, count);
-	if (!ret) {
-		if (count > MAX_RW_COUNT)
-			count =  MAX_RW_COUNT;
-		file_start_write(file);
-		ret = __vfs_write(file, buf, count, pos);
-		if (ret > 0) {
-			fsnotify_modify(file);
-			add_wchar(current, ret);
-		}
-		inc_syscw(current);
-		file_end_write(file);
+	if (ret)
+		return ret;
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
+	file_start_write(file);
+	if (file->f_op->write)
+		ret = file->f_op->write(file, buf, count, pos);
+	else if (file->f_op->write_iter)
+		ret = new_sync_write(file, buf, count, pos);
+	else
+		ret = -EINVAL;
+	if (ret > 0) {
+		fsnotify_modify(file);
+		add_wchar(current, ret);
 	}
-
+	inc_syscw(current);
+	file_end_write(file);
 	return ret;
 }
 
-- 
2.26.2



* [PATCH 08/14] fs: don't change the address limit for ->write_iter in __kernel_write
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 07/14] fs: remove __vfs_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 09/14] fs: add a __kernel_read helper Christoph Hellwig
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

If we write to a file that implements ->write_iter, there is no need
to change the address limit as long as we send a kvec down.  Implement
that case, and prefer it over using plain ->write with a changed
address limit if available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 96e8e354f99b45..bd46c959799e97 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -489,10 +489,9 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 }
 
 /* caller is responsible for file_start_write/file_end_write */
-ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
+ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
+		loff_t *pos)
 {
-	mm_segment_t old_fs;
-	const char __user *p;
 	ssize_t ret;
 
 	if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
@@ -500,18 +499,29 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	p = (__force const char __user *)buf;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->write)
-		ret = file->f_op->write(file, p, count, pos);
-	else if (file->f_op->write_iter)
-		ret = new_sync_write(file, p, count, pos);
-	else
+	if (file->f_op->write_iter) {
+		struct kvec iov = { .iov_base = (void *)buf, .iov_len = count };
+		struct kiocb kiocb;
+		struct iov_iter iter;
+
+		init_sync_kiocb(&kiocb, file);
+		kiocb.ki_pos = *pos;
+		iov_iter_kvec(&iter, WRITE, &iov, 1, count);
+		ret = file->f_op->write_iter(&kiocb, &iter);
+		if (ret > 0)
+			*pos = kiocb.ki_pos;
+	} else if (file->f_op->write) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
+		ret = file->f_op->write(file, (__force const char __user *)buf,
+				count, pos);
+		set_fs(old_fs);
+	} else {
 		ret = -EINVAL;
-	set_fs(old_fs);
+	}
 	if (ret > 0) {
 		fsnotify_modify(file);
 		add_wchar(current, ret);
-- 
2.26.2



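A rough userspace model of the dispatch in the new __kernel_write from patch 08, checked in the same order as that diff: a file with ->write_iter gets its buffer described by a kvec and the address limit is never touched, while a ->write-only file still needs the set_fs(KERNEL_DS) bracket. The model_* types, the kernel_ds flag standing in for set_fs(), and the in-memory mem_write/mem_write_iter methods are hypothetical simplifications, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical simplified stand-ins for struct kvec and struct kiocb. */
struct model_kvec { void *iov_base; size_t iov_len; };
struct model_kiocb { long long ki_pos; };

struct model_file;
struct model_ops {
	ssize_t (*write)(struct model_file *, const char *, size_t, long long *);
	ssize_t (*write_iter)(struct model_file *, struct model_kiocb *,
			      const struct model_kvec *);
};

struct model_file {
	const struct model_ops *f_op;
	char data[32];
	bool kernel_ds;		/* models set_fs(KERNEL_DS) being in effect */
	int set_fs_calls;	/* how often the address limit was switched */
};

static ssize_t model_kernel_write(struct model_file *file, const void *buf,
				  size_t count, long long *pos)
{
	ssize_t ret;

	if (file->f_op->write_iter) {
		/* kvec path: buffer described as kernel memory, no set_fs */
		struct model_kvec iov = { .iov_base = (void *)buf, .iov_len = count };
		struct model_kiocb kiocb = { .ki_pos = *pos };

		ret = file->f_op->write_iter(file, &kiocb, &iov);
		if (ret > 0)
			*pos = kiocb.ki_pos;
	} else if (file->f_op->write) {
		/* legacy path: temporarily widen the address limit */
		bool old = file->kernel_ds;

		file->kernel_ds = true;
		file->set_fs_calls++;
		ret = file->f_op->write(file, buf, count, pos);
		file->kernel_ds = old;
	} else {
		ret = -EINVAL;
	}
	return ret;
}

/* Example ->write_iter: copies the kvec and advances ki_pos. */
static ssize_t mem_write_iter(struct model_file *f, struct model_kiocb *iocb,
			      const struct model_kvec *iov)
{
	if (iocb->ki_pos < 0 ||
	    iocb->ki_pos + (long long)iov->iov_len > (long long)sizeof(f->data))
		return -ENOSPC;
	memcpy(f->data + iocb->ki_pos, iov->iov_base, iov->iov_len);
	iocb->ki_pos += (long long)iov->iov_len;
	return (ssize_t)iov->iov_len;
}

/* Example legacy ->write: refuses to run unless the limit was widened. */
static ssize_t mem_write(struct model_file *f, const char *p, size_t n,
			 long long *pos)
{
	if (!f->kernel_ds)
		return -EFAULT;
	if (*pos < 0 || *pos + (long long)n > (long long)sizeof(f->data))
		return -ENOSPC;
	memcpy(f->data + *pos, p, n);
	*pos += (long long)n;
	return (ssize_t)n;
}

static const struct model_ops iter_ops = { .write = NULL, .write_iter = mem_write_iter };
static const struct model_ops legacy_ops = { .write = mem_write, .write_iter = NULL };
static const struct model_ops no_ops = { .write = NULL, .write_iter = NULL };
```

Note that *pos is only updated from kiocb.ki_pos when the ->write_iter call returned a positive count, so a failed write leaves the caller's position untouched.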
* [PATCH 09/14] fs: add a __kernel_read helper
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 08/14] fs: don't change the address limit for ->write_iter in __kernel_write Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 10/14] integrity/ima: switch to using __kernel_read Christoph Hellwig
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

This is the counterpart to __kernel_write, and skips the rw_verify_area
call compared to kernel_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    | 23 +++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index bd46c959799e97..cc8e0b4f3cd697 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -430,6 +430,29 @@ ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
 		return -EINVAL;
 }
 
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+{
+	mm_segment_t old_fs = get_fs();
+	ssize_t ret;
+
+	if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
+		return -EINVAL;
+	if (!(file->f_mode & FMODE_CAN_READ))
+		return -EINVAL;
+
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
+	set_fs(KERNEL_DS);
+	ret = __vfs_read(file, (void __user *)buf, count, pos);
+	set_fs(old_fs);
+	if (ret > 0) {
+		fsnotify_access(file);
+		add_rchar(current, ret);
+	}
+	inc_syscr(current);
+	return ret;
+}
+
 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea746..22cbe7b2e91994 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3033,6 +3033,7 @@ extern int kernel_read_file_from_path_initns(const char *, void **, loff_t *, lo
 extern int kernel_read_file_from_fd(int, void **, loff_t *, loff_t,
 				    enum kernel_read_file_id);
 extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *);
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
 extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
 extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
 extern struct file * open_exec(const char *);
-- 
2.26.2



* [PATCH 10/14] integrity/ima: switch to using __kernel_read
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 09/14] fs: add a __kernel_read helper Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 11/14] fs: implement kernel_read " Christoph Hellwig
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

__kernel_read has a bunch of additional sanity checks, and this moves
the set_fs out of non-core code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/integrity/iint.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/security/integrity/iint.c b/security/integrity/iint.c
index e12c4900510f60..1d20003243c3fb 100644
--- a/security/integrity/iint.c
+++ b/security/integrity/iint.c
@@ -188,19 +188,7 @@ DEFINE_LSM(integrity) = {
 int integrity_kernel_read(struct file *file, loff_t offset,
 			  void *addr, unsigned long count)
 {
-	mm_segment_t old_fs;
-	char __user *buf = (char __user *)addr;
-	ssize_t ret;
-
-	if (!(file->f_mode & FMODE_READ))
-		return -EBADF;
-
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = __vfs_read(file, buf, count, &offset);
-	set_fs(old_fs);
-
-	return ret;
+	return __kernel_read(file, addr, count, &offset);
 }
 
 /*
-- 
2.26.2



* [PATCH 11/14] fs: implement kernel_read using __kernel_read
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 10/14] integrity/ima: switch to using __kernel_read Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 12/14] fs: remove __vfs_read Christoph Hellwig
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Consolidate the two in-kernel read helpers to make upcoming changes
easier.  The only differences are the rw_verify_area call, which
__kernel_read skips and kernel_read now performs before calling it,
and an access_ok check that doesn't make sense for kernel buffers to
start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index cc8e0b4f3cd697..a0a0b5d1d9249c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -455,15 +455,12 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
-	mm_segment_t old_fs;
-	ssize_t result;
+	ssize_t ret;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	result = vfs_read(file, (void __user *)buf, count, pos);
-	set_fs(old_fs);
-	return result;
+	ret = rw_verify_area(READ, file, pos, count);
+	if (ret)
+		return ret;
+	return __kernel_read(file, buf, count, pos);
 }
 EXPORT_SYMBOL(kernel_read);
 
-- 
2.26.2



* [PATCH 12/14] fs: remove __vfs_read
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 11/14] fs: implement kernel_read " Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 13/14] fs: implement default_file_splice_read using __kernel_read Christoph Hellwig
  2020-06-24 16:13 ` [PATCH 14/14] fs: don't change the address limit for ->read_iter in __kernel_read Christoph Hellwig
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    | 43 +++++++++++++++++++++----------------------
 include/linux/fs.h |  1 -
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index a0a0b5d1d9249c..6a2170eaee64f9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -419,17 +419,6 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 	return ret;
 }
 
-ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
-		   loff_t *pos)
-{
-	if (file->f_op->read)
-		return file->f_op->read(file, buf, count, pos);
-	else if (file->f_op->read_iter)
-		return new_sync_read(file, buf, count, pos);
-	else
-		return -EINVAL;
-}
-
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs = get_fs();
@@ -443,7 +432,12 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 	set_fs(KERNEL_DS);
-	ret = __vfs_read(file, (void __user *)buf, count, pos);
+	if (file->f_op->read)
+		ret = file->f_op->read(file, (void __user *)buf, count, pos);
+	else if (file->f_op->read_iter)
+		ret = new_sync_read(file, (void __user *)buf, count, pos);
+	else
+		ret = -EINVAL;
 	set_fs(old_fs);
 	if (ret > 0) {
 		fsnotify_access(file);
@@ -476,17 +470,22 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 		return -EFAULT;
 
 	ret = rw_verify_area(READ, file, pos, count);
-	if (!ret) {
-		if (count > MAX_RW_COUNT)
-			count =  MAX_RW_COUNT;
-		ret = __vfs_read(file, buf, count, pos);
-		if (ret > 0) {
-			fsnotify_access(file);
-			add_rchar(current, ret);
-		}
-		inc_syscr(current);
-	}
+	if (ret)
+		return ret;
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
 
+	if (file->f_op->read)
+		ret = file->f_op->read(file, buf, count, pos);
+	else if (file->f_op->read_iter)
+		ret = new_sync_read(file, buf, count, pos);
+	else
+		ret = -EINVAL;
+	if (ret > 0) {
+		fsnotify_access(file);
+		add_rchar(current, ret);
+	}
+	inc_syscr(current);
 	return ret;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 22cbe7b2e91994..0c0ec76b600b50 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1917,7 +1917,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			      struct iovec *fast_pointer,
 			      struct iovec **ret_pointer);
 
-extern ssize_t __vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-- 
2.26.2



* [PATCH 13/14] fs: implement default_file_splice_read using __kernel_read
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 12/14] fs: remove __vfs_read Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
       [not found]   ` <20200701091943.GC3874@shao2-debian>
  2020-06-24 16:13 ` [PATCH 14/14] fs: don't change the address limit for ->read_iter in __kernel_read Christoph Hellwig
  13 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

default_file_splice_read goes through great lengths to create an
iovec array and iov_iter for all the reads, but it is a helper only
useful for files not implementing ->read_iter, as we have the much
better generic_file_splice_read version available for those.  Remove
the iters and just call __kernel_read in a loop instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    |  2 +-
 fs/splice.c        | 53 +++++++++++++---------------------------------
 include/linux/fs.h |  2 --
 3 files changed, 16 insertions(+), 41 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 6a2170eaee64f9..1c41c25e548d8c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1070,7 +1070,7 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
 }
 EXPORT_SYMBOL(vfs_iter_write);
 
-ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
+static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
 		  unsigned long vlen, loff_t *pos, rwf_t flags)
 {
 	struct iovec iovstack[UIO_FASTIOV];
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..d1efc53875bd93 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -342,38 +342,26 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
 };
 EXPORT_SYMBOL(nosteal_pipe_buf_ops);
 
-static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
-			    unsigned long vlen, loff_t offset)
-{
-	mm_segment_t old_fs;
-	loff_t pos = offset;
-	ssize_t res;
-
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
-	set_fs(old_fs);
-
-	return res;
-}
-
 static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	struct kvec *vec, __vec[PIPE_DEF_BUFFERS];
 	struct iov_iter to;
 	struct page **pages;
 	unsigned int nr_pages;
 	unsigned int mask;
 	size_t offset, base, copied = 0;
+	loff_t pos;
 	ssize_t res;
 	int i;
 
 	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
 		return -EAGAIN;
 
+	res = rw_verify_area(READ, in, ppos, len);
+	if (res < 0)
+		return res;
+
 	/*
 	 * Try to keep page boundaries matching to source pagecache ones -
 	 * it probably won't be much help, but...
@@ -386,37 +374,26 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	if (res <= 0)
 		return -ENOMEM;
 
-	nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
-
-	vec = __vec;
-	if (nr_pages > PIPE_DEF_BUFFERS) {
-		vec = kmalloc_array(nr_pages, sizeof(struct kvec), GFP_KERNEL);
-		if (unlikely(!vec)) {
-			res = -ENOMEM;
-			goto out;
-		}
-	}
-
 	mask = pipe->ring_size - 1;
 	pipe->bufs[to.head & mask].offset = offset;
 	pipe->bufs[to.head & mask].len -= offset;
 
+	nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
+
+	pos = *ppos;
 	for (i = 0; i < nr_pages; i++) {
 		size_t this_len = min_t(size_t, len, PAGE_SIZE - offset);
-		vec[i].iov_base = page_address(pages[i]) + offset;
-		vec[i].iov_len = this_len;
+
+		res = __kernel_read(in, page_address(pages[i]) + offset,
+				this_len, &pos);
+		if (res < 0)
+			goto out;
 		len -= this_len;
 		offset = 0;
 	}
+	copied = pos - *ppos;
+	*ppos = pos;
 
-	res = kernel_readv(in, vec, nr_pages, *ppos);
-	if (res > 0) {
-		copied = res;
-		*ppos += res;
-	}
-
-	if (vec != __vec)
-		kfree(vec);
 out:
 	for (i = 0; i < nr_pages; i++)
 		put_page(pages[i]);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0c0ec76b600b50..fac6aead402a98 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1919,8 +1919,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
-extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *, rwf_t);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
 extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
-- 
2.26.2
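The default_file_splice_read change in patch 13 above drops the kvec array and the single kernel_readv call. In their place it issues one __kernel_read per page, advances a local cursor `pos`, and writes it back to `*ppos` only after the loop, computing `copied = pos - *ppos`. The sketch below is a userspace analogue of that loop. It is not kernel code. POSIX `pread(2)` stands in for `__kernel_read`, `CHUNK_SIZE` stands in for `PAGE_SIZE`, and the helper name `read_pages_chunked` is hypothetical. The sketch also stops at the first short read so the buffers never contain gaps. That is a choice made only for this sketch: the patched loop keeps iterating after a short read.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096	/* stands in for PAGE_SIZE */

/*
 * Hypothetical helper: read up to len bytes from fd, starting at *ppos,
 * into nr_chunks separate CHUNK_SIZE buffers. The first buffer is filled
 * from byte `offset`. As in the patched loop, a local cursor is advanced
 * and committed to *ppos only if no read fails.
 * Returns the number of bytes copied, or -1 on error (*ppos unchanged).
 */
static ssize_t read_pages_chunked(int fd, char **chunks, size_t nr_chunks,
				  size_t offset, size_t len, off_t *ppos)
{
	off_t pos = *ppos;

	for (size_t i = 0; i < nr_chunks && len > 0; i++) {
		size_t avail = CHUNK_SIZE - offset;
		size_t this_len = len < avail ? len : avail;
		ssize_t res = pread(fd, chunks[i] + offset, this_len, pos);

		if (res < 0)
			return -1;	/* error: leave *ppos untouched */
		pos += res;
		if ((size_t)res < this_len)
			break;		/* short read (e.g. EOF): stop here */
		len -= this_len;
		offset = 0;		/* later chunks are filled from byte 0 */
	}

	ssize_t copied = pos - *ppos;
	*ppos = pos;
	return copied;
}
```

Because the cursor is only committed at the end, a failed read leaves the caller's file position exactly where it was. That matches the `goto out` path in the patch, which skips the `*ppos = pos` update.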



* [PATCH 14/14] fs: don't change the address limit for ->read_iter in __kernel_read
  2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
                   ` (12 preceding siblings ...)
  2020-06-24 16:13 ` [PATCH 13/14] fs: implement default_file_splice_read using __kernel_read Christoph Hellwig
@ 2020-06-24 16:13 ` Christoph Hellwig
  13 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

If we read from a file that implements ->read_iter, there is no need
to change the address limit if we send a kvec down.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1c41c25e548d8c..e7f36b15683049 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -421,7 +421,6 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
-	mm_segment_t old_fs = get_fs();
 	ssize_t ret;
 
 	if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
@@ -431,14 +430,25 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	set_fs(KERNEL_DS);
-	if (file->f_op->read)
+	if (file->f_op->read) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
 		ret = file->f_op->read(file, (void __user *)buf, count, pos);
-	else if (file->f_op->read_iter)
-		ret = new_sync_read(file, (void __user *)buf, count, pos);
-	else
+		set_fs(old_fs);
+	} else if (file->f_op->read_iter) {
+		struct kvec iov = { .iov_base = buf, .iov_len = count };
+		struct kiocb kiocb;
+		struct iov_iter iter;
+
+		init_sync_kiocb(&kiocb, file);
+		kiocb.ki_pos = *pos;
+		iov_iter_kvec(&iter, READ, &iov, 1, count);
+		ret = file->f_op->read_iter(&kiocb, &iter);
+		*pos = kiocb.ki_pos;
+	} else {
 		ret = -EINVAL;
-	set_fs(old_fs);
+	}
 	if (ret > 0) {
 		fsnotify_access(file);
 		add_rchar(current, ret);
@@ -520,7 +530,14 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->write_iter) {
+	if (file->f_op->write) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
+		ret = file->f_op->write(file, (__force const char __user *)buf,
+				count, pos);
+		set_fs(old_fs);
+	} else if (file->f_op->write_iter) {
 		struct kvec iov = { .iov_base = (void *)buf, .iov_len = count };
 		struct kiocb kiocb;
 		struct iov_iter iter;
@@ -531,13 +548,6 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 		ret = file->f_op->write_iter(&kiocb, &iter);
 		if (ret > 0)
 			*pos = kiocb.ki_pos;
-	} else if (file->f_op->write) {
-		mm_segment_t old_fs = get_fs();
-
-		set_fs(KERNEL_DS);
-		ret = file->f_op->write(file, (__force const char __user *)buf,
-				count, pos);
-		set_fs(old_fs);
 	} else {
 		ret = -EINVAL;
 	}
-- 
2.26.2
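The dispatch order that patch 14 gives __kernel_read can be modelled in userspace. The hunk for __kernel_write follows the same order. The legacy `->read` hook is tried first and is the only path that raises the address limit. The `->read_iter` path wraps the kernel buffer in a kvec-style descriptor and leaves the limit alone. Every type and function name below (`model_fops`, `model_kvec`, `model_iter`, `model_kernel_read`, `addr_limit_kernel`) is hypothetical. The "address limit" is just an integer flag and does not reproduce the real get_fs()/set_fs() semantics.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Simulated address limit: 0 models USER_DS, 1 models KERNEL_DS. */
static int addr_limit_kernel;

/* Hypothetical stand-ins for struct kvec and a one-segment iov_iter. */
struct model_kvec { void *base; size_t len; };
struct model_iter { struct model_kvec *vec; size_t count; };

/* Hypothetical file-operations table with two optional read hooks. */
struct model_fops {
	ssize_t (*read)(void *buf, size_t count, long long *pos);
	ssize_t (*read_iter)(struct model_iter *iter, long long *pos);
};

static ssize_t model_kernel_read(const struct model_fops *f, void *buf,
				 size_t count, long long *pos)
{
	ssize_t ret;

	if (f->read) {
		/* Legacy hook expects a "user" pointer: widen the limit around it. */
		int old = addr_limit_kernel;	/* models old_fs = get_fs() */

		addr_limit_kernel = 1;		/* models set_fs(KERNEL_DS) */
		ret = f->read(buf, count, pos);
		addr_limit_kernel = old;	/* models set_fs(old_fs) */
	} else if (f->read_iter) {
		/* Iterator hook: describe the kernel buffer, leave the limit alone. */
		struct model_kvec kv = { .base = buf, .len = count };
		struct model_iter it = { .vec = &kv, .count = count };
		long long ki_pos = *pos;	/* models kiocb.ki_pos */

		ret = f->read_iter(&it, &ki_pos);
		*pos = ki_pos;
	} else {
		ret = -EINVAL;
	}
	return ret;
}
```

In this model, a table that provides both hooks always takes the legacy path, which reflects the v5 change to "stop preferring the iter variants if normal read/write is present". Only tables with `read_iter` alone avoid touching the limit.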



* Re: [fs] 140402bab8: stress-ng.splice.ops_per_sec -100.0% regression
       [not found]   ` <20200701091943.GC3874@shao2-debian>
@ 2020-07-01 12:13     ` Christoph Hellwig
  2020-07-01 20:32       ` Linus Torvalds
  0 siblings, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2020-07-01 12:13 UTC (permalink / raw)
  To: kernel test robot
  Cc: Christoph Hellwig, Al Viro, Linus Torvalds, Ian Kent,
	David Howells, linux-kernel, linux-fsdevel,
	linux-security-module, netfilter-devel, lkp

FYI, this is because stress-ng tests splice using /dev/null, which
happens to have the iter ops but doesn't have an explicit
splice_read operation.


* Re: [fs] 140402bab8: stress-ng.splice.ops_per_sec -100.0% regression
  2020-07-01 12:13     ` [fs] 140402bab8: stress-ng.splice.ops_per_sec -100.0% regression Christoph Hellwig
@ 2020-07-01 20:32       ` Linus Torvalds
  0 siblings, 0 replies; 17+ messages in thread
From: Linus Torvalds @ 2020-07-01 20:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: kernel test robot, Al Viro, Ian Kent, David Howells,
	Linux Kernel Mailing List, linux-fsdevel, LSM List, NetFilter,
	lkp

On Wed, Jul 1, 2020 at 5:13 AM Christoph Hellwig <hch@lst.de> wrote:
>
> FYI, this is because stress-ng tests splice using /dev/null, which
> happens to have the iter ops but doesn't have an explicit
> splice_read operation.

Heh. I guess a splice op for /dev/null should be fairly trivial to implement..

               Linus


Thread overview: 17+ messages
2020-06-24 16:13 clean up kernel_{read,write} & friends v5 Christoph Hellwig
2020-06-24 16:13 ` [PATCH 01/14] cachefiles: switch to kernel_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 02/14] autofs: " Christoph Hellwig
2020-06-24 16:13 ` [PATCH 03/14] bpfilter: " Christoph Hellwig
2020-06-24 16:13 ` [PATCH 04/14] fs: unexport __kernel_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 05/14] fs: check FMODE_WRITE in __kernel_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 06/14] fs: implement kernel_write using __kernel_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 07/14] fs: remove __vfs_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 08/14] fs: don't change the address limit for ->write_iter in __kernel_write Christoph Hellwig
2020-06-24 16:13 ` [PATCH 09/14] fs: add a __kernel_read helper Christoph Hellwig
2020-06-24 16:13 ` [PATCH 10/14] integrity/ima: switch to using __kernel_read Christoph Hellwig
2020-06-24 16:13 ` [PATCH 11/14] fs: implement kernel_read " Christoph Hellwig
2020-06-24 16:13 ` [PATCH 12/14] fs: remove __vfs_read Christoph Hellwig
2020-06-24 16:13 ` [PATCH 13/14] fs: implement default_file_splice_read using __kernel_read Christoph Hellwig
     [not found]   ` <20200701091943.GC3874@shao2-debian>
2020-07-01 12:13     ` [fs] 140402bab8: stress-ng.splice.ops_per_sec -100.0% regression Christoph Hellwig
2020-07-01 20:32       ` Linus Torvalds
2020-06-24 16:13 ` [PATCH 14/14] fs: don't change the address limit for ->read_iter in __kernel_read Christoph Hellwig
